I often hear the following complaint from people: "SVMs work really well WHEN they actually work." By "work" I mean that the algorithm will actually finish running. Are SVMs difficult to fit in practice versus other models, or are these people making a (I assume common) mistake while attempting to use SVMs?
The SVM fitting problem in the dual form is a convex quadratic programming problem. Optimization is not my area of expertise, but this suggests to me that the fitting procedure should always terminate correctly. I think the quotation probably means "When the SVM's predictions add value to the problem that we want to solve, it adds considerable value." – Sycorax Jul 30 '15 at 17:49
2 Answers
In theory, to guarantee convergence your problem must be convex; otherwise you cannot know whether the algorithm will find a solution.
Moreover, SVM training complexity is O(n^2) in the number of training samples, so it does not scale well to large datasets.
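For reference, a sketch of the standard soft-margin dual problem mentioned in the comments: the objective is quadratic in the $\alpha_i$, and for a Mercer kernel the kernel matrix is positive semi-definite, which is why the problem is convex:

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j) \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0$$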
The optimisation problem for an SVM with a Mercer kernel will be convex. – Dikran Marsupial Jul 31 '15 at 16:01
I suspect the problem is to do with the hyper-parameter settings (the regularization parameter, $C$, and any kernel parameters). You can end up with essentially all of the data being support vectors, at which point the optimisation procedure takes a long time because there are many "active" parameters. For small to medium size problems (a few thousand training patterns) I tend to use least-squares support vector machines, as the fitting procedure only requires solving a single set of linear equations (or an eigen-decomposition of the kernel matrix if you want to tune the regularisation parameter as well).
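As a concrete illustration of that last point, here is a minimal sketch (my own illustrative code, not the answerer's; the RBF kernel, the use of $1/C$ as the ridge term, and the function names are assumptions) of fitting a least-squares SVM to $\pm 1$ labels by solving a single linear system:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Pairwise squared Euclidean distances, then Gaussian (RBF) kernel
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * d2)

def fit_lssvm(X, y, C=1.0, gamma=1.0):
    """Least-squares SVM: one (n+1)x(n+1) linear system instead of a QP.
    y is expected to contain +1/-1 labels."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                   # constraint row for the dual coefficients
    A[1:, 0] = 1.0                   # bias column
    A[1:, 1:] = K + np.eye(n) / C    # ridge term 1/C acts as the regulariser
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]           # alpha, b

def predict_lssvm(X_train, alpha, b, X_test, gamma=1.0):
    return np.sign(rbf_kernel(X_test, X_train, gamma) @ alpha + b)

# Hypothetical usage:
# alpha, b = fit_lssvm(X_train, y_train, C=10.0, gamma=0.5)
# y_pred = predict_lssvm(X_train, alpha, b, X_test, gamma=0.5)
```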
Getting good generalisation performance from an SVM depends on choosing a suitable kernel and careful tuning of the hyper-parameters (grid search minimising cross-validation error is a good method).
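By way of example, a minimal sketch of that grid-search approach with scikit-learn (the grid values and the 5-fold setting are placeholder assumptions, not recommendations from the answer):

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Grid over the regularisation parameter C and the RBF kernel width gamma;
# the specific values are illustrative only.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-3, 1e-2, 1e-1, 1],
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold cross-validation
# search.fit(X_train, y_train)   # X_train, y_train: your data (not defined here)
# print(search.best_params_, search.best_score_)
```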
Thank you for the response. How do you identify when it is appropriate to use an SVM over another method (random forest, neural net, boosted trees, etc.)? I have always struggled with this. Generally, my review of the literature has led me to the following:
- Compare methods
- Random forests tend to be better for classification problems as opposed to regression problems
- Neural Nets can be applied to virtually any type of problem (training time can be an issue)
- Boosted trees can be applied to most problems and perform well if the base learner is weak
- Try ensembles (weak learners)
Generally the best thing to do is to try several methods and see what works best using cross-validation. Most rules of thumb for selecting methods tend to be too uncertain to be of much use, and at the end of the day a lot depends on the skill of the operator in applying a particular method (especially neural nets). I find kernel machines pretty useful, mostly because they are easy to use and performance generally just depends on choosing the hyper-parameters carefully using cross-validation. I also find bagging to be a useful "belt-and-braces" approach to avoiding overfitting. – Dikran Marsupial Aug 03 '15 at 09:12
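A minimal sketch of the "try several methods and compare by cross-validation" approach (the candidate models and their settings are illustrative assumptions, not part of the original comment):

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Candidate models; in practice each one's hyper-parameters would also be tuned.
models = {
    "svm_rbf": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "boosted_trees": GradientBoostingClassifier(),
}

# for name, model in models.items():
#     scores = cross_val_score(model, X, y, cv=5)  # X, y: your data (not defined here)
#     print(name, scores.mean())
```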
@Dikran Marsupial: what do you think about this? http://stats.stackexchange.com/questions/168051/what-is-the-relationship-between-a-kernel-and-projection-to-a-higher-dimensional/168082#168082 – Aug 20 '15 at 18:12