I often hear the following complaint from people: "SVMs work really well WHEN they actually work." By "work" I mean that the algorithm will actually finish running. Are SVMs difficult to fit in practice versus other models, or are these people making a (I assume common) mistake while attempting to use SVMs?
The SVM fitting problem in the dual form is a convex quadratic programming problem. Optimization is not my area of expertise, but this suggests to me that the fitting procedure should always terminate correctly. I think the quotation probably means "When the SVM's predictions add value to the problem that we want to solve, it adds considerable value." – Sycorax Jul 30 '15 at 17:49
2 Answers
In theory, to guarantee convergence your problem must be convex; otherwise you cannot know whether the algorithm will find a solution.
Moreover, SVM training complexity is O(n^2) in the number of training samples, so it does not scale well to large datasets.
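For reference, a sketch of the standard soft-margin dual problem mentioned in the comments: the objective is quadratic in the $\alpha_i$, and for a Mercer kernel the kernel matrix is positive semi-definite, which is why the problem is convex:

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j) \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0$$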
The optimisation problem for an SVM with a Mercer kernel will be convex. – Dikran Marsupial Jul 31 '15 at 16:01
I suspect the problem is to do with the hyper-parameter settings (the regularization parameter, $C$, and any kernel parameters). You can end up with essentially all of the data being support vectors, at which point the optimisation procedure takes a long time because there are many "active" parameters. For small to medium size problems (a few thousand training patterns) I tend to use least-squares support vector machines, as the fitting procedure only requires solving a single set of linear equations (or an eigen-decomposition of the kernel matrix if you want to tune the regularisation parameter as well).
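As a concrete illustration of that last point, here is a minimal sketch (my own illustrative code, not the answerer's; the RBF kernel, the use of $1/C$ as the ridge term, and the function names are assumptions) of fitting a least-squares SVM to $\pm 1$ labels by solving a single linear system:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Pairwise squared Euclidean distances, then Gaussian (RBF) kernel
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * d2)

def fit_lssvm(X, y, C=1.0, gamma=1.0):
    """Least-squares SVM: one (n+1)x(n+1) linear system instead of a QP.
    y is expected to contain +1/-1 labels."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                   # constraint row for the dual coefficients
    A[1:, 0] = 1.0                   # bias column
    A[1:, 1:] = K + np.eye(n) / C    # ridge term 1/C acts as the regulariser
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]           # alpha, b

def predict_lssvm(X_train, alpha, b, X_test, gamma=1.0):
    return np.sign(rbf_kernel(X_test, X_train, gamma) @ alpha + b)

# Hypothetical usage:
# alpha, b = fit_lssvm(X_train, y_train, C=10.0, gamma=0.5)
# y_pred = predict_lssvm(X_train, alpha, b, X_test, gamma=0.5)
```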
Getting good generalisation performance from an SVM depends on choosing a suitable kernel and careful tuning of the hyper-parameters (grid search minimising cross-validation error is a good method).
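By way of example, a minimal sketch of that grid-search approach with scikit-learn (the grid values and the 5-fold setting are placeholder assumptions, not recommendations from the answer):

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Grid over the regularisation parameter C and the RBF kernel width gamma;
# the specific values are illustrative only.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-3, 1e-2, 1e-1, 1],
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold cross-validation
# search.fit(X_train, y_train)   # X_train, y_train: your data (not defined here)
# print(search.best_params_, search.best_score_)
```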
Thank you for the response. How do you identify when it is appropriate to use an SVM over another method (random forest, neural net, boosted trees, etc.)? I have always struggled with this. Generally, my review of the literature has led me to the following:
- Compare methods
- Random forests tend to be better for classification problems as opposed to regression problems
- Neural Nets can be applied to virtually any type of problem (training time can be an issue)
- Boosted trees can be applied to most problems and perform well if the base learner is weak
- Try ensembles (weak learners)
Generally the best thing to do is to try several methods and see what works best using cross-validation. Most rules of thumb for selecting methods tend to be too uncertain to be of much use, and at the end of the day a lot depends on the skill of the operator in applying a particular method (especially neural nets). I find kernel machines pretty useful, mostly because they are easy to use and performance generally just depends on choosing the hyper-parameters carefully using cross-validation. I also find bagging to be a useful "belt-and-braces" approach to avoiding overfitting. – Dikran Marsupial Aug 03 '15 at 09:12
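A minimal sketch of the "try several methods and compare by cross-validation" approach (the candidate models and their settings are illustrative assumptions, not part of the original comment):

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Candidate models; in practice each one's hyper-parameters would also be tuned.
models = {
    "svm_rbf": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "boosted_trees": GradientBoostingClassifier(),
}

# for name, model in models.items():
#     scores = cross_val_score(model, X, y, cv=5)  # X, y: your data (not defined here)
#     print(name, scores.mean())
```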
@Dikran Marsupial: what do you think about this? http://stats.stackexchange.com/questions/168051/what-is-the-relationship-between-a-kernel-and-projection-to-a-higher-dimensional/168082#168082 – Aug 20 '15 at 18:12