Vladimir Vapnik wrote:
“When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need but not a more general one.”
Vapnik wrote this in the context of transductive learning, but the principle also applies to statistical pattern recognition. If we only need a hard classification, we should aim to determine the decision boundary directly, rather than estimating the posterior probability of class membership and then thresholding it at 0.5 (or at some other threshold determined by the misclassification costs). In principle, we may get better generalisation in terms of accuracy by following this advice, as it focusses on the needs of the application and avoids compromises that improve the fit to aspects of the data that are irrelevant to the application at the expense of aspects that are relevant.
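To make the distinction concrete, here is a minimal sketch of the two routes to a hard classification, using scikit-learn on synthetic data (the dataset and parameter values are purely illustrative, not part of the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative synthetic data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Route 1: hinge loss fits the decision boundary directly.
svm = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
y_svm = svm.predict(X_te)

# Route 2: log loss estimates P(y = 1 | x), which is then thresholded at 0.5.
lr = LogisticRegression(C=1.0).fit(X_tr, y_tr)
y_lr = (lr.predict_proba(X_te)[:, 1] > 0.5).astype(int)

print("SVM accuracy:", np.mean(y_svm == y_te))
print("LR accuracy: ", np.mean(y_lr == y_te))
```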
The performance of the SVM on real-world problems suggests that there may be some merit in this, but it is not easy to find a clear example where it is demonstrably the case. Can anyone suggest an example where an SVM will give better performance than an equivalent logistic regression model (i.e. kernels and regularisation should be available to the logistic regression model as well, so that we are comparing only the loss functions, rather than other aspects of the models)? Preferably the example should demonstrate why this is the case.
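For the avoidance of doubt, the like-for-like comparison I have in mind looks something like the following sketch: the same precomputed RBF kernel and the same style of regularisation for both models, so only the loss differs. Note that fitting logistic regression on the columns of the kernel matrix is a common approximation to kernel logistic regression via the representer theorem, though the L2 penalty is then on the dual coefficients rather than the exact RKHS norm:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gamma, C = 1.0, 1.0  # illustrative values; both would be tuned in practice
K_tr = rbf_kernel(X_tr, X_tr, gamma=gamma)  # train-vs-train kernel matrix
K_te = rbf_kernel(X_te, X_tr, gamma=gamma)  # test-vs-train kernel matrix

# Same kernel and regularisation parameter; only the loss differs.
svm = SVC(kernel="precomputed", C=C).fit(K_tr, y_tr)  # hinge loss
klr = LogisticRegression(C=C).fit(K_tr, y_tr)         # log loss

print("kernel SVM accuracy:", svm.score(K_te, y_te))
print("kernel LR accuracy: ", klr.score(K_te, y_te))
```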
Obviously the no-free-lunch theorem suggests that no classifier is going to be superior for all datasets, and the theoretical justification for the SVM is largely a worst-case analysis, so the SVM may be better on pathologically difficult problems, but not necessarily in the average case.
Personally, I prefer kernel logistic regression over the support vector machine, largely because the hyper-parameter tuning is much easier, and I suspect the two will give broadly similar results. However, I have an open mind for cases where the needs of the application mean that the limitations of the SVM (e.g. having to know the misclassification costs a priori) are not a problem.
Please can we avoid discussions of whether accuracy is a good performance metric. It has issues, but there are some applications where accuracy is the relevant statistic of interest, and there have already been good discussions of this topic.
Note that the answer here is the sort of thing I am looking for, but unfortunately it isn't correct (see my comment on the answer).