Kernel methods are very effective in many supervised classification tasks. What are the limitations of kernel methods, and when should they be used? In particular, in the era of large-scale data, what are the advantages of kernel methods? And what is the difference between kernel methods and multiple instance learning?
If the data is 500x10000, where 500 is the number of samples and 10000 is the dimension of each feature vector, can we use kernel methods in this circumstance?
1 Answer
Kernel methods can be used for supervised and unsupervised problems. Well-known examples are the support vector machine and kernel spectral clustering, respectively.
Kernel methods provide a structured way to use a linear algorithm in a transformed feature space, where the transformation is typically nonlinear and maps to a higher-dimensional space. The key advantage of this so-called kernel trick is that nonlinear patterns can be found at a reasonable computational cost.
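As a small sketch of the kernel trick (assuming NumPy; the degree-2 polynomial kernel and its explicit feature map are illustrative choices, not something specific from this answer): evaluating $(\mathbf{x}^\top\mathbf{z})^2$ directly gives the same value as an ordinary dot product in the explicit feature space of all pairwise products, without ever constructing that space.

```python
import numpy as np

rng = np.random.default_rng(0)
x, z = rng.standard_normal(3), rng.standard_normal(3)

# Kernel trick: evaluate the degree-2 polynomial kernel directly in input space.
k_direct = (x @ z) ** 2

# Versus explicitly mapping to the 9-dimensional feature space of all
# pairwise products x_i * x_j, then taking an ordinary dot product.
phi = lambda v: np.outer(v, v).ravel()
k_explicit = phi(x) @ phi(z)

print(np.isclose(k_direct, k_explicit))  # True
```

For higher degrees (or the RBF kernel, whose feature space is infinite-dimensional) the explicit map becomes intractable while the kernel evaluation stays cheap.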
Note that I said the computational cost is reasonable, but not negligible. Kernel methods typically construct a kernel matrix $\mathbf{K} \in \mathbb{R}^{N\times N}$ with $N$ the number of training instances. The complexity of kernel methods is therefore a function of the number of training instances, rather than the number of input dimensions. Support vector machines, for example, have a training complexity between $O(N^2)$ and $O(N^3)$. For problems with very large $N$, this complexity is currently prohibitive.
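To make the scaling concrete, here is a minimal NumPy sketch (the gamma value and random data are placeholders): building an RBF kernel matrix for data of the question's shape produces a $500 \times 500$ matrix whose size depends only on $N$, not on the 10000 input dimensions.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    # Pairwise squared Euclidean distances via the expansion
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i.x_j,
    # clamped at zero to absorb floating-point noise.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

X = np.random.default_rng(1).standard_normal((500, 10000))
K = rbf_kernel_matrix(X)
print(K.shape)  # (500, 500): grows with N^2, independent of the input dimension
```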
This makes kernel methods very interesting from a computational perspective when the number of dimensions is large and the number of samples is relatively low (say, less than 1 million).
Related:
- Linear kernel and non-linear kernel for support vector machine?
- SVM for Large Scale Problems
For very high dimensional problems, such as the 10000 dimensions you mention in the question, there is often no need to map to a higher dimensional feature space. The input space is already good enough. For such problems, linear methods are orders of magnitude faster with almost the same predictive performance. Examples of these methods can be found in LIBLINEAR or Vowpal Wabbit.
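A hedged sketch of this comparison, assuming scikit-learn is available (its LinearSVC wraps LIBLINEAR and SVC wraps LIBSVM; the synthetic data is a stand-in for the question's 500x10000 setting, shrunk so the snippet runs quickly):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# High-dimensional synthetic data: few samples, many features.
X, y = make_classification(n_samples=500, n_features=2000,
                           n_informative=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# LinearSVC (LIBLINEAR backend): linear decision boundary in input space.
linear = LinearSVC(dual=True, max_iter=10000).fit(X_tr, y_tr)

# SVC with RBF kernel (LIBSVM backend): implicit nonlinear feature map.
rbf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)

print("linear:", linear.score(X_te, y_te), "rbf:", rbf.score(X_te, y_te))
```

On data like this, the two accuracies are typically close, while the linear model trains much faster as $N$ grows.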
Linear methods are particularly interesting when you have many samples in a high-dimensional input space. When you have only $500$ samples, using a nonlinear kernel method will also be cheap (since $N$ is small). If you had, say, $5{,}000{,}000$ samples in $10{,}000$ dimensions, kernel methods would be infeasible.
For low-dimensional problems with many training instances (so-called large $N$ small $p$ problems), linear methods may yield poor predictive accuracy. For such problems, ensemble methods such as EnsembleSVM provide nonlinear decision boundaries at significantly reduced computational cost compared to standard SVM.
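EnsembleSVM itself is a C++ library, but the underlying idea can be sketched with scikit-learn's bagging (this is an illustration of the ensemble approach, not EnsembleSVM's actual API): each base SVM is trained on a small bootstrap subsample, so no single $O(N^2)$–$O(N^3)$ solve ever touches all $N$ points.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Large-N, low-p setting: many samples, few dimensions.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# Ten RBF-kernel SVMs, each trained on only 500 bootstrap samples;
# predictions are aggregated by majority vote.
bag = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10,
                        max_samples=500, random_state=0).fit(X, y)
print(bag.score(X, y))
```

Each base learner costs at most $O(500^3)$ rather than $O(5000^3)$, and the base models can be trained independently in parallel.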
… the RBF kernel in libsvm always overfits: the classifier achieves high accuracy on the training set but low accuracy on the testing set. If I do dimension reduction before classification, and the reduced dimension is close to the number of training samples, the classifier may achieve a good balance between the training and testing sets. Do these results fit most empirical results? Thanks. – mining Oct 28 '13 at 13:06

… gamma for the RBF kernel. The optimal value for gamma is related to the number of input dimensions. The most common tuning approach is cross-validation. If you used the same value for gamma with and without dimensionality reduction, you are probably making a mistake. – Marc Claesen Oct 28 '13 at 13:10

… grid.py in the libsvm package to do cross-validation. And in most circumstances, for high-dimensional data, gamma is always very small, such as 0.00001. – mining Oct 28 '13 at 13:14

… EnsembleSVM, does it need to make the cross-validation procedure multithreaded? And I think in the prediction stage, it would be good to predict huge data in batches, using multithreading or multiple machines. – mining Oct 28 '13 at 13:46

… esvm-train and esvm-predict. To disable multithreading, use the following flag in those tools: -threads 1. – Marc Claesen Oct 28 '13 at 13:48

… the libsvm package seems to have no multithreading, but I think it's very important in large-scale data processing. – mining Oct 28 '13 at 13:53

libsvm does indeed not offer multithreading. Training a single SVM is not straightforward to parallelize. – Marc Claesen Oct 28 '13 at 13:57

… EnsembleSVM, I will give feedback to you. – mining Oct 28 '13 at 14:03