Questions tagged [svm]

Support Vector Machine refers to "a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis."

...The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input is a member of, which makes the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

--Wikipedia


2288 questions
75
votes
4 answers

Why bother with the dual problem when fitting SVM?

Given the data points $x_1, \ldots, x_n \in \mathbb{R}^d$ and labels $y_1, \ldots, y_n \in \left \{-1, 1 \right\}$, the hard margin SVM primal problem is $$ \text{minimize}_{w, w_0} \quad \frac{1}{2} w^T w $$ $$ \text{s.t.} \quad \forall i: y_i…
blubb
  • 2,630
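For context, the standard hard-margin primal (continuing the truncated statement above in the usual textbook form) and the Lagrangian dual the question asks about are:

```latex
% Hard-margin primal
\begin{aligned}
\min_{w, w_0} \quad & \tfrac{1}{2}\, w^T w \\
\text{s.t.} \quad & y_i \left( w^T x_i + w_0 \right) \ge 1 \quad \forall i
\end{aligned}

% Lagrangian dual: the data enter only through inner products x_i^T x_j,
% which is what makes the kernel trick possible.
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j\, y_i y_j\, x_i^T x_j \\
\text{s.t.} \quad & \alpha_i \ge 0 \ \ \forall i,
  \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
\end{aligned}
```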
12
votes
1 answer

Should an SVM grid search show a high-accuracy region with low accuracies around?

I have 12 positive training sets (cancer cells treated with drugs with each of 12 different mechanisms of action). For each of these positive training sets, I would like to train a support-vector machine to distinguish it from a negative set of…
12
votes
1 answer

How do support vector machines avoid overfitting?

I understand that in the dual form of the model for support vector machines, the feature vectors are expressed only as a dot product. Mapping the feature vectors to a higher dimensional space can accommodate classes that are not linearly separable…
gavinmh
  • 1,095
11
votes
1 answer

Given a set of points in two-dimensional space, how can one design a decision function for an SVM?

Can someone explain to me how one goes about designing an SVM decision function? Or point me to a resource that discusses a concrete example. EDIT For the example below, I can see that the equation $X_2 = 1.5$ separates the classes with maximum margin.…
naresh
  • 213
11
votes
1 answer

What are the support vectors in a support vector machine?

I know how support vector machines work, but for some reason I always get confused by what exactly the support vectors are. In the case of linearly separable data, the support vectors are those data points that lie (exactly) on the borders of the…
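One quick way to see this concretely (a sketch assuming scikit-learn; the four points are made up): fit a linear SVC with a large `C` on separable data and inspect `support_vectors_`. The support vectors are the points whose margin $y_i(w^T x_i + b)$ equals exactly 1.

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data on a vertical line: classes split at x2 = 1.5.
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# Large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_vectors_)  # the two points nearest the boundary
margins = y * clf.decision_function(X)
print(np.round(margins[clf.support_], 3))  # equal to 1 on the margin
```

The two inner points (at $x_2 = 1$ and $x_2 = 2$) are the support vectors; removing any other point leaves the fitted hyperplane unchanged.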
10
votes
2 answers

Can a linear SVM only have 2 classes?

Can a linear SVM support more than 2 classes for classification?
john
  • 223
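The usual answer is that a single linear SVM is binary, but multiclass problems are handled by combining binary SVMs. A sketch (assuming scikit-learn, whose `LinearSVC` uses a one-vs-rest scheme internally; the clustered data is made up):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Three well-separated 2-D clusters, one per class.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in ([0, 0], [4, 0], [0, 4])])
y = np.repeat([0, 1, 2], 20)

# One-vs-rest: one binary linear SVM is trained per class.
clf = LinearSVC(C=1.0).fit(X, y)
print(clf.coef_.shape)   # one weight vector per class: (3, 2)
print(clf.score(X, y))
```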
9
votes
2 answers

Gridsearch for SVM parameter estimation

I'm currently experimenting with gridsearch to train a support vector machine. I understand that, if I have parameter gamma and C, the R function tune.svm performs a 10-fold cross validation for all combinations of these 2 parameters. Since I did…
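For reference, a minimal sketch of the same idea in Python (assuming scikit-learn; `GridSearchCV` mirrors what R's `tune.svm` does, running cross-validation over every combination of `C` and `gamma` on a synthetic dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Exhaustive grid over the two RBF-SVM hyperparameters.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)  # 10-fold CV
search.fit(X, y)

print(search.best_params_)  # best (C, gamma) combination found
```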
9
votes
1 answer

Why would scaling features decrease SVM performance?

I have used scaling on the features of a model which contains 40 features (all columns are numbers) and a binary output variable, for the Kaggle contest linked here. I've scaled the features assuming it would deliver better performance, but with an RBF…
mahonya
  • 1,111
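One common pitfall behind results like this is scaling outside the cross-validation loop. A sketch (assuming scikit-learn; the 40-feature dataset is synthetic): putting the scaler inside a `Pipeline` ensures it is fit only on the training folds, so the evaluation is not distorted by leakage.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, random_state=0)

# Scaler is refit on each training fold; no information leaks from test folds.
model = make_pipeline(StandardScaler(),
                      SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```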
8
votes
2 answers

How are the convergence conditions / KKT conditions for the soft-margin SVM derived?

I was reading a class note on SVMs from Andrew Ng (pp. 19-20 of http://cs229.stanford.edu/notes/cs229-notes3.pdf) and can't understand something in the lecture note. It says that the L1-regularized soft-margin problem is the…
DSKim
  • 1,289
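For reference, the KKT dual-complementarity conditions for the $\ell_1$ soft-margin dual, as stated in those notes, are:

```latex
\begin{aligned}
\alpha_i = 0 \quad &\Rightarrow \quad y_i \left( w^T x_i + b \right) \ge 1 \\
\alpha_i = C \quad &\Rightarrow \quad y_i \left( w^T x_i + b \right) \le 1 \\
0 < \alpha_i < C \quad &\Rightarrow \quad y_i \left( w^T x_i + b \right) = 1
\end{aligned}
```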
7
votes
1 answer

SVM: why do we maximize 2/||w||?

If you open any SVM guide you will see that $1/||w||$ is proportional to the margin size (which the SVM is meant to maximize). But how is this result obtained? In the picture below you can see 2 plots. One is $x_{1} = -5x_{2} + 5$ and the second is $x_{1}…
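For reference, the standard derivation: with the canonical scaling the margin boundaries are the hyperplanes $w^T x + b = \pm 1$, and the distance between two parallel hyperplanes $w^T x + b = c_1$ and $w^T x + b = c_2$ is $|c_1 - c_2| / \lVert w \rVert$, so:

```latex
\text{margin}
  = \frac{\lvert 1 - (-1) \rvert}{\lVert w \rVert}
  = \frac{2}{\lVert w \rVert}
```

Maximizing $2/\lVert w \rVert$ is therefore equivalent to minimizing $\tfrac{1}{2}\lVert w \rVert^2$, which gives the usual primal objective.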
7
votes
1 answer

libsvm training very slow on 100K rows, suggestions?

I'm trying to run the libsvm-provided wrapper script easy.py on a training set of 100K rows, where each row has ~300 features. The feature data is relatively sparse, say only 1/10th are non-zero values. The script is excruciatingly slow; I'm talking days…
user6020
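The usual advice here is that libsvm's kernel solver scales poorly (roughly quadratically or worse in the number of rows), while linear solvers handle this size easily. A sketch (assuming scikit-learn and SciPy; the sparse matrix below is a smaller synthetic stand-in for the 100K x 300 data):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Sparse CSR matrix: 5000 rows x 300 features, ~10% non-zero.
X = sp.random(5000, 300, density=0.1, format="csr", random_state=0)
y = rng.integers(0, 2, size=5000)

# liblinear-backed solver: trains in seconds on data this shape.
clf = LinearSVC().fit(X, y)
print(clf.coef_.shape)
```

If a nonlinear boundary is genuinely needed, approximations such as subsampling or kernel approximation are common fallbacks; the point of the sketch is only the linear-solver speedup.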
6
votes
2 answers

Confusion about Karush-Kuhn-Tucker conditions in SVM derivation

I am currently following CS229 and I'm trying to be diligent, proving most of the things that are not immediately obvious. So I'm looking at the derivation of the SVM as the dual problem to the optimal margin classifier. Let $x^{i}, y^{i}$ be…
Addy
  • 201
6
votes
1 answer

Why does an SVM struggle to find good features among garbage?

I'm working on a small data set with many features, most of which are just garbage. The goal is to get good classification accuracy on this binary classification task. So I made up a small example code illustrating the problem. The code…
lcit
  • 206
6
votes
4 answers

SVM functional margin and geometric margin

I know that the functional margin has the following formula, and I have read that, given a training set $S$, we define the functional margin of $(w, b)$ with respect to $S$ to be the minimum of these functional margins: but why does it say that one should find the…
Layla
  • 621
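For reference, the standard definitions in question: the functional margin of $(w, b)$ on example $i$, the functional margin on the whole set $S$, and the geometric margin on example $i$ are

```latex
\hat{\gamma}_i = y_i \left( w^T x_i + b \right),
\qquad
\hat{\gamma} = \min_{i = 1, \dots, n} \hat{\gamma}_i,
\qquad
\gamma_i = \frac{y_i \left( w^T x_i + b \right)}{\lVert w \rVert}
```

The minimum is taken because the margin of a classifier on $S$ is determined by its worst-classified training point: a large margin is only guaranteed if even the closest point is far from the boundary.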
5
votes
1 answer

Understanding the geometric margin of SVM

I was watching Andrew Ng's lecture on machine learning and I came across the 'geometric margin' in the SVM lecture. I am confused about how he obtained the equation for the point $B$. Notice that the hyperplane is the slanted line where $w^Tx + b = 0$. The…
mynameisJEFF
  • 1,843
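For reference, the derivation of point $B$ in those notes: $w / \lVert w \rVert$ is the unit normal to the hyperplane, so moving from a positive example $A = x_i$ a distance $\gamma_i$ against the normal lands exactly on the hyperplane:

```latex
B = x_i - \gamma_i \frac{w}{\lVert w \rVert},
\qquad
w^T \left( x_i - \gamma_i \frac{w}{\lVert w \rVert} \right) + b = 0
\quad \Rightarrow \quad
\gamma_i = \frac{w^T x_i + b}{\lVert w \rVert}
```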