
I'm studying support vector machines and came across this paper. The following equation doesn't make sense to me, especially the part with the 0 ∀i. Any help understanding the basics of SVMs?

yi * (xi * w + b) - 1 >= 0 ∀i

  • xi - one input
  • yi - the classification (+1 or -1)
  • w and b - variables to find in order to "train up" the SVM (?)

Please forgive my ignorance on SVMs, a lot is still unclear to me and I'm trying to learn these bit by bit. Thanks!

Josh F
  • see http://scicomp.stackexchange.com/questions/2095/calculating-lagrange-coefficients-for-svm-in-python, but with the correction that you minimize $w^2+b^2$. In another post, http://scicomp.stackexchange.com/questions/10420/support-vector-machines-as-neural-nets, the following link http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html was given by hardmath with a nice overview of what the parameters mean. – Lutz Lehmann Dec 29 '13 at 11:33
  • minimizing $|w|$ is sufficient, the value of $b$ follows from having points on both sides of the hyperplane. – Lutz Lehmann Dec 29 '13 at 18:12

1 Answer


You have two classes of points. Instead of managing them in two sets, one just assigns each point in the first class the value $-1$ and each point in the second class the value $+1$. So in fact you have point-value pairs $(x_i,y_i)$. To classify future points in a consistent way you now want to construct a function $f(x)$ that does not satisfy $f(x_i)=y_i$ exactly, as in interpolation, but only the still sufficient condition $f(x_i)\le -1$ for points in the first class and $f(x_i)\ge +1$ for points in the second class. These two kinds of inequality can be compressed into a single class of inequalities by multiplying with the sign $y_i$,

$$y_if(x_i)\ge 1$$

for all $i=1,...,N$, where $N$ is the number of training points.


"for all" has the symbolic sign $\forall$, an inverted letter "A". The inverted letter "E", $\exists$, is the symbol for "exists".


Now to find such a function, you select a parametrized class of functions $f(w,x)+b$ with some parameter vector $(w,b)$ and strive for a compromise between having a simple form of $f$ and satisfying $f(w,x_i)+b=y_i$, which defines the support vectors, on as many points of the training set as possible. Simplicity includes that the parameters in $w$ are small numbers.


So we come to the linear SVM, where $f(w,x)=w^Tx$ and minimal parameters means minimizing $\|w\|_2^2=w^Tw$.
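
If you want to see this as an optimization problem in code, here is a rough sketch: minimize $\tfrac12\|w\|_2^2$ subject to $y_i(w^Tx_i+b)-1\ge 0$. The toy data, the starting point and the general-purpose SLSQP solver are my own choices, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the primal problem: minimize 1/2 ||w||^2
# subject to y_i (w^T x_i + b) - 1 >= 0 for all i.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(p):                 # p = (w_1, w_2, b)
    w = p[:-1]
    return 0.5 * w @ w

constraints = [{'type': 'ineq',   # SciPy convention: fun(p) >= 0
                'fun': lambda p, i=i: y[i] * (X[i] @ p[:-1] + p[-1]) - 1.0}
               for i in range(len(y))]

p0 = np.array([1.0, 1.0, 0.0])    # a feasible starting guess for this toy data
res = minimize(objective, p0, method='SLSQP', constraints=constraints)
w, b = res.x[:-1], res.x[-1]
print(w, b)                       # maximal-margin separating hyperplane
```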

In optimization, this task is encoded via a Lagrange function

$$L(w,b,α)=\tfrac12\|w\|_2^2-\sum_{i=1}^Nα_i(y_i(w^Tx_i+b)-1)$$

with the restriction $α_i\ge 0$.
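
As a sanity check, the Lagrangian is easy to evaluate numerically. The two-point problem below is my own illustration, chosen so the solution can be worked out by hand:

```python
import numpy as np

# Illustration only: x_1 = (1,0) with y_1 = +1 and x_2 = (-1,0) with y_2 = -1
# give w = (1,0), b = 0 and alpha = (1/2, 1/2) by hand.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])

def lagrangian(w, b, alpha):
    margins = y * (X @ w + b) - 1.0   # y_i (w^T x_i + b) - 1
    return 0.5 * w @ w - alpha @ margins

# At the hand-computed solution the constraint terms vanish,
# so L equals 1/2 ||w||^2 = 0.5.
print(lagrangian(np.array([1.0, 0.0]), 0.0, np.array([0.5, 0.5])))
```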


Standard optimization techniques solve this problem via its KKT system. \begin{align} 0=\frac{\partial L}{\partial w}&=w-\sum_{i=1}^Nα_iy_ix_i\\ 0=\frac{\partial L}{\partial b}&=-\sum_{i=1}^Nα_i y_i\\ α_i&\ge 0\\ y_i(w^Tx_i+b)-1&\ge 0\\ α_i\,(y_i(w^Tx_i+b)-1)&=0 \end{align}
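
On the same hand-solvable two-point problem as above (my own illustration, with $\alpha=(\tfrac12,\tfrac12)$ assumed known), these conditions can be verified numerically:

```python
import numpy as np

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])       # assumed multipliers for this toy problem

w = (alpha * y) @ X                # stationarity: w = sum_i alpha_i y_i x_i
b = y[0] - w @ X[0]                # a support vector gives y_i (w^T x_i + b) = 1, so b = y_i - w^T x_i

margins = y * (X @ w + b) - 1.0
print(np.isclose((alpha * y).sum(), 0.0))    # sum_i alpha_i y_i = 0
print(np.all(alpha >= 0), np.all(margins >= -1e-12))
print(np.allclose(alpha * margins, 0.0))     # complementary slackness
```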


The last three conditions of this system again hold for all $i$. They can be combined, using NCP functions like

$$N(u,v)=2uv-(u+v)_-^2$$

with $(u+v)_-=\min(0,u+v)$, into one condition per $i$,

$$N(α_i,\, y_i(w^Tx_i+b)-1)=0.$$

This is now smooth enough that Newton's method or quasi-Newton methods can be applied.
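
A small sketch of that NCP function (my own illustration) makes the equivalence to the complementarity conditions easy to check numerically:

```python
import numpy as np

# N(u, v) = 2uv - min(0, u+v)^2 from above.
# It is zero exactly when u >= 0, v >= 0 and u*v = 0.
def ncp(u, v):
    return 2.0 * u * v - np.minimum(0.0, u + v) ** 2

print(ncp(0.0, 3.0))    # 0.0  (complementarity holds)
print(ncp(2.0, 0.0))    # 0.0  (complementarity holds)
print(ncp(1.0, 1.0))    # 2.0  (both strictly positive: violated)
print(ncp(-1.0, 2.0))   # -4.0 (negative multiplier: violated)
```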

Lutz Lehmann
  • Awesome answer. You answered a lot of questions I had, but didn't even express in the question. Stellar, thanks so much! – Josh F Dec 30 '13 at 01:03