8

I was reading a class note on SVM from Andrew Ng (pp 19~20 from http://cs229.stanford.edu/notes/cs229-notes3.pdf) and can't understand something in the lecture note. It says that the L1-regularzed soft-margin problem is the following:

$$\begin{array}{ll} \mbox{min.} & & \frac{1}{2} ||w||^2 + C \sum \xi_i \\ \mbox{s.t.} & & y_i (w^T x_i+b) \geq 1 - \xi_i, i = 1,...,m\\ & & \xi_i \geq 0, i=1,...,m \end{array}$$

It then says that the Lagrangian is

$L(w,b,\xi, \alpha, r) = \frac{1}{2}w^Tw + C\sum{\xi_i} - \sum \alpha_i [y_i (x^Tw+b)-1+\xi_i] - \sum r_i \xi_i$

Then, it says that the KKT dual-complementarity condition are: $$\begin{array}{ll} \alpha_i = 0 & \Rightarrow & y_i (w^Tx_i + b) \geq 1 \\ \alpha_i = C & \Rightarrow & y_i (w^Tx_i + b) \leq 1 \\ 0 < \alpha_i < C & \Rightarrow & y_i (w^Tx_i + b) = 1 \end{array}$$

I don't understand the last part. Isn't the KKT dual-complementarity condition $\alpha_i [y_i (x^Tw+b)-1+\xi_i]$ = 0 -- that is, either $\alpha_i = 0$ or $ y_i (x^Tw+b)-1+\xi_i = 0$. Can anybody explain to me how the above complementarity conditions are derived? I am troubled that $\xi_i$ doesn't appear in the complementary condition.

DSKim
  • 1,289

2 Answers2

8

Unfortunately, I am late a couple of years, but after reading Ng's lecture notes, I was asking myself the same question. The complementarity conditions you have listed follow from the other KKT conditions, namely: $$ \begin{align} \alpha_i &\geq 0 \tag{1},\\ g_i(w) &\leq 0 \tag{2},\\ \alpha_i g_i(w) &= 0 \tag{3},\\ r_i &\geq 0 \tag{4},\\ \xi_i &\geq 0 \tag{5},\\ r_i \xi_i &= 0 \tag{6}, \end{align} $$ where $$ g_i(w) = - y^{(i)} \left(w^T x^{(i}) + b\right) +1 -\xi_i. $$ Furthermore, from $$ \begin{equation} \frac{\partial \mathcal{L}}{\partial \xi_i} \overset{!}{=} 0, \end{equation} $$ we obtain the relation $$ \alpha_i = C - r_i \tag{7}. $$

Now we can distinguish the following cases:

  • $\alpha_i = 0 \implies r_i = C \implies \xi_i = 0$ (from Eq. (7) and (6)), which together with Eq. (2) gives $$ \begin{equation} y^{(i)} \left(w^T x^{(i}) + b\right) \geq 1 \tag{8} \end{equation} $$
  • $0 < \alpha_i < C \implies r_i > 0 \implies \xi_i = 0$ via Eq. (3) gives $$ \begin{equation} y^{(i)} \left(w^T x^{(i}) + b\right) = 1 \tag{9} \end{equation} $$
  • And finally $\alpha_i = C \implies r_i = 0 \implies \xi_i \geq 0$ so that, again, from Eq. (2): $$ \begin{equation} \xi_i \geq 1 - y^{(i)} \left(w^T x^{(i}) + b\right), \end{equation} $$ which we note can only be fulfilled simultaneously with Eq. (5) if $$ \begin{equation} y^{(i)} \left(w^T x^{(i}) + b\right) \leq 1, \tag{10} \end{equation} $$

Note that Eq. (8) does not contribute to the SVM, as it is classified with sufficient confidence ($\alpha_i = 0$ as for the linearly separable case). For the case $0<\alpha_i<C$, $\xi_i=0$ the points are on the margin, and for $\alpha_i = C$ the points are within the margin (where depending on the value of $\xi_i$ the points are either classified correctly or incorrectly).

mSSM
  • 181
2

Considering a general primal problem: $$\min_w f(w)$$ subject to $$g_i(w)\leq 0 \forall i, h_j(w)=0 \forall j$$ This gives us the Lagrangian: $$ \mathcal{L}(w,\alpha,\beta) = f(w) +\sum_i\alpha_i g_i(w) + \sum_j\beta_j h_j(w) $$

KKT condition says that $\alpha_i^*g_i(w^*) = 0 \forall i$

Now let us state the soft margin SVM: $$\min_w \frac 12 w^Tw + C\sum_{i=1}^m \xi_i$$ subject to $$y_i(w^Tx_i+b)\geq 1-\xi_i \forall i \in \{1,2,..,m\}$$ $$\xi_i\geq 0 \forall i \in \{1,2,..,m\}$$

It is importatnt to notice that both of these constraints are inequalities. This means both of these will appear in the KKT conditions. This solves your trouble:

I am troubled that $\xi_i$ doesn't appear in the complementary condition.

(It does).

The Lagrangian can then be written as: $$\mathcal{L}(w,b,\xi,\alpha,r) = \frac{1}{2}w^Tw + C\sum_{i=1}^m{\xi_i} - \sum_{i=1}^m \alpha_i [y_i (w^Tx_i+b)-1+\xi_i] - \sum_{i=1}^m r_i \xi_i$$

We have by setting derivative wrt $\xi_i$ to 0: $$\alpha_i+r_i=C \forall i \in \{1,2,..,m\}$$

Applying the KKT condition at optimal point: $$\alpha_i[y_i(w^Tx_i+b)-1+\xi_i] = 0 \forall i \in \{1,2,..,m\}$$ $$r_i\xi_i = 0 \forall i \in \{1,2,..,m\}$$ Case 1: $$\alpha_i=0\implies r_i=C\implies \xi_i = 0 \implies y_i(w^Tx_i+b)-1+0\geq 0 $$ $$ \implies y_i(w^Tx_i+b)\geq1$$ Case 2: $$0<\alpha_i<C\implies 0<r_i<C\implies \xi_i = 0 \implies y_i(w^Tx_i+b)-1+0=0 $$ $$ \implies y_i(w^Tx_i+b)=1$$ Case 3: $$\alpha_i=C\implies r_i=0\implies \xi_i \geq 0 \implies y_i(w^Tx_i+b)-1+\xi_i=0, \xi_i\geq 0$$ $$\implies y_i(w^Tx_i+b)\leq1$$