I am currently following CS229, and I'm trying to be diligent about proving most of the things that are not immediately obvious. Right now I'm looking at the derivation of the SVM as the dual of the optimal margin classifier problem.
Let $x^{(i)}, y^{(i)}$, $i = 1, \ldots, m$, be the features and labels. The optimal margin classifier problem is then defined as:
$$ \min_{w,b} \;\; \frac{1}{2} \|w\|^2 \\ \text{s.t.} \;\;\; y^{(i)}(w^Tx^{(i)} + b) \ge 1\,, \quad i = 1, \ldots, m $$
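Just to have something concrete to play with, here is a tiny numerical sketch of this primal on made-up toy data (my own addition, assuming `cvxpy`; it is not part of the course material):

```python
# Sketch of the primal optimal margin classifier on a tiny separable dataset.
# Assumes cvxpy is installed; the data is made up for illustration only.
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.5]])  # features x^(i)
y = np.array([1.0, 1.0, -1.0, -1.0])                                # labels y^(i) in {-1, +1}
m, n = X.shape

w = cp.Variable(n)
b = cp.Variable()

# min (1/2) ||w||^2   s.t.   y^(i) (w^T x^(i) + b) >= 1 for all i
problem = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w)),
    [cp.multiply(y, X @ w + b) >= 1],
)
problem.solve()

print("w =", w.value, " b =", b.value)
print("functional margins:", y * (X @ w.value + b.value))  # all should be >= 1
```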
Now we construct the generalized Lagrangian (that allows for inequality constraints) as:
$$ \mathcal{L}(w,b,\alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^m \alpha_i [y^{(i)}(w^Tx^{(i)} + b) - 1] $$
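Concretely, setting the derivatives of $\mathcal{L}$ with respect to $w$ and $b$ to zero gives:
$$ \nabla_w \mathcal{L} = w - \sum_{i=1}^m \alpha_i y^{(i)} x^{(i)} = 0 \;\;\Rightarrow\;\; w = \sum_{i=1}^m \alpha_i y^{(i)} x^{(i)}\,, \qquad \frac{\partial \mathcal{L}}{\partial b} = -\sum_{i=1}^m \alpha_i y^{(i)} = 0 $$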
Substituting these back into the Lagrangian, and using the remaining KKT conditions, we then construct this dual problem:
$$ \max_\alpha \;\;\, \mathcal{L}(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m y^{(i)} y^{(j)} \alpha_i \alpha_j (x^{(i)})^T x^{(j)} \\ \text{s.t.} \;\;\; \alpha_i \ge 0\,, \quad i=1,\ldots,m \\ \sum_{i=1}^m \alpha_i y^{(i)} = 0 $$
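And here is the same toy example solved through this dual (again only my own sketch with `cvxpy`, not from the notes). I recover $w$ and $b$ from the $\alpha_i$ and print the products $\alpha_i\,[-y^{(i)}(w^Tx^{(i)} + b) + 1]$, which is exactly the condition I ask about below; if it holds at the optimum, they should all come out numerically zero:

```python
# Sketch of the dual problem on the same toy data as before.
# Assumes cvxpy; the quadratic term uses the identity
# sum_ij y_i y_j alpha_i alpha_j <x_i, x_j> = || sum_i alpha_i y_i x_i ||^2.
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = X.shape[0]

alpha = cp.Variable(m)
Z = y[:, None] * X  # row i is y^(i) x^(i)

problem = cp.Problem(
    cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(Z.T @ alpha)),
    [alpha >= 0, y @ alpha == 0],
)
problem.solve()

a = alpha.value
w = Z.T @ a                            # w = sum_i alpha_i y^(i) x^(i), from the stationarity condition
sv = a > 1e-6                          # support vectors: alpha_i > 0
b = float(np.mean(y[sv] - X[sv] @ w))  # from y^(i) (w^T x^(i) + b) = 1 at the support vectors

print("alpha:", a)
# complementary slackness: alpha_i * [-y^(i) (w^T x^(i) + b) + 1] should be ~0 for every i
print("slackness products:", a * (1 - y * (X @ w + b)))
```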
I can see how all of the KKT conditions are satisfied in the above problem, except one, namely the complementary slackness condition: $$ \alpha_i [-y^{(i)}(w^Tx^{(i)} + b) + 1] = 0\,, \quad i = 1,\ldots,m $$ In the course material, and everywhere else I looked on the internet, it is said that all five conditions are satisfied in the dual problem stated above. But I don't see why this condition could not be violated at the optimum. My guess is that violating it would imply the solution is not optimal, i.e. the optimal solution always satisfies it, but I don't see a way to prove this. Could somebody please shed some light on this and tell me what I'm missing? Thanks a lot.
EDIT: course materials: https://see.stanford.edu/materials/aimlcs229/cs229-notes3.pdf, pages 7-11