I'm trying to write a full SVM implementation in Python and I have a few issues computing the Lagrange multipliers.
First let me rephrase what I understand from the algorithm to make sure I'm on the right path.
If $x_1, x_2, ..., x_n$ is a dataset and $y_i \in \{-1, 1\}$ is the class label of $x_i$, then $$\forall i, y_i(w^Tx_i + b) \geq 1$$
So we just need to solve the optimization problem:
minimize $\|w\|^2$
subject to $y_i(w^Tx_i + b) \geq 1$ for all $i$.
In terms of Lagrange multipliers, this translates into finding $w$, $b$ and $\alpha=(\alpha_1, \alpha_2, \ldots, \alpha_n)$ with $\alpha_i \geq 0$ at a saddle point of the Lagrangian $$L(\alpha, w, b) = \frac12 \|w\|^2 - \sum_i \alpha_i\left(y_i(w^Tx_i + b)-1\right),$$ which is minimized over $w$ and $b$ and maximized over $\alpha$.
Now since $$\frac{\partial L}{\partial w}=0 \implies w=\sum_i \alpha_i y_i x_i$$ and $$\frac{\partial L}{\partial b}=0 \implies \sum_i \alpha_i y_i = 0,$$ substituting back into $L$ turns the problem into maximizing the dual $$Q(\alpha)=\sum_i \alpha_i - \frac12\sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$$ subject to the constraints $$\alpha_i \geq 0 \ \text{and} \ \sum_i \alpha_i y_i = 0.$$
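If I've rearranged this correctly, the same dual in matrix form (which I expect is what a generic QP solver wants) is $$\min_\alpha \ \frac12\,\alpha^T P\,\alpha - \mathbf{1}^T\alpha, \qquad P_{ij}=y_i y_j\, x_i^T x_j, \qquad \text{subject to} \ -\alpha_i \leq 0 \ \forall i \ \text{and} \ y^T\alpha = 0,$$ since maximizing $Q(\alpha)$ is the same as minimizing $-Q(\alpha)$.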
So I'm trying to solve the optimization problem using Python, and the only free package I could find is called cvxopt.
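From the cvxopt documentation, `solvers.qp(P, q, G, h, A, b)` minimizes $\frac12 x^TPx + q^Tx$ subject to $Gx \preceq h$ and $Ax = b$, so here is my attempt at wiring the dual into it (hard margin only; I'm not sure the recovery of $b$ at the end is right):

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_hard_margin(X, y):
    """Hard-margin SVM via the dual.

    Maximizing Q(alpha) is the same as minimizing
    1/2 alpha^T P alpha - 1^T alpha  with  P_ij = y_i y_j x_i^T x_j,
    which matches cvxopt's  min 1/2 x^T P x + q^T x  s.t. Gx <= h, Ax = b.
    X is (n, d); y is (n,) with entries in {-1, +1}.
    """
    n = X.shape[0]
    K = X @ X.T                                    # Gram matrix x_i^T x_j
    P = matrix((np.outer(y, y) * K).astype("double"))
    q = matrix(-np.ones(n))                        # the "- sum_i alpha_i" term
    G = matrix(-np.eye(n))                         # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype("double"))  # sum_i y_i alpha_i = 0
    b = matrix(0.0)

    sol = solvers.qp(P, q, G, h, A, b)
    alpha = np.ravel(sol["x"])

    # Support vectors have alpha_i > 0 (up to numerical tolerance).
    sv = alpha > 1e-6
    w = ((alpha * y)[:, None] * X).sum(axis=0)     # w = sum_i alpha_i y_i x_i
    b0 = np.mean(y[sv] - X[sv] @ w)                # from y_s (w^T x_s + b) = 1
    return alpha, w, b0
```

If the data aren't linearly separable the hard-margin problem has no solution; as far as I understand, the soft-margin version only adds the upper bound $\alpha_i \leq C$, i.e. extra rows in $G$ and $h$.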
I'd like some help with this (and with checking whether the attempt above is even close). I couldn't find any good worked example, and while I understand the theory, I'm having a hard time translating it into code (I would have expected the opposite, since I come from a programming background).
Note that at some point I'll want to solve it using kernels, $$Q(\alpha)=\sum_i \alpha_i - \frac12\sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i,x_j),$$ but I'm not sure what the implications are for the code.
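My guess is that the only change on the solver side is to build the $P$ matrix from $K(x_i, x_j)$ instead of $x_i^T x_j$, and that predictions then have to go through the support vectors since $w$ is no longer explicit. A sketch with an RBF kernel (assuming the hard-margin code above is correct):

```python
import numpy as np
from cvxopt import matrix, solvers

def rbf_kernel(X1, X2, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2) for every pair of rows."""
    sq = (X1**2).sum(1)[:, None] + (X2**2).sum(1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * sq)

def svm_hard_margin_kernel(X, y, kernel=rbf_kernel):
    """Same QP as before, with K(x_i, x_j) in place of x_i^T x_j."""
    n = X.shape[0]
    K = kernel(X, X)
    P = matrix((np.outer(y, y) * K).astype("double"))
    q = matrix(-np.ones(n))
    G = matrix(-np.eye(n))
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype("double"))
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])

    # No explicit w any more: keep alpha and compute the bias from support vectors.
    sv = alpha > 1e-6
    b0 = np.mean(y[sv] - (alpha * y) @ kernel(X, X[sv]))

    def predict(X_new):
        # f(x) = sign( sum_i alpha_i y_i K(x_i, x) + b )
        return np.sign((alpha * y) @ kernel(X, X_new) + b0)

    return alpha, predict
```

Is that the right way to think about it, or does something else change once a kernel is used?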
Any help would be greatly appreciated; I'm really lost on how to implement this properly in Python. If you know of a better module for solving the optimization problem, I'd like to read about it as well.