Consider the following ridge regression problem: minimize the loss function $$\sum_{i=1}^n ||y_i - w^T x_i||_2^2 + \lambda ||w||_2^2$$ with respect to the weight vector $w$. Taking the derivative with respect to $w$, I get $$\sum_{i=1}^n 2(y_i - w^T x_i)(-x_i) + 2\lambda w,$$ and setting it to zero implies $$w = \frac{1}{\lambda}\sum_{i=1}^n (y_i - w^T x_i)x_i.$$ Is this wrong? I know that the solution is $$(X^TX + \lambda I)^{-1}X^Ty.$$

user10024395

3 Answers

Your derivative is okay. Set it to zero and put all the $w$-terms on the same side of the equation $$\eqalign{ \sum_i x_i y_i &= \lambda w + \sum_i x_i x_i^Tw \cr }$$ Then pull $w$ out of the summation, since it's independent of $i$ $$\eqalign{ \sum_i y_i x_i &= \Big(\lambda I + \sum_i x_ix_i^T\Big)w \cr }$$ At this point, dispose of the summations in favor of matrix notation $$\eqalign{ X^Ty &= \big(\lambda I + X^TX\big)w \cr }$$ where $x_i^T$ is the $i^{th}$ row of $X,\,$ and $\,y_i$ is the $i^{th}$ component of $y$. Solving for $w$ gives $$\eqalign{ w &= \big(\lambda I + X^TX\big)^{-1}X^Ty \cr }$$ where the inverse exists for any $\lambda > 0$, since $\lambda I + X^TX$ is then positive definite.
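
As a quick numerical sanity check of the steps above, here is a minimal NumPy sketch. It assumes the samples are stacked as the rows of $X$ (so that $\sum_i x_i x_i^T = X^TX$); the sizes `n`, `d`, the penalty `lam`, and the random data are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 0.5           # hypothetical sizes and penalty
X = rng.normal(size=(n, d))      # assumption: row i of X is x_i^T
y = rng.normal(size=n)

# Solve the normal equations (X^T X + lam * I) w = X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Gradient from the question, written as an explicit sum over samples
grad = sum(2 * (y[i] - w @ X[i]) * (-X[i]) for i in range(n)) + 2 * lam * w

# The summation form and the matrix form of the left-hand side agree
sum_outer = sum(np.outer(X[i], X[i]) for i in range(n))

print(np.allclose(grad, 0.0, atol=1e-8))
print(np.allclose(sum_outer, X.T @ X))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice; the result is the same $w$ up to rounding.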

Tom

Your solution has $w$ on both sides of the equation, and furthermore, $w$ is inside the summation, which is a problem.

I recommend taking a look at section 2 of Andrew Ng's CS 229 course notes. In short, using matrix math allows you to use properties of the gradient of a trace, which allows for a straightforward derivation. Note that his course notes are for standard least squares, but getting ridge regression is basically the same.
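
For what it's worth, here is a rough sketch of how the matrix-form derivation might go for ridge regression. This is not quoted from the course notes (it uses the gradient directly rather than trace identities), and it assumes $X$ has $x_i^T$ as its $i^{th}$ row. Write the loss as $$L(w) = (y - Xw)^T(y - Xw) + \lambda w^Tw = y^Ty - 2w^TX^Ty + w^TX^TXw + \lambda w^Tw.$$ Taking the gradient with respect to $w$ gives $$\nabla_w L = -2X^Ty + 2X^TXw + 2\lambda w,$$ and setting it to zero yields $$(X^TX + \lambda I)\,w = X^Ty.$$ The only difference from standard least squares is the extra $\lambda I$ term contributed by the penalty.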

Mageek
  • 1. Please don't have your answer rely on a link which may disappear at any time. Describe - or even quote - the essential parts (giving proper credit and including a complete reference if possible), so that when that link no longer works your answer doesn't become invalidated. 2. While I believe I understand exactly the sense in which your last sentence is intended, I doubt that the OP or many of our readers would see the immediate connection. Perhaps you could give the details there (or perhaps link to some of the other answers on site which would be sufficient to give the right sense). – Glen_b Jan 21 '16 at 06:00
  • Thanks for the helpful comment - much better than just downvoting me. At this point others have contributed good answers - I'll do it differently next time. – Mageek Jan 21 '16 at 17:57