Looking for an intuitive explanation, thanks.
Possible duplicate of Why is ridge regression called "ridge", why is it needed, and what happens when $\lambda$ goes to infinity? or https://stats.stackexchange.com/questions/118712/why-does-ridge-estimate-become-better-than-ols-by-adding-a-constant-to-the-diago/119708#119708 – Sycorax Mar 01 '19 at 22:05
1 Answer
Take the case of two perfectly correlated independent variables, x1 and x2. Only one combination of the corresponding coefficients is pinned down by the data (w1 + w2 when x2 = x1), so w1 and w2 can individually go to ±∞ (by adjusting the other appropriately), and we have an infinite number of solutions.
Adding L2 regularisation means that, of all these solutions (which have the same mean squared error), there is a best one: namely the one with the smallest L2 norm. Assuming we normalise the variables, this will have w1 = w2 (for a fixed sum w1 + w2, w1² + w2² is smallest when the two are equal), i.e. we effectively take the average of x1 and x2.
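A minimal NumPy sketch of this degeneracy, assuming synthetic data in which x2 is an exact copy of x1 and the true combined effect is 3 (the data, the value 3, and the helper name `mse` are all hypothetical choices for illustration). The design matrix has rank 1, and coefficient pairs with the same sum give the same mean squared error up to floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1.copy()  # perfectly correlated: x2 is an exact copy of x1
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)  # hypothetical target

X = np.column_stack([x1, x2])

# Rank 1 means X^T X is singular, so ordinary least squares has no unique solution
print("rank of X:", np.linalg.matrix_rank(X))

def mse(w):
    """Mean squared error of the linear predictor X @ w."""
    return np.mean((y - X @ w) ** 2)

# Only w1 + w2 matters here, so wildly different pairs fit equally well
candidates = [np.array([3.0, 0.0]), np.array([1.5, 1.5]), np.array([1003.0, -1000.0])]
for w in candidates:
    print(w, mse(w))
```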
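A sketch of the closed-form ridge estimate (X^T X + λI)^{-1} X^T y on hypothetical synthetic data where x1 is standardised and x2 is an exact copy of it. Because the two columns are identical, the penalised system is symmetric in w1 and w2, so the solution splits the weight equally; larger λ also shrinks the common value. The function name `ridge`, the target coefficient 3, and the λ values are illustrative assumptions, not taken from the answer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x1 = (x1 - x1.mean()) / x1.std()  # normalise to zero mean, unit variance
x2 = x1.copy()  # perfectly correlated duplicate
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)  # hypothetical target
X = np.column_stack([x1, x2])

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) w = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in [0.01, 1.0, 10.0]:
    w = ridge(X, y, lam)
    # w[0] and w[1] agree: the penalty picks the equal split of w1 + w2
    print(f"lam={lam}: w1={w[0]:.4f}, w2={w[1]:.4f}, w1+w2={w.sum():.4f}")
```

With w1 = w2, the prediction w1*x1 + w2*x2 equals (w1 + w2) times the average of x1 and x2, which is the averaging described above.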
seanv507