Why is L2 regression good for handling multicollinearity?

Question

Looking for an intuitive explanation, thanks.

Possible duplicate of Why is ridge regression called "ridge", why is it needed, and what happens when $\lambda$ goes to infinity? or https://stats.stackexchange.com/questions/118712/why-does-ridge-estimate-become-better-than-ols-by-adding-a-constant-to-the-diago/119708#119708 — Sycorax, Mar 01 '19 at 22:05

score 0 · Answer 1 · answered Mar 01 '19 at 22:21

Take the case of two perfectly correlated independent variables, x1 and x2. The corresponding coefficients w1 and w2 can then go to +/- infinity (one can grow arbitrarily large as long as the other is adjusted to compensate), so there are infinitely many solutions.
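A minimal NumPy sketch of this, assuming synthetic data where x2 is an exact copy of x1 (the data, the seed, and the candidate weights are made up for illustration). Any pair (w1, w2) with the same sum gives the same predictions, and hence the same mean squared error:

```python
import numpy as np

# Hypothetical data: x2 is an exact copy of x1, so the two columns are perfectly correlated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1.copy()
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

def mse(w):
    """Mean squared error of the linear model X @ w against y."""
    return float(np.mean((X @ w - y) ** 2))

# Three weight vectors with the same sum w1 + w2 = 3 but very different sizes.
candidates = [
    np.array([3.0, 0.0]),
    np.array([1.5, 1.5]),
    np.array([1003.0, -1000.0]),
]
errors = [mse(w) for w in candidates]
norms = [float(np.linalg.norm(w)) for w in candidates]

for w, e, n in zip(candidates, errors, norms):
    print(w, "mse:", e, "L2 norm:", n)
```

The errors agree up to floating-point rounding, while the L2 norms differ by orders of magnitude.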

Adding L2 regularisation means that, of all these solutions (which have the same mean squared error), there is a single best one: the one with the smallest L2 norm. Assuming we normalise the variables, this solution has w1 = w2, i.e. we take the average of x1 and x2.
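As a rough check of this claim, here is a sketch using the closed-form ridge solution (X^T X + lam I)^{-1} X^T y on synthetic data with two identical columns. The data and the penalty strength `lam` are arbitrary choices for illustration, not values from the question:

```python
import numpy as np

# Hypothetical data: two identical (perfectly correlated) columns.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

# Without a penalty, X.T @ X is singular, so the normal equations have no unique solution.
rank = np.linalg.matrix_rank(X.T @ X)

# Ridge closed form: the lam * I term makes the matrix invertible.
lam = 1e-3  # assumed small penalty strength
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Minimum-L2-norm least-squares solution, via the pseudoinverse, for comparison.
w_min_norm = np.linalg.pinv(X) @ y

print("rank of X^T X:", rank)
print("ridge weights:", w_ridge)
print("min-norm weights:", w_min_norm)
```

In this setup the two ridge weights come out equal, and for a small `lam` they sit close to the minimum-norm least-squares solution. A larger `lam` shrinks both weights further toward zero but keeps them equal.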
