The lasso coefficients are the ones that minimize $RSS+\lambda \sum_{j=1}^{p} |\beta_j|$, whereas the ridge regression coefficients are those that minimize $RSS+\lambda \sum_{j=1}^{p} \beta_j^2$. I don't quite see from these mathematical expressions alone why the lasso will shrink some coefficients to exactly zero, while ridge regression will shrink all the coefficients towards zero but will not set any of them equal to zero. Can someone explain or provide some intuition for this, please? Thank you.
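For what it's worth, the behavior in question is easy to observe empirically. Here is a minimal sketch (it assumes scikit-learn's `Lasso` and `Ridge`, which are not mentioned in the question) that fits both models on the same simulated data and counts how many coefficients are exactly zero:

```python
# Sketch: compare lasso vs. ridge coefficient estimates at a fixed penalty strength.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Simulated data where only 5 of 20 predictors are truly informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("lasso coefficients set exactly to zero:", np.sum(lasso.coef_ == 0))
print("ridge coefficients set exactly to zero:", np.sum(ridge.coef_ == 0))
```

With settings like these, the lasso typically zeroes out several coefficients while the ridge fit shrinks all of them but leaves none exactly zero, which is the contrast the question is asking about.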
- Have you tried looking this up in a textbook? It is not unusual to explain this in a machine learning textbook that covers lasso and ridge. ISL or ESL might have a discussion of this. – Richard Hardy Jun 23 '23 at 14:14
- @RichardHardy: I did read the explanation in ISL; it just wasn't that satisfying. – ColorStatistics Jun 23 '23 at 14:24
- @Firebug: thank you for the link; it is exactly my question; I will digest it now. – ColorStatistics Jun 23 '23 at 14:24
- Check https://stats.stackexchange.com/q/587492/35989 – Tim Jun 23 '23 at 14:25