So, I understand why simple linear or logistic regression has infinitely many solutions when there are more predictors than observations, i.e. $p > n$ (good answers here and here). But while LASSO will select at most $n$ features in that setting, elastic net does not have this limitation. This answer explains how regularization restricts the set of candidate solutions so that fitting a model becomes possible. Does the same idea apply to elastic net? And if regularization only restricts the space of possible solutions, how is the "final" solution chosen from that space?
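To make the premise concrete, here is a minimal numpy sketch (added for illustration; the data are random and not from the linked answers) showing that ordinary least squares with $p > n$ has infinitely many exact-fit solutions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 20                      # fewer observations than predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Minimum-norm exact-fit solution
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adding any null-space direction of X gives another exact-fit solution
_, _, Vt = np.linalg.svd(X)
beta2 = beta1 + Vt[-1]            # last right singular vector lies in the null space of X

print(np.allclose(X @ beta1, y), np.allclose(X @ beta2, y))  # True True: both fit exactly
print(np.allclose(beta1, beta2))                             # False: different coefficients
```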
- Have a look at https://web.stanford.edu/~hastie/TALKS/enet_talk.pdf – Gabriel Romon May 21 '19 at 19:33
- @GabrielRomon I looked through this but perhaps the section I need went over my head... where in here do they address the problem of selecting more predictors than observations? – Aidan Winters Jun 11 '19 at 15:36
- A lot is said on page 9. For the details you definitely want to check the original paper. If something in the paper is unclear, don't hesitate to ask. – Gabriel Romon Jun 11 '19 at 16:17
- With penalization the effective p is much lower than the apparent p. – Frank Harrell Sep 30 '23 at 15:44
- Near duplicate: https://stats.stackexchange.com/questions/274225/why-regularization-shrinkage-method-works-for-pn#comment1170729_274225. – whuber Sep 30 '23 at 16:44
1 Answer
One way to look at this is that (as long as $\lambda_2 \neq 0$) the L2 penalty is equivalent to adding $p$ artificial observations:
$$ \Vert y - X \beta \Vert^2 + \lambda_2 \Vert \beta \Vert^2 = \Vert \tilde y - \tilde X \beta \Vert^2 $$
with
$$ \tilde X = \begin{bmatrix}X\\ \sqrt{\lambda_2}\, I_{p\times p}\end{bmatrix}, \quad \tilde y = \begin{bmatrix}y\\ 0_{p\times 1}\end{bmatrix}. $$
So an $n\times p$ ridge regression is equivalent to an $(n+p)\times p$ non-regularised regression, and, applying the same augmentation to the L2 part of the elastic-net penalty, an $n\times p$ elastic-net regression is equivalent to an $(n+p)\times p$ Lasso regression. Because the $\sqrt{\lambda_2}\, I$ block gives $\tilde X$ full column rank, the augmented problem is no longer underdetermined: the equivalent Lasso effectively has more observations than predictors, so it is not limited to selecting at most $n$ variables, which is exactly why the elastic net can keep more than $n$ predictors.
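A quick numerical sanity check of this identity (a sketch with made-up data and an arbitrary $\lambda_2$, using numpy): the ridge solution of the original $n \times p$ problem coincides with ordinary least squares on the augmented $(n+p) \times p$ design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 50                      # more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam2 = 0.7                         # arbitrary L2 penalty for illustration

# Ridge solution of the original n x p problem (closed form)
beta_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(p), X.T @ y)

# Ordinary least squares on the augmented (n + p) x p problem
X_tilde = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
y_tilde = np.concatenate([y, np.zeros(p)])
beta_aug, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)

print(np.allclose(beta_ridge, beta_aug))   # True: the two solutions coincide
```

The same kind of check could be repeated with an L1 term added to both objectives, which gives the elastic-net-to-Lasso equivalence stated above.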
Luca Citi