So, I understand why simple linear or logistic regression has infinitely many solutions when there are more predictors than observations, i.e. $p > n$ (good answers here and here). But while LASSO will select at most $n$ features in that setting, elastic net does not have this limitation. This answer explains how regularization restricts the set of candidate solutions so that fitting a model becomes possible. Does the same idea apply to elastic net? And if regularization only restricts the space of possible solutions, how is the "final" solution chosen from that space?
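To make the premise concrete, here is a minimal numpy sketch (added for illustration; the data are random and not from the linked answers) showing that ordinary least squares with $p > n$ has infinitely many exact-fit solutions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 20                      # fewer observations than predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Minimum-norm exact-fit solution
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adding any null-space direction of X gives another exact-fit solution
_, _, Vt = np.linalg.svd(X)
beta2 = beta1 + Vt[-1]            # last right singular vector lies in the null space of X

print(np.allclose(X @ beta1, y), np.allclose(X @ beta2, y))  # True True: both fit exactly
print(np.allclose(beta1, beta2))                             # False: different coefficients
```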
- Have a look at https://web.stanford.edu/~hastie/TALKS/enet_talk.pdf – Gabriel Romon May 21 '19 at 19:33
- @GabrielRomon I looked through this but perhaps the section I need went over my head... where in here do they address the problem of selecting more predictors than observations? – Aidan Winters Jun 11 '19 at 15:36
- A lot is said on page 9. For the details you definitely want to check the original paper. If something in the paper is unclear, don't hesitate to ask. – Gabriel Romon Jun 11 '19 at 16:17
- With penalization the effective p is much lower than the apparent p. – Frank Harrell Sep 30 '23 at 15:44
- Near duplicate: https://stats.stackexchange.com/questions/274225/why-regularization-shrinkage-method-works-for-pn#comment1170729_274225. – whuber Sep 30 '23 at 16:44
1 Answer
One way to look at this is that (as long as $\lambda_2 \neq 0$) the L2 penalty is equivalent to adding $p$ artificial observations:
$$ \Vert y - X \beta \Vert^2 + \lambda_2 \Vert \beta \Vert^2 = \Vert \tilde y - \tilde X \beta \Vert^2 $$
with
$$ \tilde X = \begin{bmatrix}X\\ \sqrt{\lambda_2}\, I_{p\times p}\end{bmatrix}, \quad \tilde y = \begin{bmatrix}y\\ 0_{p\times 1}\end{bmatrix}. $$
So an $n\times p$ ridge regression is equivalent to an $(n+p)\times p$ non-regularised regression, and, applying the same augmentation to the L2 part of the elastic-net penalty, an $n\times p$ elastic-net regression is equivalent to an $(n+p)\times p$ Lasso regression. Because the $\sqrt{\lambda_2}\, I$ block gives $\tilde X$ full column rank, the augmented problem is no longer underdetermined: the equivalent Lasso effectively has more observations than predictors, so it is not limited to selecting at most $n$ variables, which is exactly why the elastic net can keep more than $n$ predictors.
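A quick numerical sanity check of this identity (a sketch with made-up data and an arbitrary $\lambda_2$, using numpy): the ridge solution of the original $n \times p$ problem coincides with ordinary least squares on the augmented $(n+p) \times p$ design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 50                      # more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam2 = 0.7                         # arbitrary L2 penalty for illustration

# Ridge solution of the original n x p problem (closed form)
beta_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(p), X.T @ y)

# Ordinary least squares on the augmented (n + p) x p problem
X_tilde = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
y_tilde = np.concatenate([y, np.zeros(p)])
beta_aug, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)

print(np.allclose(beta_ridge, beta_aug))   # True: the two solutions coincide
```

The same kind of check could be repeated with an L1 term added to both objectives, which gives the elastic-net-to-Lasso equivalence stated above.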
Luca Citi