
I understand why simple linear or logistic regression has infinitely many solutions in this case (good answers here and here). And while LASSO will select at most $n$ features when $p > n$, Elastic Net does not have this limitation. This answer explains how regularization restricts the set of potential solutions so that building a model becomes possible. Does the same idea apply to Elastic Net? If regularization only restricts the space of possible solutions, how is the "final" solution chosen from that space?

1 Answer


One way to look at this is that (as long as $\lambda_2\neq 0$) the L2 penalty is equivalent to adding $p$ artificial observations: $$ \Vert y - X \beta \Vert^2 + \lambda_2 \Vert \beta \Vert^2 = \Vert \tilde y - \tilde X \beta \Vert^2 $$ with $$ \tilde X = \begin{bmatrix}X\\ \sqrt{\lambda_2} I_{p\times p}\end{bmatrix}, \quad \tilde y = \begin{bmatrix}y\\ 0_{p\times 1}\end{bmatrix}. $$ So, in general, $n\times p$ ridge regression is equivalent to $(n+p)\times p$ non-regularised regression, and likewise $n\times p$ elastic-net regression is equivalent to $(n+p)\times p$ Lasso regression. The key point is that the $\sqrt{\lambda_2} I$ block gives $\tilde X$ full column rank $p$ no matter how small $n$ is, so the augmented problem is no longer underdetermined: the ridge solution is the unique least-squares solution of the augmented system, and the elastic net behaves like a Lasso on a problem with more observations ($n+p$) than features ($p$), which is how a single "final" solution gets picked out.
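As a sanity check, here is a minimal numerical sketch of both equivalences using NumPy and scikit-learn (my choice of tools; the answer above gives no code). One caveat: scikit-learn's `ElasticNet` and `Lasso` objectives carry a $1/(2n)$ factor on the squared error, so the penalty weights have to be rescaled to match the unscaled objective above.

```python
import numpy as np
from sklearn.linear_model import Ridge, ElasticNet, Lasso

rng = np.random.default_rng(0)
n, p = 5, 20                      # p > n: underdetermined without regularisation
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# --- Ridge on (X, y) equals OLS on the augmented (X_tilde, y_tilde) ---
lam2 = 2.0
X_tilde = np.vstack([X, np.sqrt(lam2) * np.eye(p)])   # stack sqrt(lam2)*I under X
y_tilde = np.concatenate([y, np.zeros(p)])            # stack p zeros under y

beta_ridge = Ridge(alpha=lam2, fit_intercept=False).fit(X, y).coef_
beta_ols, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
print(np.abs(beta_ridge - beta_ols).max())            # ~0: solutions coincide

# --- Elastic net on (X, y) equals Lasso on the augmented data ---
alpha, l1_ratio = 0.1, 0.5
# sklearn minimises (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1
#                   + (alpha/2)*(1 - l1_ratio)*||b||^2,
# so the L2 weight in the unscaled objective is lam2 = n*alpha*(1 - l1_ratio)
lam2 = n * alpha * (1 - l1_ratio)
X_tilde = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
y_tilde = np.concatenate([y, np.zeros(p)])

enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                  fit_intercept=False, tol=1e-12, max_iter=100_000).fit(X, y)
# the augmented Lasso sees n+p samples, so its alpha must be rescaled too
alpha_tilde = n * alpha * l1_ratio / (n + p)
lasso = Lasso(alpha=alpha_tilde, fit_intercept=False,
              tol=1e-12, max_iter=100_000).fit(X_tilde, y_tilde)
print(np.abs(enet.coef_ - lasso.coef_).max())         # ~0 up to solver tolerance
```

Both printed differences should be essentially zero: once the $\sqrt{\lambda_2} I$ rows are stacked in, the design has full column rank and the ordinary least-squares (or Lasso) solver recovers the single minimiser.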

Luca Citi