Why does the lasso select at most n predictors?

Question

In the seminal paper on elastic net regularization by Zou and Hastie (2005), I read:

For this kind of $p \gg n$ and grouped variables situation, the lasso is not the ideal method, because it can only select at most $n$ variables out of $p$ candidates (Efron et al., 2004).

However, I cannot find the proof or demonstration in Efron et al. (2004). Any hints?

score 4 · Accepted Answer · answered Jan 08 '19 at 10:59

Consider a linear model $Y = X\beta + \varepsilon$ with $p$ variables and $n$ observations, $p > n$. Assume the $n \times p$ matrix $X$ has full rank $n$, so that some $n$ of its columns are linearly independent. Then $Y$ can be predicted perfectly ($\hat{Y} = Y$) using only those $n$ variables. Among all such perfect fits, the lasso will ideally choose the $n$ variables for which $\lambda \|\beta\|_1$ is minimal. This solution should be unique (because of linear independence at the level of the individual variables), and therefore all perfect fits of the linear model that include more than $n$ variables will have a higher $\lambda \|\beta\|_1$ and are not optimal.
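
Here is a sketch of one way to make this precise. It is not taken from the answer above: it assumes the lasso is written as minimizing $\tfrac12\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1$ with $\lambda > 0$, and the symbols $\hat\beta$, $r$, $s$, $A$, $v$ and $t$ are introduced here for illustration only. The optimality (KKT) conditions for a solution $\hat\beta$ with residual $r = Y - X\hat\beta$ read

$$
X_j^\top r = \lambda s_j, \qquad
s_j = \operatorname{sign}(\hat\beta_j) \ \text{if } \hat\beta_j \neq 0, \qquad
s_j \in [-1, 1] \ \text{if } \hat\beta_j = 0 .
$$

Let $A = \{j : \hat\beta_j \neq 0\}$ and suppose $|A| > n$. The columns of $X_A$ live in $\mathbb{R}^n$, so they are linearly dependent and there is a $v \neq 0$ with $X_A v = 0$. Then

$$
\lambda\, s_A^\top v = r^\top X_A v = 0
\quad\Longrightarrow\quad
s_A^\top v = 0,
$$

$$
X_A(\hat\beta_A + t v) = X_A \hat\beta_A,
\qquad
\|\hat\beta_A + t v\|_1 = s_A^\top(\hat\beta_A + t v) = \|\hat\beta_A\|_1
$$

for every $t$ small enough that no coefficient changes sign. Moving along $v$ therefore leaves both the fit and the penalty unchanged, and increasing $|t|$ until the first coefficient reaches zero gives another lasso solution with fewer nonzero coefficients. Repeating this until the active columns are linearly independent leaves at most $\operatorname{rank}(X) \le n$ of them. So there is always a lasso solution with at most $n$ selected variables, and whenever the solution is unique, it is that one.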
