One of the motivations for the elastic net was the following limitation of LASSO:
> In the $p > n$ case, the lasso selects at most $n$ variables before it saturates, because of the nature of the convex optimization problem. This seems to be a limiting feature for a variable selection method. Moreover, the lasso is not well defined unless the bound on the $L_1$-norm of the coefficients is smaller than a certain value.

(Zou & Hastie, 2005: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2005.00503.x/full)
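The saturation is easy to see empirically. Here is a minimal sketch, assuming scikit-learn is available, on simulated data (the dimensions $n = 30$, $p = 100$ and the sparsity pattern are illustrative choices, not from the paper): `lars_path` traces the lasso path, whose active set never exceeds $n$, while `enet_path` mixes in an $L_2$ penalty that removes the cap.

```python
import numpy as np
from sklearn.linear_model import lars_path, enet_path

rng = np.random.default_rng(0)
n, p = 30, 100                        # p > n regime
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:60] = 1.0                       # 60 relevant predictors -- more than n
y = X @ beta + 0.1 * rng.standard_normal(n)

# LARS traces the exact lasso path; its active set is capped at min(n, p) = n.
_, _, lars_coefs = lars_path(X, y, method="lasso")

# The elastic net path (l1_ratio < 1 adds an L2 penalty) has no such cap.
_, enet_coefs, _ = enet_path(X, y, l1_ratio=0.3, n_alphas=100)

print("largest active set on lasso path:      ", (lars_coefs != 0).sum(axis=0).max())  # <= 30
print("largest active set on elastic net path:", (enet_coefs != 0).sum(axis=0).max())  # typically > 30
```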
I understand that the LASSO is a quadratic programming problem, but it can also be solved via LARS or coordinate-wise (element-wise) descent. What I do not see is where these algorithms run into a problem when $p > n$, where $p$ is the number of predictors and $n$ is the sample size. And why is this problem solved by the elastic net, which augments the problem to one with $n + p$ observations, a number that clearly exceeds $p$?
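For context, the mechanism is Lemma 1 of the Zou & Hastie paper linked above (sketched here in that paper's notation). The naive elastic net with penalties $(\lambda_1, \lambda_2)$ is a lasso on artificially augmented data

$$X^{*} = \frac{1}{\sqrt{1+\lambda_2}} \begin{pmatrix} X \\ \sqrt{\lambda_2}\, I_p \end{pmatrix} \in \mathbb{R}^{(n+p)\times p}, \qquad y^{*} = \begin{pmatrix} y \\ 0 \end{pmatrix} \in \mathbb{R}^{n+p},$$

with $\gamma = \lambda_1 / \sqrt{1+\lambda_2}$ and $\beta^{*} = \sqrt{1+\lambda_2}\,\beta$: the naive elastic net solution is $\hat\beta = \hat\beta^{*}/\sqrt{1+\lambda_2}$, where $\hat\beta^{*}$ minimizes $\lVert y^{*} - X^{*}\beta^{*}\rVert^2 + \gamma \lVert\beta^{*}\rVert_1$. The augmentation adds $p$ extra *rows*, not variables: the lasso on $(y^{*}, X^{*})$ still has $p$ coefficients but an effective sample size of $n + p$, and $X^{*}$ has full column rank because of the $\sqrt{\lambda_2}\, I_p$ block. The "at most $\min(n, p)$ variables" cap therefore becomes $\min(n+p,\, p) = p$, so the elastic net can select all $p$ predictors.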
Presentation: http://www.cs.cmu.edu/afs/cs/project/link-3/lafferty/www/ml-stat2/talks/YondaiKimGLasso-SLIDE-YD.pdf
Paper (Section 4): http://datamining.dongguk.ac.kr/papers/GLASSO_JRSSB_V1.final.pdf