4

I ran Ridge and Lasso regressions using an algorithm to automatically find the optimum lambda.

However, the algorithm couldn't find an optimum lambda between 0 and 1. In some cases I could find optimum lambdas that were a lot higher than 1 (sometimes 4 or 5 or even higher).

What exactly does that mean? I have always read that the optimum lambda is usually just a little higher than 0 and definitely not higher than 1.

Does it mean that ridge and lasso aren't applicable in that case?

Thx a lot in advance, Tobias

Stephan Kolassa
Toby_Shoby
  • 5
    When you write $\lambda$, you are referring to the tuning parameter, right? Can you link to where you read "the optimum lambda is mostly just a little higher than 0 and definitely not higher than one"? I have never run into such a claim. A higher optimum lambda indicates larger sparsity, and there is no reason to expect $\lambda < 1$. – Greenparker May 11 '16 at 15:49
  • Thx, that helps a lot. – Toby_Shoby May 11 '16 at 16:18

1 Answer

12

Let's work with the lasso. Recall how a lasso regression model is fitted, given $\lambda$:

$$\min_{\beta\in\mathbb{R}^p}\left\{\frac{1}{N}\|y-X\beta\|_2^2+\lambda\|\beta\|_1\right\}$$

The first summand is the mean squared residual, i.e., the squared 2-norm of the residuals divided by $N$. The second summand is $\lambda$ times the 1-norm of the parameter vector (typically not including the intercept entry $\beta_0$).
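To make the two summands concrete, here is a minimal sketch that evaluates the lasso objective for a given $\beta$ and $\lambda$. The function name `lasso_objective` and the toy numbers are purely illustrative, and the sketch assumes no intercept.

```python
def lasso_objective(X, y, beta, lam):
    """Penalized lasso objective: mean squared residual + lam * ||beta||_1.

    X is a list of rows, y a list of responses, beta a list of coefficients.
    No intercept is included (an assumption of this sketch).
    """
    n = len(y)
    # Residuals y_i - x_i . beta
    residuals = [
        y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))
        for i in range(n)
    ]
    mean_squared_residual = sum(r * r for r in residuals) / n
    l1_norm = sum(abs(b) for b in beta)
    return mean_squared_residual + lam * l1_norm


# Toy data, made up for illustration only.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
value = lasso_objective(X, y, beta=[1.0, 1.0], lam=3.0)
print(value)
```

The first term depends on the units of $y$, the second on the units of the coefficients, which is why $\lambda$ has no natural scale of its own.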

There is no reason whatsoever these two components should be comparable in magnitude. Your model could fit very well, yielding small residuals, but need large parameters, or the other way around. Plus, you may or may not standardize your predictors first, which changes the scale of the parameter estimates.
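One way to see the scale dependence: for the objective $\frac{1}{N}\|y-X\beta\|_2^2+\lambda\|\beta\|_1$ without an intercept, the subgradient condition at $\beta=0$ implies that all coefficients are zero exactly when $\lambda \geq \frac{2}{N}\max_j |x_j^\top y|$. So the whole "interesting" range of $\lambda$ stretches or shrinks with the units of your data. A small sketch, with a hypothetical helper `lambda_max` and made-up numbers:

```python
def lambda_max(X, y):
    """Smallest lambda at which every lasso coefficient is zero, for the
    objective (1/N) * ||y - X b||_2^2 + lambda * ||b||_1 with no intercept."""
    n = len(y)
    p = len(X[0])
    return max(
        abs(2.0 / n * sum(X[i][j] * y[i] for i in range(n)))
        for j in range(p)
    )


# Toy data, made up for illustration only.
X = [[1.0, 2.0], [2.0, 0.5], [3.0, -1.0], [4.0, 1.5]]
y = [1.1, 1.9, 3.2, 3.9]

# The same response expressed in units that are 1000 times smaller.
y_big = [1000.0 * v for v in y]

lam_small = lambda_max(X, y)
lam_big = lambda_max(X, y_big)
print(lam_small, lam_big)
```

With these numbers, the upper end of the useful range is already about 15 for the original data, and it grows by a factor of 1000 after the change of units. Note that other software may scale the loss differently (for example with $\frac{1}{2N}$ instead of $\frac{1}{N}$), which shifts $\lambda$ by a constant factor as well.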

This applies to the estimate for $\beta$, given $\lambda$. Now, if you optimize $\lambda$, perhaps using cross-validation, this means that a priori you cannot say anything about the likely range of $\lambda$, other than $\lambda\geq 0$.
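As a rough illustration of the same point for a cross-validated $\lambda$, here is a toy sketch with a single predictor and no intercept, where the lasso solution has a closed form via soft-thresholding. All function names, the fold assignment, the grid, and the data are assumptions made up for this example, not anyone's actual tuning algorithm.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_1d(x, y, lam):
    """Minimizer of (1/N) * sum((y - b*x)^2) + lam * |b| for one predictor."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) / n
    sxx = sum(a * a for a in x) / n
    return soft_threshold(sxy, lam / 2.0) / sxx


def cv_error(x, y, lam, k=4):
    """Mean squared prediction error from k-fold CV (fold = index mod k)."""
    n = len(x)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        beta = fit_lasso_1d([x[i] for i in train], [y[i] for i in train], lam)
        total += sum((y[i] - beta * x[i]) ** 2 for i in held_out)
    return total / n


def best_lambda(x, y, grid):
    """Grid value with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(x, y, lam))


# Toy data, made up for illustration only.
x = [0.5, -1.0, 1.5, -0.5, 2.0, -1.5, 1.0, -2.0]
y = [0.9, -0.2, 0.4, -0.8, 1.1, -0.3, -0.6, -0.7]
grid = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]

# Express the response in units 8 times smaller and scale the grid with it.
y_scaled = [8.0 * v for v in y]
grid_scaled = [8.0 * g for g in grid]

best = best_lambda(x, y, grid)
best_scaled = best_lambda(x, y_scaled, grid_scaled)
print(best, best_scaled)
```

In this setup, multiplying the response by 8 multiplies the selected $\lambda$ by 8, although the model is the same in every meaningful sense. So whether the selected value lands below or above 1 is largely a matter of units.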

TL;DR: you appear to have misremembered. The optimum $\lambda$ in no way needs to be in some specific interval. Therefore, getting a "surprising" value does not tell you anything about the appropriateness (or not) of your lasso model.

The same of course applies to ridge regression or the elastic net.
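For ridge, the dependence on scale is easy to check by hand in the one-predictor case without an intercept: minimizing $\frac{1}{N}\|y-x\beta\|_2^2+\lambda\beta^2$ gives $\hat\beta = x^\top y/(x^\top x + N\lambda)$. Rescaling the predictor by a factor $c$ then requires $c^2\lambda$ to reproduce the same fitted values. A minimal sketch, with a hypothetical helper `ridge_1d` and made-up numbers:

```python
def ridge_1d(x, y, lam):
    """Minimizer of (1/N) * sum((y - b*x)^2) + lam * b^2 for one predictor."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + n * lam)


# Toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
lam = 0.5

# The same predictor in units that are 10 times smaller (values 10x larger).
c = 10.0
x_scaled = [c * v for v in x]

fit_original = [ridge_1d(x, y, lam) * v for v in x]
# The penalty has to be c^2 times larger to give the same fitted values.
fit_rescaled = [ridge_1d(x_scaled, y, c * c * lam) * v for v in x_scaled]
print(fit_original)
print(fit_rescaled)
```

Here $\lambda = 0.5$ on the original scale corresponds to $\lambda = 50$ after rescaling, for an identical fit.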

Stephan Kolassa