In lasso or ridge regression, one must specify a shrinkage parameter, often denoted $\lambda$ or $\alpha$. This value is typically chosen via cross validation: fit the model with a range of candidate values on training data and pick the one that gives the best score (e.g. $R^2$) on held-out data. What is the range of values one should check? Is it $(0,1)$?
Possible duplicate of Choosing the range and grid density for regularization parameter in LASSO – Alex Oct 20 '16 at 23:05
In fact, the optimal ridge parameter can be 0 or even negative. Some discussion on stats.SE: https://stats.stackexchange.com/questions/331264/understanding-negative-ridge-regression, with a paper here: https://arxiv.org/abs/1805.10939 – Sycorax Jul 28 '20 at 21:11
2 Answers
You don't really need to bother: in most packages (like glmnet), if you do not specify $\lambda$, the software generates its own sequence, which is usually recommended. I stress this because during a LASSO run the solver computes solutions along a whole sequence of $\lambda$ values, so, counterintuitive as it may seem, supplying a single $\lambda$ value can actually slow the solver down considerably (when you provide an exact parameter, the solver resorts to solving a semidefinite program, which can be slow even for reasonably 'simple' cases).
As for the exact value of $\lambda$, you can in principle choose anything in $[0,\infty)$. Note that if your $\lambda$ is too large, the penalty dominates and every coefficient is shrunk to zero; if it is too small, you will overfit the model, so neither extreme will be the best cross-validated solution.
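For intuition about the top of that range: with standardized predictors and a centered response, the smallest $\lambda$ at which the lasso solution is entirely zero is $\lambda_{\max} = \max_j |x_j^\top y| / n$, and glmnet's default sequence is (roughly) log-spaced downward from that value. A minimal sketch on synthetic data; the grid length and lower cutoff here are illustrative, not glmnet's exact internals:

```r
set.seed(1)
n <- 100; p <- 5
x <- scale(matrix(rnorm(n * p), n, p))   # standardized predictors
y <- rnorm(n)
y <- y - mean(y)                         # centered response

# Smallest lambda at which the lasso solution is entirely zero
lambda_max <- max(abs(crossprod(x, y))) / n

# An illustrative grid: 100 log-spaced values spanning three decades below lambda_max
grid <- exp(seq(log(lambda_max), log(0.001 * lambda_max), length.out = 100))
```

Any $\lambda$ above `lambda_max` is wasted effort for the lasso, which is why a data-driven upper end beats a fixed guess like $10^{10}$.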
Hi Sid, the OP appears aware of the fact you mention in your post. It also does not appear to answer the question. :-) – cardinal Aug 15 '14 at 19:27
For those trying to figure this out:
I have found that there is a great difference between allowing glmnet to calculate $\lambda$ itself and supplying a range (grid) for it to choose from.
Here is an example predicting Apps (applications received) in the College data set from ISLR:
# Don't forget to load the packages and set the seed
library(ISLR)    # for the College data
library(glmnet)
set.seed(1)
train <- sample(1:dim(College)[1], 0.75*dim(College)[1])
Create the model matrices:
xmat.train <- model.matrix(Apps~.-1,data=College[train,])
xmat.test <- model.matrix(Apps~.-1, data= College[-train,])
y <- College$Apps[train]
Create a grid of values for the scope of lambda (optional):
grid <- 10 ^ seq(10,-2,length = 100)
Add the grid here as lambda (optional):
ridge.fit <- glmnet(xmat.train, y, alpha = 0, lambda=grid)
cv.ridge <- cv.glmnet(xmat.train, y, alpha =0, lambda=grid)
bestlam <- cv.ridge$lambda.min
cat("\nBestlam (with grid):",bestlam)
pred <- predict(ridge.fit, s = bestlam, newx= xmat.test)
cat("\nWith Grid:", mean((College$Apps[-train]-pred)^2))
Again, but without the grid (allowing glmnet to figure lambda out itself):
ridge.fit <- glmnet(xmat.train, y, alpha = 0)
cv.ridge <- cv.glmnet(xmat.train, y, alpha =0)
bestlam <- cv.ridge$lambda.min
cat("\n\nBestlam (no grid):",bestlam)
pred <- predict(ridge.fit, s = bestlam, newx= xmat.test)
cat("\nWithout Grid:", mean((College$Apps[-train]-pred)^2))
You can run this yourself and vary grid as well; I've seen examples ranging from grid <- 10 ^ seq(10,-2,length = 100) to grid <- 10^seq(3, -2, by = -.1).
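For reference, both grids quoted above are log-spaced in base 10; they differ only in span and density. A quick base-R sketch of what they actually cover:

```r
grid1 <- 10 ^ seq(10, -2, length = 100)  # 100 values from 1e10 down to 1e-2
grid2 <- 10 ^ seq(3, -2, by = -0.1)      # 51 values from 1e3 down to 1e-2

range(grid1)
range(grid2)
```

The first grid spends most of its points on enormous penalties that zero out everything, which is why a narrower span like the second often works just as well.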
My best guess is that $\lambda$ can be restricted to certain values, and it is up to us to figure out an appropriate range.
I have also found this guide quite helpful: https://drsimonj.svbtle.com/ridge-regression-with-glmnet