I was working through the lab on ridge regression and the LASSO in ISLR when I came across some strange behavior in the cv.glmnet function. When I followed the lab as written I got the following:
set.seed(1)
train <- sample(1:nrow(x), nrow(x)/2)
test <- (-train)
y.test <- y[test]
set.seed(1)
grid <- 10^seq(10, -2, length = 100)  # lambda grid defined earlier in the lab
cv.out <- cv.glmnet(x[train,], y[train], lambda=grid, alpha=0)
plot(cv.out)
bestlam <- cv.out$lambda.min
bestlam
[1] 231.013
For my own benefit I tried it with a different seed (8675309) and got back a different result. Every combination of seed settings produced a different answer. I assume this has to do with how the 10 folds are assigned under different seeds; however, lambda.min can vary so much between runs that I am concerned the package might not be stable. Am I missing something?
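A sketch of what seems to be going on (using toy data, not the lab's x and y): cv.glmnet randomizes the fold assignment on every call unless you pass an explicit `foldid`, so the seed affects the folds, not the fitting itself. Fixing the folds makes the result deterministic:

```r
library(glmnet)

# Toy data standing in for the lab's x and y (an assumption, not the lab's setup)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x %*% rnorm(p) + rnorm(n)

# Assign each observation to one of 10 folds once, up front
foldid <- sample(rep(1:10, length.out = n))

# Two calls with the same foldid use identical folds...
cv1 <- cv.glmnet(x, y, alpha = 0, foldid = foldid)
cv2 <- cv.glmnet(x, y, alpha = 0, foldid = foldid)

# ...so they return identical lambda.min, regardless of the RNG state
cv1$lambda.min == cv2$lambda.min
```

With `foldid` fixed, any remaining run-to-run variation in `lambda.min` has to come from the data, not the package.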
Do the `lambda.min` models for the different runs/seeds match, i.e. have similar included variables or coefs? Have you tried the `lambda.1se` option, which is the simplest model within 1 standard error of the best (`lambda.min`)? It may be more stable, esp. if the MSE is relatively flat around the "best" model. – Gavin Simpson Jan 15 '14 at 05:17

`lambda.1se` is not stable either, but the coefficients are similar. At least the same variables have the larger coefficients. I think I might close this question, as it might just be a product of the model. – Fraijo Jan 15 '14 at 16:12

`coef(cv.out, s='lambda.1se')` will vary. This is a problem. – smci Feb 24 '17 at 00:01
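To follow up on the comments, one way to check whether the instability matters in practice is to compare the `lambda.1se` coefficient vectors across seeds. A sketch, again on toy data rather than the lab's:

```r
library(glmnet)

# Toy data (an assumption, not the Hitters matrix from the lab)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x %*% rnorm(20) + rnorm(100)

# Refit the cross-validation under two different seeds and extract
# the coefficients at lambda.1se each time
coefs <- sapply(c(1, 8675309), function(s) {
  set.seed(s)
  cv <- cv.glmnet(x, y, alpha = 0)
  as.vector(coef(cv, s = "lambda.1se"))
})

# The exact values differ because the folds differ, but the relative
# magnitudes are typically very similar:
cor(coefs[, 1], coefs[, 2])
```

If that correlation is close to 1, the different seeds are selecting essentially the same model, just at slightly different penalty levels.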