I am trying to tune the alpha and lambda parameters of an elastic net based on the glmnet package. I found several sources that propose different options for this purpose. Following this instruction, I performed an optimization based on the caret package; following this thread, I optimized the parameters manually. Both approaches give me valid results, but the chosen parameters of the two methods are very different. See the reproducible example in R below:
library("caret")
library("glmnet")
set.seed(1234)
# Some example data
N <- 1000
y <- rnorm(N, 5, 10)
x1 <- y + rnorm(N, 2, 10)
x2 <- y + rnorm(N, - 5, 20)
x3 <- y + rnorm(N, 10, 200)
x4 <- rnorm(N, 20, 50)
x5 <- rnorm(N, - 7, 200)
x6 <- rbinom(N, 1, plogis(x1)) # plogis(x) equals exp(x) / (exp(x) + 1) but cannot overflow for large x
x7 <- rbinom(N, 1, plogis(x2))
x8 <- rbinom(N, 1, plogis(x3))
x9 <- rbinom(N, 1, plogis(x4))
x10 <- rbinom(N, 1, plogis(x5))
data <- data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)
# Tune parameters with caret and glmnet
# Set up grid and cross validation method for train function
lambda_grid <- seq(0, 3, 0.1)
alpha_grid <- seq(0, 1, 0.1)
trnCtrl <- trainControl(method = "repeatedcv",
                        number = 10,
                        repeats = 5)
srchGrid <- expand.grid(alpha = alpha_grid, lambda = lambda_grid)
# Cross validation
my_train <- train(y ~ ., data,
                  method = "glmnet",
                  tuneGrid = srchGrid,
                  trControl = trnCtrl)
# Best tuning parameters
my_train$bestTune
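Side note: the cross-validated RMSE behind caret's choice can be read off directly, which makes the comparison with the manual approach below more like-for-like:
# Resampled performance at the chosen alpha/lambda pair
getTrainPerf(my_train)
subset(my_train$results,
       alpha == my_train$bestTune$alpha & lambda == my_train$bestTune$lambda)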
# Tune parameters with glmnet only
alphasOfInterest <- seq(0, 1, 0.1)
# Step 1: Do all crossvalidations for each alpha
cvs <- lapply(alphasOfInterest, function(curAlpha) {
  cv.glmnet(x = as.matrix(data[ , colnames(data) %in% "y" == FALSE]),
            y = y, alpha = curAlpha, family = "gaussian")
})
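Side note: each cv.glmnet call above draws its own random folds, so the error curves are not strictly comparable across alphas. The glmnet documentation suggests passing a pre-computed foldid vector when tuning alpha; a minimal sketch (not used in the results below):
# Fix the CV folds so every alpha is evaluated on the same splits
foldid <- sample(rep(1:10, length.out = N))
cvs_fixedfolds <- lapply(alphasOfInterest, function(curAlpha) {
  cv.glmnet(x = as.matrix(data[ , colnames(data) %in% "y" == FALSE]),
            y = y, alpha = curAlpha, family = "gaussian", foldid = foldid)
})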
# Step 2: Collect the optimum lambda for each alpha
optimumPerAlpha <- sapply(seq_along(alphasOfInterest), function(curi) {
  curcvs <- cvs[[curi]]
  curAlpha <- alphasOfInterest[curi]
  indOfMin <- match(curcvs$lambda.min, curcvs$lambda)
  c(lam = curcvs$lambda.min, alph = curAlpha, cvup = curcvs$cvup[indOfMin])
})
# Step 3: Find the overall optimum
posOfOptimum <- which.min(optimumPerAlpha["lam", ])
overall.lambda.min <- optimumPerAlpha["lam", posOfOptimum]
overall.alpha.min <- optimumPerAlpha["alph", posOfOptimum]
overall.criterionthreshold <- optimumPerAlpha["cvup", posOfOptimum]
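Side note: as written, Step 3 selects the alpha whose lambda.min is smallest, not the one with the lowest cross-validated error. In case the latter was intended, a hypothetical variant (not used in the results below) could minimize cvm at each lambda.min instead:
# Hypothetical variant: pick the (alpha, lambda.min) pair with the lowest CV error
cvmAtMin <- sapply(cvs, function(cv) cv$cvm[match(cv$lambda.min, cv$lambda)])
posByError <- which.min(cvmAtMin)
c(alph = alphasOfInterest[posByError], lam = cvs[[posByError]]$lambda.min)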
# Step 4: Now check for each alpha which lambda is the best within the threshold
corrected1se <- sapply(seq_along(alphasOfInterest), function(curi) {
  curcvs <- cvs[[curi]]
  lams <- curcvs$lambda
  lams[lams < overall.lambda.min] <- NA
  lams[curcvs$cvm > overall.criterionthreshold] <- NA
  lam1se <- max(lams, na.rm = TRUE)
  c(lam = lam1se, alph = alphasOfInterest[curi])
})
})
# Step 5: Find the largest (most regularized) of these lambdas
overall.lambda.1se <- max(corrected1se["lam", ])
pos <- match(overall.lambda.1se, corrected1se["lam", ])
overall.alpha.1se <- corrected1se["alph", pos]
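For a single fixed alpha, cv.glmnet already provides this kind of threshold-based choice via its built-in lambda.1se value; Steps 3 to 5 above mimic that rule across several alphas. For comparison:
cvs[[6]]$lambda.1se # built-in one-standard-error lambda for alpha = 0.5 (6th element of alphasOfInterest)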
# Comparison --> Parameters are very different
my_train$bestTune # Parameters according to caret
c(overall.alpha.1se, overall.lambda.1se) # Parameters according to glmnet only
It seems like I am doing something wrong, but unfortunately I cannot figure out the problem. Question: How can I tune alpha and lambda for an elastic net in R?
UPDATE: Simulation study added to compare caret with manual tuning of alpha and lambda
Following Hong Ooi's suggestion, I compared the results of both tuning methods over several runs in a small simulation study. The two methods still yield very different best parameters, and the manual tuning slightly outperforms the caret package. This result is very surprising to me, since I would have expected the caret package to produce better estimates than hand-written code. I am therefore wondering whether the manual tuning really outperforms the caret package or whether I have made a mistake somewhere. Any suggestion is very welcome!
##### Small simulation #####
alpha_caret <- numeric()
lambda_caret <- numeric()
MSE_caret <- numeric()
alpha_without_caret <- numeric()
lambda_without_caret <- numeric()
MSE_without_caret <- numeric()
R <- 20 # Simulation runs
for(r in 1:R) {
##### Tune parameters with caret and glmnet #####
# Set up grid and cross validation method for train function
lambda_grid <- seq(0, 3, 0.1)
alpha_grid <- seq(0, 1, 0.1)
trnCtrl <- trainControl(method = "repeatedcv",
                        number = 10,
                        repeats = 5)
srchGrid <- expand.grid(alpha = alpha_grid, lambda = lambda_grid)
# Cross validation
my_train <- train(y ~ ., data,
                  method = "glmnet",
                  tuneGrid = srchGrid,
                  trControl = trnCtrl)
# Best parameters
alpha_caret[r] <- as.numeric(my_train$bestTune[1]) # alpha according to caret
lambda_caret[r] <- as.numeric(my_train$bestTune[2]) # lambda according to caret
# Elastic net with best parameters
mod_elnet <- glmnet(x = as.matrix(data[ , colnames(data) %in% "y" == FALSE]),
                    y = data$y, alpha = alpha_caret[r], family = "gaussian",
                    lambda = lambda_caret[r])
# Fit an lm with the variables that were selected by the elastic net
vars_elnet <- names(mod_elnet$beta[ , 1])[as.numeric(mod_elnet$beta[ , 1]) != 0]
mod_elnet_lm <- lm(y ~ ., data[ , colnames(data) %in% c(vars_elnet, "y")])
# MSE (in-sample, computed on the training data)
MSE_caret[r] <- mean(mod_elnet_lm$residuals^2)
##### Tune parameters with glmnet only #####
alphasOfInterest <- seq(0, 1, 0.1)
# Step 1: Do all crossvalidations for each alpha
cvs <- lapply(alphasOfInterest, function(curAlpha) {
  cv.glmnet(x = as.matrix(data[ , colnames(data) %in% "y" == FALSE]),
            y = y, alpha = curAlpha, family = "gaussian")
})
# Step 2: Collect the optimum lambda for each alpha
optimumPerAlpha <- sapply(seq_along(alphasOfInterest), function(curi) {
  curcvs <- cvs[[curi]]
  curAlpha <- alphasOfInterest[curi]
  indOfMin <- match(curcvs$lambda.min, curcvs$lambda)
  c(lam = curcvs$lambda.min, alph = curAlpha, cvup = curcvs$cvup[indOfMin])
})
# Step 3: Find the overall optimum
posOfOptimum <- which.min(optimumPerAlpha["lam", ])
overall.lambda.min <- optimumPerAlpha["lam", posOfOptimum]
overall.alpha.min <- optimumPerAlpha["alph", posOfOptimum]
overall.criterionthreshold <- optimumPerAlpha["cvup", posOfOptimum]
# Step 4: Now check for each alpha which lambda is the best within the threshold
corrected1se <- sapply(seq_along(alphasOfInterest), function(curi) {
  curcvs <- cvs[[curi]]
  lams <- curcvs$lambda
  lams[lams < overall.lambda.min] <- NA
  lams[curcvs$cvm > overall.criterionthreshold] <- NA
  lam1se <- max(lams, na.rm = TRUE)
  c(lam = lam1se, alph = alphasOfInterest[curi])
})
})
# Step 5: Find the largest (most regularized) of these lambdas
overall.lambda.1se <- max(corrected1se["lam", ])
pos <- match(overall.lambda.1se, corrected1se["lam", ])
overall.alpha.1se <- corrected1se["alph", pos]
# Best parameters
alpha_without_caret[r] <- as.numeric(overall.alpha.1se) # alpha according to glmnet only
lambda_without_caret[r] <- as.numeric(overall.lambda.1se) # lambda according to glmnet only
# Elastic net with best parameters
mod_elnet_wc <- glmnet(x = as.matrix(data[ , colnames(data) %in% "y" == FALSE]),
                       y = data$y, alpha = alpha_without_caret[r], family = "gaussian",
                       lambda = lambda_without_caret[r])
# Fit an lm with the variables that were selected by the elastic net
vars_elnet_wc <- names(mod_elnet_wc$beta[ , 1])[as.numeric(mod_elnet_wc$beta[ , 1]) != 0]
mod_elnet_wc_lm <- lm(y ~ ., data[ , colnames(data) %in% c(vars_elnet_wc, "y")])
# MSE (in-sample, computed on the training data)
MSE_without_caret[r] <- mean(mod_elnet_wc_lm$residuals^2)
}
# Compare results
data.frame(alpha_caret, lambda_caret, MSE_caret, alpha_without_caret, lambda_without_caret, MSE_without_caret)
mean(MSE_caret)
mean(MSE_without_caret) # Slightly better results
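To check whether the gap between the two methods is systematic rather than random noise, a paired test over the simulation runs may help (paired, because both methods are applied to the same data in every run):
t.test(MSE_caret, MSE_without_caret, paired = TRUE)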
The results look as follows:
alpha_caret lambda_caret MSE_caret alpha_without_caret lambda_without_caret MSE_without_caret
1 0.9 0.2 40.28436 0.0 1.850340 40.14838
2 1.0 0.2 40.28436 0.4 1.228666 40.48928
3 1.0 0.2 40.28436 0.0 1.850340 40.14838
4 1.0 0.2 40.28436 0.2 1.693744 40.23916
5 1.0 0.2 40.28436 0.0 2.030746 40.14838
6 1.0 0.2 40.28436 0.2 1.858882 40.36684
7 1.0 0.2 40.28436 0.0 2.684526 40.14838
8 1.0 0.2 40.28436 0.1 2.127441 40.16517
9 0.7 0.1 40.16302 0.1 1.766239 40.16011
10 1.0 0.2 40.28436 0.1 2.127441 40.16517
11 0.7 0.1 40.16302 0.0 1.536185 40.14838
12 1.0 0.2 40.28436 0.2 2.239030 40.36684
13 1.0 0.2 40.28436 0.1 1.938445 40.16011
14 1.0 0.2 40.28436 0.1 2.127441 40.16517
15 1.0 0.2 40.28436 0.1 2.334864 40.16517
16 0.9 0.2 40.28436 0.1 2.127441 40.16517
17 0.8 0.1 40.16302 0.2 1.543276 40.22040
18 1.0 0.2 40.28436 0.1 2.562510 40.22040
19 1.0 0.2 40.28436 0.0 2.946264 40.14838
20 1.0 0.2 40.28436 0.1 1.938445 40.16011
The MSE of the hand-coded tuning is slightly better than the MSE based on the caret package. The estimated best alphas and lambdas are very different.
Question: Why do the two methods result in such different estimates of alpha and lambda?