How to calculate p-values (and conduct multiple test correction) for LASSO

Question

I have a large dataset (1,465 observations, >50,000 predictors). I ran a LASSO regression with glmnet for variable selection, then fit a simple logistic regression model to test associations for the 15 predictors that retained non-zero coefficients after LASSO.

After fitting the simple logistic regression model, I corrected the resulting p-values for multiple testing using the Benjamini-Hochberg procedure. However, I've been told this is improper because it is "double-dipping," so to speak: the same data were used both to select the predictors and to test them. The p-values are supposed to be derived directly from the LASSO regression (and can then be corrected). However, I don't know how to get those p-values.
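To make the correction step concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment. It is written in plain Python purely for illustration; in R, p.adjust(p, method = "BH") performs the same adjustment. The function name benjamini_hochberg and the p-values in example_p are hypothetical placeholders, not values from my data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each by
    # m / rank and taking a running minimum so the adjusted values stay
    # monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(example_p)
print(adjusted)
```

Note that this only controls the false discovery rate if the input p-values are valid in the first place, which is exactly what is in doubt when they come from a model refit on LASSO-selected predictors.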

I've tried the islasso package, but I run into a stack overflow error. The error persists even after launching RStudio from the command line with a larger max-ppsize, increasing "R_MAX_VSIZE", and setting options(expressions = 5e5).

How can I calculate these p-values and correct them? I have the proper lambda value, which is 0.0412638 (obtained through elastic net cross-validation).

Any help you all can provide is greatly appreciated.

Thank you for reading!

For more about the problems obtaining a p-value with the Lasso, see https://stats.stackexchange.com/search?q=lasso+p+value. — whuber, Apr 04 '23 at 22:47
