I have a high-dimensional dataset (1,465 observations, >50,000 predictors). I ran a LASSO regression with glmnet for variable selection, then fit a simple logistic regression model to test associations with the 15 predictors that retained non-zero coefficients after LASSO.
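For reference, here is a minimal sketch of that two-step workflow on simulated data. The names `x`, `y`, `selected`, and `refit` are placeholders rather than my real objects, the simulated dimensions are much smaller than my data, and I am assuming a single multivariable refit and glmnet's built-in `lambda.1se` rule rather than my actual lambda.

```r
library(glmnet)

set.seed(1)
n <- 300; p <- 500                      # small stand-in for 1,465 x >50,000
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("V", seq_len(p))
y <- rbinom(n, 1, plogis(2 * x[, 1] - 2 * x[, 2]))  # only V1 and V2 carry signal

# Step 1: LASSO (alpha = 1) logistic regression with cross-validated lambda
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Keep predictors whose coefficient is non-zero at the chosen lambda
beta <- as.matrix(coef(cv_fit, s = "lambda.1se"))
selected <- setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")

# Step 2: unpenalized logistic regression on the selected predictors only
refit_data <- data.frame(y = y, x[, selected, drop = FALSE])
refit <- glm(y ~ ., data = refit_data, family = binomial)

# Wald p-values from the refit (intercept row dropped)
coefs <- summary(refit)$coefficients
pvals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|z|)"]
```

Step 2 reuses the same observations that chose `selected` in step 1, which is the part I have been told is a problem.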
After fitting the simple logistic regression model, I applied multiple-testing correction using the Benjamini-Hochberg procedure. However, I've been told this is improper because it is "double-dipping": the same data were used both to select the predictors and to test them. The p-values are supposed to be derived directly from the LASSO regression (and can then be corrected). However, I don't know how to get those p-values.
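The correction step itself looks roughly like this. The p-values below are made-up illustrative numbers, not my results, and `pvals` is a placeholder name.

```r
# Hypothetical p-values standing in for the refit's Wald p-values
pvals <- c(V1 = 0.001, V2 = 0.008, V3 = 0.039, V4 = 0.041, V5 = 0.27)

# Benjamini-Hochberg adjustment (controls the false discovery rate)
padj <- p.adjust(pvals, method = "BH")

# Predictors still significant at an FDR threshold of 0.05
significant <- names(padj)[padj < 0.05]
```

As I understand it, `p.adjust` is doing what it should here; the concern is that the inputs are ordinary p-values that ignore the LASSO selection step.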
I've tried the islasso package, but I run into a stack overflow error. I still hit the same error even after launching RStudio from the command line with a larger max-ppsize, increasing "R_MAX_VSIZE", and setting options(expressions = 5e5).
How can I calculate these p-values and correct them? I have the proper lambda value, which is 0.0412638 (obtained through elastic net cross-validation).
Any help you all can provide is greatly appreciated.
Thank you for reading!