On standardizing features for glmnet during a nested CV

Question

I have noticed that some users who post their code in questions here set the standardize option of glmnet to FALSE.

It seems that users either:
1) just use the default standardize=TRUE and let glmnet automatically standardize the variables prior to fitting the model sequence,
2) or set standardize=FALSE and rescale the variables themselves.

I have tried to trace how the cvglmnet code implements variable standardization, but I have not been able to find out how it is done.

How the variables are standardized in the code is of particular interest to me because I am running glmnet inside a nested cross-validation, which means I would have to rescale my variables every time the training set and test set change, correct? (That is, rescale both the training data and the test data using the min and max of the training data, so that the training values range from 0 to 1.)
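To make the per-fold rescaling concrete, here is a minimal sketch in plain Python rather than in glmnet's own code. The helper names (`fit_min_max`, `apply_min_max`, `k_fold_indices`) and the toy data are hypothetical and exist only for illustration. The min and max are computed from the training rows of each fold and then applied unchanged to the held-out rows. In a nested CV the same step would presumably be repeated inside every inner split as well.

```python
def fit_min_max(train_rows):
    """Return per-column (min, max) computed from the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, params):
    """Rescale rows with training-derived (min, max); constant columns map to 0."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, (lo, hi) in zip(row, params)
        ])
    return scaled


def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Hypothetical toy design matrix: 6 observations, 2 features.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
     [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]

scaled_folds = []
for train_idx, test_idx in k_fold_indices(len(X), 3):
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    params = fit_min_max(X_train)            # learned from training rows only
    scaled_folds.append((apply_min_max(X_train, params),
                         apply_min_max(X_test, params)))
```

With this scheme the rescaled training values always lie in [0, 1], but the held-out values can fall outside that range whenever a test observation is smaller than the training min or larger than the training max. The rescaled matrices would then be passed to the fitting routine with standardize=FALSE, as in option 2 above.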

To this end, I have been trying to edit parts of the glmnet code so that the inputs to the glmnet, cvglmnet, lognet, and cvlognet functions are the rescaled values computed from the current training set's min and max. Is this the right approach, and how do other glmnet users standardize their variables?

This is flagged as off-topic. I don't understand why; it seems perfectly on-topic to me. How and why to standardize is a statistical question, not a software one. — kjetil b halvorsen, Apr 03 '17 at 13:08
I think you should be aware that glmnet is aimed at sparse matrices (for memory efficiency). Typically, if you subtract the column mean from a sparse matrix, it will no longer be sparse, which leads to out-of-memory issues. So I suspect the scaling is performed within the algorithm, i.e. in the Fortran code, rather than as an initial step. — seanv507, Apr 03 '17 at 16:05
I agree with @kjetilbhalvorsen that this question should not be flagged as a duplicate. The other question only talks about software and none of the answers provide a solution to what the appropriate method for re-scaling features across training and test sets should look like. This is a legitimate question. — user3303, Apr 22 '17 at 19:27
Thank you @user3303 for your input on this. None of the other posts have really answered my question and I hope this question will be reopened soon. — Michelle, Apr 26 '17 at 10:41
