I am fitting a lasso regression with the glmnet package. I read these threads: "When conducting multiple regression, when should you center your predictor variables & when should you standardize them?", "Need for centering and standardizing data in regression" and "Is standardisation before Lasso really necessary?".
Based on the responses, I decided that I need to standardize my data before using it. I still have some questions, however:
- Do I need to standardize the predictors and the responses or only the predictors?
- I am using the function scale(myData, center = TRUE, scale = TRUE) for building the model, but I am wondering what to do when I want to make predictions with a test data set. I think I should also center and standardize the test data, but how do I do that? By subtracting the mean of the initial (training) data set and then dividing by the standard deviation of the initial data set?
- When I get a result, do I need to "backscale" it (using the original mean and standard deviation), or do I already get the "final" result?
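To make the second and third bullets concrete, here is a minimal sketch of what I have in mind, using base R only. The data and the names train_x, test_x, train_y and pred_std are made up for illustration, and pred_std is just a stand-in for predictions on the standardized scale, not output from glmnet. Whether this is the right procedure is exactly what I am asking.

```r
# Hypothetical toy data; all object names here are illustrative assumptions.
set.seed(1)
train_x <- matrix(rnorm(40, mean = 5, sd = 2), nrow = 10, ncol = 4)
test_x  <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 5,  ncol = 4)
train_y <- rnorm(10, mean = 100, sd = 15)

# Standardize the training predictors. scale() stores the column means and
# standard deviations it used as attributes of the returned matrix.
train_x_std <- scale(train_x, center = TRUE, scale = TRUE)
mu    <- attr(train_x_std, "scaled:center")
sigma <- attr(train_x_std, "scaled:scale")

# Apply the *training* means and standard deviations to the test predictors,
# rather than recomputing them from the test set.
test_x_std <- scale(test_x, center = mu, scale = sigma)

# If the response were standardized as well, predictions made on the
# standardized scale would have to be mapped back with the training mean and sd.
y_mu <- mean(train_y)
y_sd <- sd(train_y)
train_y_std <- (train_y - y_mu) / y_sd

pred_std  <- train_y_std[1:3]        # stand-in for model predictions
pred_orig <- pred_std * y_sd + y_mu  # "backscaled" to the original units
```

Here test_x_std is what I would pass to the model for prediction, and pred_orig is what I mean by a "backscaled" result.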
glmnet package), LASSO does not contemplate fixing $\lambda:$ it studies the solutions as a function of $\lambda.$ The rescaling--although it's correct--emerges as a mere reparameterization of the same solution, and therefore is of no consequence. – whuber Mar 12 '19 at 21:31
glmnet not fixing the $\lambda$ - I was misled by sklearn, which defines LASSO as "Linear Model trained with L1 prior as regularizer (aka the Lasso)" and leaves $\lambda$ as a free parameter. I would still claim that the argument in the second part of the accepted answer is wrong. Also, reading the glmnet documentation, it seems that a sequence of automatically generated values is used for $\lambda$ and (depending on the options used) it's not clear in general whether you will find the same solution after scaling the targets. – christoph Mar 12 '19 at 21:58