Lasso centering and standardization with R

Question

I am working with a lasso regression with the glmnet package. I read these threads: "When conducting multiple regression, when should you center your predictor variables & when should you standardize them?", "Need for centering and standardizing data in regression" and "Is standardisation before Lasso really necessary?".

Based on the responses, I decided that I need to standardize my data before using it. I do have some questions, however:

Do I need to standardize the predictors and the responses, or only the predictors?
I am using the function scale(myData, center = TRUE, scale = TRUE) for building the model, but I am wondering what to do when I want to make predictions on a test data set. I think I should also center and standardize the test data, but how do I do that? By subtracting the mean of the initial (training) dataset and then dividing by the standard deviation of the initial dataset?
When I get a result, do I need to "backscale" it (using the original mean and standard deviation), or do I already get the "final" result?
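A minimal base-R sketch of the procedure described in the second and third questions, assuming the training and test predictors are numeric matrices with the same columns and that the response was standardized as well. The objects `train_x`, `test_x`, `train_y` and `pred_scaled` are hypothetical. `scale()` stores the centers and scales it used as the attributes `"scaled:center"` and `"scaled:scale"`, so the training statistics can be reused on the test set and to back-transform predictions.

```r
set.seed(42)
# Hypothetical training and test data (3 numeric predictors)
train_x <- matrix(rnorm(60, mean = 5, sd = 2), ncol = 3)
test_x  <- matrix(rnorm(15, mean = 5, sd = 2), ncol = 3)
train_y <- rnorm(20, mean = 100, sd = 15)

# Standardize the training predictors and keep the statistics scale() used
train_xs <- scale(train_x, center = TRUE, scale = TRUE)
x_center <- attr(train_xs, "scaled:center")
x_scale  <- attr(train_xs, "scaled:scale")

# Apply the *training* means and standard deviations to the test predictors
test_xs <- scale(test_x, center = x_center, scale = x_scale)

# If the response is standardized too, keep its training statistics ...
y_center <- mean(train_y)
y_scale  <- sd(train_y)
train_ys <- (train_y - y_center) / y_scale

# ... and back-transform predictions made on the standardized scale
pred_scaled <- c(-0.5, 0, 1.2)          # hypothetical model output
pred <- pred_scaled * y_scale + y_center
```

A prediction of 0 on the standardized scale maps back to the training mean of the response, which is a quick sanity check on the back-transformation.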

christoph · Answer 1 · 2019-03-12T22:53:39.100

In general, you are right to worry about scaling the responses. If you optimize a function of the kind that LASSO is based on,

$$ \min_{\beta} || Y - X\beta ||_{2}^{2} + \lambda || \beta ||_1, $$

then scaling the response $Y$ with some constant $\alpha$,

$$ \min_{\beta} || \alpha Y - X\beta ||_{2}^{2} + \lambda || \beta ||_1 $$

leads to

$$ \min_{\beta} \alpha^2 || Y - X\beta / \alpha ||_{2}^{2} + \lambda || \beta ||_1. $$

Notice the square on the $\alpha$, which is missing in the accepted answer. Dividing by $\alpha^2$ leads to

$$ \min_{\beta} || Y - X\beta / \alpha ||_{2}^{2} + (\lambda / \alpha) || (\beta / \alpha) ||_1. $$

Substituting $\gamma = \beta / \alpha$, this is the original problem in $\gamma$ with regularization constant $\lambda / \alpha$ instead of $\lambda$. For a fixed $\lambda$, the solution will therefore perform differently from a solution of the original problem, since the regularization constant has effectively changed.
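The effect of the square can be checked numerically. This is an illustration under stated assumptions, not glmnet: `lasso_cd()` is a hypothetical, minimal coordinate-descent solver for the objective exactly as written here, $\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1$, with no intercept and no internal standardization. According to the derivation, fitting $\alpha Y$ with $\lambda$ should give $\alpha$ times the fit of $Y$ with $\lambda/\alpha$, and not $\alpha$ times the fit of $Y$ with $\lambda$.

```r
# Hypothetical minimal solver for  ||y - X b||_2^2 + lambda * ||b||_1
# (no intercept, no standardization; an illustration, not glmnet's algorithm)
soft_threshold <- function(v, t) sign(v) * pmax(abs(v) - t, 0)

lasso_cd <- function(X, y, lambda, tol = 1e-10, max_iter = 10000) {
  b <- numeric(ncol(X))
  col_ss <- colSums(X^2)
  r <- y                                   # residual for b = 0
  for (iter in seq_len(max_iter)) {
    b_old <- b
    for (j in seq_along(b)) {
      r_j  <- r + X[, j] * b[j]            # partial residual without predictor j
      b[j] <- soft_threshold(sum(X[, j] * r_j), lambda / 2) / col_ss[j]
      r    <- r_j - X[, j] * b[j]
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(3, -2, 0, 0, 1.5) + rnorm(n))
alpha  <- 10
lambda <- 50

b_scaled <- lasso_cd(X, alpha * y, lambda)          # scale Y, keep lambda
b_equiv  <- alpha * lasso_cd(X, y, lambda / alpha)  # derivation: lambda / alpha on Y
b_naive  <- alpha * lasso_cd(X, y, lambda)          # the "same lambda" claim

max(abs(b_scaled - b_equiv))  # numerically zero
max(abs(b_scaled - b_naive))  # clearly not zero
```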

However, if you look at the documentation for glmnet, you will find that the default behavior is to use a log-spaced grid for $\lambda$ and to choose the range of that grid based on standardized (!) responses and predictors.

So the answer to your first question, based on the glmnet documentation, is: if you use default values for all parameters, standardizing the response should not have a big effect on performance, since the $\lambda$ grid is chosen using standardized responses anyway. For non-default parameters (e.g., using fewer values in the $\lambda$ grid by setting nlambda to a low value), it might have a larger effect and you have to be careful. Also, this does not carry over to, e.g., sklearn's Lasso, where no grid is used (the regularization constant is a single user-supplied value) and changing the response scale might drastically affect performance.
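To see why a data-driven grid makes the response scale largely irrelevant, here is a sketch for the objective $\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1$ used in this thread (glmnet itself uses a differently scaled loss, so its numbers differ). All coefficients are zero exactly when $\lambda \ge \lambda_{\max} = 2\max_j |x_j^\top Y|$, so a log-spaced grid running down from $\lambda_{\max}$ scales linearly with the response. The helper `lambda_grid()` and its arguments are hypothetical, only loosely modelled on glmnet's `nlambda` and `lambda.min.ratio`.

```r
# Hypothetical helper: log-spaced lambda grid for ||y - X b||^2 + lambda * ||b||_1.
# lambda_max is the smallest lambda at which all coefficients are exactly zero.
lambda_grid <- function(X, y, nlambda = 100, min_ratio = 1e-4) {
  lambda_max <- 2 * max(abs(crossprod(X, y)))
  exp(seq(log(lambda_max), log(lambda_max * min_ratio), length.out = nlambda))
}

set.seed(7)
X <- matrix(rnorm(200), ncol = 4)
y <- drop(X %*% c(1, 0, -2, 0.5) + rnorm(50))
alpha <- 1000

grid_y      <- lambda_grid(X, y)
grid_scaled <- lambda_grid(X, alpha * y)

# Every grid point is multiplied by alpha when the response is multiplied by alpha
all.equal(grid_scaled, alpha * grid_y)   # TRUE
```

Writing $\beta = \alpha\gamma$, fitting $\alpha Y$ at $\alpha\lambda$ gives exactly $\alpha$ times the fit of $Y$ at $\lambda$, so a grid that scales with the response produces a solution path that is only rescaled, which is the reparameterization point raised in the comments.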

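The answer to your second and third questions is that, by default, the package standardizes the data before using it and rescales the coefficients before returning them, so you can pass the unscaled test data to predict() and get predictions on the original scale of the response; no manual standardization or back-scaling is needed.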
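A minimal base-R sketch of what "standardize internally, return coefficients on the original scale" amounts to, assuming centering and scaling with the training means and `sd()` values (glmnet's internal standardization may differ in details such as the variance denominator, which does not affect the identity). A closed-form ridge fit stands in for the lasso only to keep the sketch short; the back-transformation step is the same for any linear fit on standardized predictors. All object names are hypothetical.

```r
set.seed(3)
n <- 40
# Hypothetical predictors on very different scales
X_train <- cbind(rnorm(n, 10, 1), rnorm(n, 10, 5), rnorm(n, 10, 20))
y_train <- drop(X_train %*% c(2, -1, 0.1) + rnorm(n))
X_test  <- cbind(rnorm(5, 10, 1), rnorm(5, 10, 5), rnorm(5, 10, 20))

# 1. Standardize predictors and center the response with training statistics
m  <- colMeans(X_train)
s  <- apply(X_train, 2, sd)
Xs <- scale(X_train, center = m, scale = s)
y_mean <- mean(y_train)

# 2. Fit on the standardized scale (ridge used here only as a stand-in)
lambda <- 1
b_std <- drop(solve(crossprod(Xs) + lambda * diag(ncol(Xs)),
                    crossprod(Xs, y_train - y_mean)))

# 3. Convert to coefficients on the original predictor scale
b_orig    <- b_std / s
intercept <- y_mean - sum(m * b_orig)

# Predictions from raw test data with original-scale coefficients ...
pred_raw <- drop(intercept + X_test %*% b_orig)
# ... match predictions from test data standardized with training statistics
pred_std <- drop(y_mean + scale(X_test, center = m, scale = s) %*% b_std)
all.equal(pred_raw, pred_std)   # TRUE
```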

You make an interesting point--welcome to CV. As far as I understand it (as implemented and documented in the glmnet package), LASSO does not contemplate fixing $\lambda:$ it studies the solutions as a function of $\lambda.$ The rescaling--although it's correct--emerges as a mere reparameterization of the same solution, and therefore is of no consequence. — whuber, Mar 12 '19 at 21:31
@whuber You are right about glmnet not fixing the $\lambda$ - I was misled by sklearn, which defines LASSO as "Linear Model trained with L1 prior as regularizer (aka the Lasso)" and leaves $\lambda$ as a free parameter. I would still claim that the argument in the second part of the accepted answer is wrong. Also, reading the glmnet documentation, it seems that a sequence of automatically generated values is used for $\lambda$ and (depending on the options used) it's not clear in general whether you will find the same solution after scaling the targets. — christoph, Mar 12 '19 at 21:58
Good points. In my experience, the automatically generated values tend to do a good job at the low end, but sometimes need to be extended at the high end. The answers will be different but only because of the randomization employed during cross-validation. I see you're reacting primarily to the claim "has the same value of $\lambda,$" which you have clearly shown is incorrect. I'm sure your post will get some votes for closure on the basis that it doesn't answer the question, but if you could edit in some remarks to address the question, it would be more likely to survive. — whuber, Mar 12 '19 at 22:03

score 11 · Accepted Answer · edited Feb 27 '17 at 09:31

If you use glmnet, the scaling is performed by the package. You don't need to worry about scaling the test set because the "coefficients are always returned on the original scale".

By default:

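# Relevant defaults of glmnet() (other arguments omitted)
glmnet(x, y,
       standardize = TRUE,           # predictors are standardized internally
       intercept = TRUE,             # an intercept is fitted
       standardize.response = FALSE) # only used when family = "mgaussian"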

As for the standardization of the response, it should not change the performance of your model after cross-validating over $\lambda$, so you can leave standardize.response = FALSE.

Indeed, the LASSO solves

$$ \min_\beta\; \| Y - X\beta \|^2_2 + \lambda \|\beta\|_1 $$

Scaling $Y$ by a factor $\alpha > 0$, the problem becomes

$$ \min_\beta\; \| \alpha Y - X\beta \|^2_2 + \lambda \|\beta\|_1 $$

which is equivalent to

$$ \min_\beta\; \alpha \| Y-X\beta/\alpha \|^2_2 + \lambda \|\beta\|_1 $$

or, after dividing by $\alpha$,

$$ \min_\beta\; \| Y - X\beta/\alpha \|^2_2 + \lambda \|\beta/\alpha\|_1 $$

So, in terms of the rescaled coefficients $\beta/\alpha$, it has the same value of $\lambda$.

You are missing a square on the alpha in the third equation. — christoph, Nov 13 '20 at 09:05

score 3 · Answer 3 · answered Sep 21 '15 at 03:27

With a lasso regression, standardization is essential. That's because lasso finds the best solution subject to a constraint on the sum of the absolute values of the coefficients. If one didn't scale the variables, the answer would depend entirely on their scaling. For example, using lasso on $x_1, x_2$ as opposed to $x_1, z = \frac{1}{10000} x_2$ would give very different answers. With the second set of variables, the coefficient of $z$ is almost guaranteed to be zero with lasso, because it would have to be 10000 times larger than the coefficient of $x_2$ to have the same effect, and it is penalized accordingly. Check the glmnet help; I seem to recall that it automatically scales the data.
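A small base-R illustration of this example, under stated assumptions: `lasso_cd()` is a hypothetical minimal coordinate-descent solver for $\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ (no intercept), not glmnet. Rescaling $x_2$ to $z = x_2/10000$ forces its coefficient to exactly zero, while standardizing the columns first makes the two fits coincide.

```r
# Hypothetical minimal solver for  ||y - X b||_2^2 + lambda * ||b||_1
soft_threshold <- function(v, t) sign(v) * pmax(abs(v) - t, 0)

lasso_cd <- function(X, y, lambda, tol = 1e-12, max_iter = 10000) {
  b <- numeric(ncol(X))
  col_ss <- colSums(X^2)
  r <- y                                   # residual for b = 0
  for (iter in seq_len(max_iter)) {
    b_old <- b
    for (j in seq_along(b)) {
      r_j  <- r + X[, j] * b[j]            # partial residual without predictor j
      b[j] <- soft_threshold(sum(X[, j] * r_j), lambda / 2) / col_ss[j]
      r    <- r_j - X[, j] * b[j]
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

set.seed(10)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- x1 + 2 * x2 + rnorm(n)
z  <- x2 / 10000
lambda <- 10

b_raw  <- lasso_cd(cbind(x1, x2), y, lambda)  # coefficient on x2 is clearly nonzero
b_tiny <- lasso_cd(cbind(x1, z),  y, lambda)  # coefficient on z is exactly zero

# After standardizing the columns, the two designs (and fits) coincide
b_raw_std  <- lasso_cd(scale(cbind(x1, x2)), y - mean(y), lambda)
b_tiny_std <- lasso_cd(scale(cbind(x1, z)),  y - mean(y), lambda)
```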
