31

Under what circumstances should the data be normalized/standardized when building a regression model? When I asked a stats major this question, he gave me an ambiguous answer: "depends on the data".

But what does that really mean? There should be either a universal rule or a checklist of sorts, so that if certain conditions are met the data either should or shouldn't be normalized.

Raj
  • 943

2 Answers

24

Sometimes standardization helps with numerical issues (not so much these days, with modern numerical linear algebra routines) or with interpretation, as mentioned in the other answer. Here is one "rule" that I use for answering the question myself: is the regression method you are using invariant, in the sense that the substantive answer does not change with standardization? Ordinary least squares is invariant, while methods such as lasso or ridge regression are not. So for invariant methods there is no real need for standardization, while for non-invariant methods you should probably standardize (or at least think it through).
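To make the invariance point concrete, here is a minimal sketch using a single predictor and the closed-form solution for a penalized slope with an unpenalized intercept. The data, the penalty value `lam = 5.0`, and the helper names `fit_predict`, `standardize` and `max_gap` are all made up for illustration; setting `lam = 0` gives ordinary least squares.

```python
from statistics import fmean, pstdev

def fit_predict(x, y, lam):
    """Fit y ~ a + b*x, penalizing lam * b**2 (intercept unpenalized),
    and return the fitted values. lam = 0 is ordinary least squares."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)   # ridge slope; reduces to the OLS slope when lam == 0
    a = ybar - b * xbar     # intercept follows from the means
    return [a + b * xi for xi in x]

def standardize(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, s = fmean(x), pstdev(x)
    return [(xi - m) / s for xi in x]

def max_gap(u, v):
    """Largest absolute difference between two lists of fitted values."""
    return max(abs(ui - vi) for ui, vi in zip(u, v))

# Hypothetical data: a predictor measured in days and a noisy response.
days = [10.0, 20.0, 35.0, 50.0, 80.0, 120.0]
y = [1.2, 1.9, 3.1, 4.2, 6.8, 9.9]
z = standardize(days)

# Same model, same penalty, fitted on raw and on standardized predictors.
ols_gap = max_gap(fit_predict(days, y, 0.0), fit_predict(z, y, 0.0))
ridge_gap = max_gap(fit_predict(days, y, 5.0), fit_predict(z, y, 5.0))

print(f"OLS fitted values differ by at most   {ols_gap:.2e}")
print(f"ridge fitted values differ by at most {ridge_gap:.2e}")
```

The OLS gap should be at the level of floating-point rounding, whereas the ridge gap is not: the same penalty shrinks the slope by a different amount depending on the scale of the predictor, which is why the choice of scaling becomes part of the model.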

The following is somewhat related: Dropping one of the columns when using one-hot encoding

7

It sometimes makes interpretation easier if you subtract the mean, or some number within the range of the actual values, as this can make the intercept more meaningful. For instance, if you have people aged 65 and over, subtract 65 and then the intercept is the predicted value for a 65-year-old rather than a neonate. If you have non-linear terms like powers this makes them less correlated and so you can see more easily what is going on. It may also make life easier to rescale the predictor so as to move the coefficients into a more printable range, for instance by converting days into weeks or months. Other than that it should not matter. I suppose some of what I have just written may be what your friend meant by "it depends on the data".
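As a rough illustration of the point about powers, the sketch below compares the correlation between a predictor and its square before and after subtracting the mean. The ages are invented and `pearson` is a small helper written for this example, not a library function.

```python
from statistics import fmean

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    ubar, vbar = fmean(u), fmean(v)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    suu = sum((ui - ubar) ** 2 for ui in u)
    svv = sum((vi - vbar) ** 2 for vi in v)
    return suv / (suu * svv) ** 0.5

# Hypothetical sample of people aged 65 and over.
ages = [65, 68, 70, 73, 77, 80, 84, 90]
mean_age = fmean(ages)
centered = [a - mean_age for a in ages]

# Correlation between the linear term and the squared term.
r_raw = pearson(ages, [a ** 2 for a in ages])
r_centered = pearson(centered, [c ** 2 for c in centered])

print(f"corr(age, age^2) on the raw scale:   {r_raw:.3f}")
print(f"corr(age, age^2) after centering:    {r_centered:.3f}")
```

On the raw scale the age and its square are almost collinear because all the ages are far from zero; after centering, the squared term mostly captures distance from the mean, so the two terms carry more distinct information. Subtracting 65 instead of the mean would also help here, though by less.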

mdewey
  • 17,806
  • 4 The correlation between predictors is unaffected by subtracting means, so nothing is less correlated than before. – Nick Cox Mar 16 '16 at 08:17
  • You are right @Nick, apologies. It is only non-linear terms for which it helps. – mdewey Mar 16 '16 at 11:32
  • "non-linear terms like powers this makes them less correlated": is this a general truth? The connection between correlation and non-linearity is not clear to me. Would a variable raised to a power not be fully correlated with the response Y? I ask because from this I would infer that standardization in a linear ridge regression model is unnecessary. Thank you – nikolaosmparoutis May 10 '19 at 14:58
  • @NickCox: How about dividing, in case we want to standardize? Don't we get unitless coefficients? – MSIS Dec 19 '19 at 00:17