
Bottom line up front: is there any reason not to center and scale continuous variables prior to model fitting for the sake of model comparison?

I'm conducting a model comparison on a large data set (80,000 instances × 300 attributes), and I'm looking at predicting 250 different response values. If I compare 5 inducers (say, cubist, boosted trees, random forest, MARS, and kNN), I'm already looking at 1,250 model fits without doing any parameter tuning (and counting the ensemble methods as a single fit each). Although this is just the exploratory phase, I know that some models are sensitive to centering and scaling (like kNN) and others aren't. Am I conducting my due diligence in comparing these models on an even playing field if I center and scale all of my numeric features, so that I can use one predictor matrix for all 250 response vectors rather than mixing and matching? Or can some algorithms actually suffer from variables being transformed?

Note that I am not at all worried about interpretability of the resulting model.
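For concreteness, here is a minimal sketch of the "one preprocessing, many models" setup in Python/scikit-learn; the two regressors below are stand-ins for the five inducers above, and the data are synthetic:

```python
# Hypothetical sketch: put the scaler inside a Pipeline so every model
# gets the same preprocessing, refit safely inside each CV fold.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)) * rng.uniform(1, 100, size=20)  # wildly mixed scales
y = X[:, 0] / 100 + rng.normal(size=1000)

for model in (KNeighborsRegressor(),
              RandomForestRegressor(n_estimators=200, random_state=0)):
    pipe = make_pipeline(StandardScaler(), model)
    score = cross_val_score(pipe, X, y, cv=5).mean()  # default scoring: R^2
    print(type(model).__name__, round(score, 3))
```

Putting the scaler inside the pipeline means it is refit on each training fold (no leakage into held-out data), and it costs essentially nothing for the models that are scale-invariant anyway.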

1 Answer


Yes, there are reasons. Since I have seen many questions about scaling, I ended up writing a small article about it here.

In short:

  1. Scaling may be a pure waste of time (especially true for decision-tree-based methods; see the sketch after this list).

  2. Scaling may harm your performance (think of image classification, where pixel values already share one meaningful scale, which per-feature standardization would destroy).

  3. When scaling, you have to pay extra attention to constant columns and missing values (though this can be handled by writing more tests).
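A small illustration of points 1 and 3, sketched in Python/scikit-learn with made-up data (none of this comes from the original answer):

```python
# Hypothetical demo: trees ignore scaling; constant columns break it.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)

# Point 1: a tree splits on thresholds, so a monotone rescaling of the
# features leaves the learned partition, and thus the predictions, unchanged.
tree = DecisionTreeRegressor(random_state=0)
pred_raw = tree.fit(X, y).predict(X)
X_rescaled = X * 1000.0 + 5.0
pred_rescaled = tree.fit(X_rescaled, y).predict(X_rescaled)
print(np.allclose(pred_raw, pred_rescaled))  # True: scaling was wasted effort

# Point 3: a constant column has zero standard deviation, so naive
# standardization divides by zero and fills the column with NaN.
# (scikit-learn's StandardScaler guards against zero variance;
# a hand-rolled (x - mean) / std does not.)
Xc = X.copy()
Xc[:, 2] = 7.0
z = (Xc[:, 2] - Xc[:, 2].mean()) / Xc[:, 2].std()  # 0/0 -> NaN + RuntimeWarning
print(np.isnan(z).all())  # True
```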

RUser4512