Questions tagged [feature-scaling]

153 questions
1
vote
2 answers

How do you get StandardScaler to work if X_test and X_train have different sizes?

For reference, the dataset I'm using is the Kaggle Housing prices dataset. The train data is (1460 x 80). I split it into a train and test dataset, with 1168 rows in the train set and 292 rows in the test dataset. So I would say it is sufficiently…
Katsu
  • 911
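A minimal sketch relating to the StandardScaler question above: the scaling statistics are learned per column from the training rows only, then applied to however many test rows there are, so the row counts need not match (only the number of columns must). This uses plain Python rather than scikit-learn; `fit_standardizer` and `transform` are hypothetical helper names, and the small arrays are made-up illustration data, not the Kaggle housing set.

```python
from statistics import fmean, pstdev

def fit_standardizer(rows):
    """Learn per-column mean and population std from the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population std (ddof=0); constant columns fall back to a scale of 1.0
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def transform(rows, means, stds):
    """Apply previously learned statistics to any number of rows."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Made-up data: 4 training rows, 2 test rows, same 2 columns
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
X_test = [[2.5, 25.0], [5.0, 10.0]]

means, stds = fit_standardizer(X_train)        # fit on train only
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)  # reuse train statistics
```

With scikit-learn the same pattern would be `scaler.fit(X_train)` followed by `scaler.transform(X_test)`; refitting the scaler on the test set is what usually causes trouble, not the differing row counts.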
1
vote
0 answers

Reasons why we need to scale our variables (e.g. with StandardScaler)

I'm trying to collect the top reasons why we need to scale the independent variables in an ML model. I have collected 3 reasons so far. Please let me know if I am missing any. Correct for large nominal vars having a bigger impact to a…
Katsu
  • 911
1
vote
1 answer

When should I scale data if all my features are numeric?

I am working on a loan-approval case study. My training data has 45 columns, of which 28 are useful. All the columns in the dataset are int64, with ranges such as 14256 to 168956, 1587 to 3456, 10…
0
votes
0 answers

How to scale test data that might go beyond the scaling range of the train data

What is the best practice for scaling test data that goes beyond the scaling range of the train data, particularly if you are using min-max scaling? For example, I min-max scale the following to [0,1]: train_data = [2,4,7,0,12,4,5] train_data_scaled =…
AnarKi
  • 565
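A rough sketch of what happens in the situation the question above describes, assuming the minimum and maximum are taken from the training data only. Test values outside the training range then map outside [0,1] unless they are clipped. The helpers `fit_minmax` and `minmax_transform` and the `test_data` values are hypothetical, chosen for illustration; only `train_data` comes from the question.

```python
def fit_minmax(values):
    """Learn the scaling range from the training data only."""
    return min(values), max(values)

def minmax_transform(values, lo, hi, clip=False):
    """Scale with a previously learned range; assumes hi > lo."""
    scaled = [(v - lo) / (hi - lo) for v in values]
    if clip:
        # Optionally force out-of-range values back into [0, 1]
        scaled = [min(1.0, max(0.0, s)) for s in scaled]
    return scaled

train_data = [2, 4, 7, 0, 12, 4, 5]   # from the question
test_data = [-3, 6, 15]               # made-up values outside the train range

lo, hi = fit_minmax(train_data)                            # (0, 12)
unclipped = minmax_transform(test_data, lo, hi)            # [-0.25, 0.5, 1.25]
clipped = minmax_transform(test_data, lo, hi, clip=True)   # [0.0, 0.5, 1.0]
```

Whether to clip is a modelling choice: clipping keeps inputs inside the range the model saw in training but discards how far outside a value was. scikit-learn's `MinMaxScaler` exposes a similar `clip` option.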
0
votes
0 answers

What scaling to choose for data that is always >0? Does it matter?

If I have something like stock price or income that cannot be negative, what scaling should I pursue? Is there research that suggests that centering it at 0 will cause issues? I'd assume so but I wonder if this is proven somewhere? Thank you in…
Nicklovn
  • 891
  • 9
  • 24