Questions tagged [encoding]

Encoding in machine learning and data science refers to the process by which non-numeric data is transformed into a numeric representation that can be fed into machine learning algorithms.

An example is one-hot encoding, where categorical labels are transformed into a numeric format consisting of ones and zeros.
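
As a rough illustration of the one-hot encoding mentioned above, here is a minimal pure-Python sketch. The function name `one_hot_encode` and the colour labels are made up for this example; in practice a library encoder would normally be used instead.

```python
def one_hot_encode(labels):
    """Map each categorical label to a list of 0s and 1s containing a single 1."""
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}

    encoded = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # mark the column that belongs to this label
        encoded.append(row)
    return categories, encoded


# Hypothetical example data: four observations of a "colour" feature.
categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row has exactly one 1, in the column of the category it represents.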

188 questions
7 votes, 2 answers

Multi-hot encoding vs Label-Encoding

I am learning about different input-vector representations for neural networks. One of the alternatives to a sparse one-hot encoded vector is multi-hot encoding. Do I understand correctly that a traditional binary approach to counting numbers is…
Kari
3 votes, 3 answers

One hot encoding with too many features (~ 10,000)

I am building a model to predict time off and sick leave for a specific employee. Each employee has one row per day from 01/01/2013 to 31/12/2018 in the dataset, flagged with 0 or 1 (whether that day was taken as time off or a sick day). I am…
Aggamarcel
2 votes, 1 answer

Why leave-one-out encoding?

This question follows from this post, which asks about the difference between one-hot encoding and leave-one-out encoding. I now understand how to calculate the leave-one-out encoding for the training set. But what about the test set, where is Y…
user7117436
1 vote, 1 answer

Getting dummies for both train and test data

Should I apply pd.get_dummies() to both the train and test data? And would that not result in data leakage?
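
For context, one pattern that is often used with `pd.get_dummies()` is to encode the training data first and then align the test columns to it. The sketch below assumes a toy `colour` column and made-up values; it is only an illustration of that alignment step, not a definitive answer to the leakage question.

```python
import pandas as pd

# Hypothetical toy data: "purple" appears only in the test set.
train = pd.DataFrame({"colour": ["red", "green", "blue"]})
test = pd.DataFrame({"colour": ["green", "purple"]})

# Encode the training data; its columns define the feature set.
train_encoded = pd.get_dummies(train, columns=["colour"], dtype=int)

# Encode the test data separately, then align it to the training columns:
# categories unseen in training are dropped, missing ones are filled with 0.
test_encoded = pd.get_dummies(test, columns=["colour"], dtype=int).reindex(
    columns=train_encoded.columns, fill_value=0
)

print(list(train_encoded.columns))
print(test_encoded)
```

With this approach the test set never adds columns of its own, so the model sees the same features at training and prediction time.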
0 votes, 1 answer

Is label-encoded data quantitative or qualitative?

If you label-encode something which is qualitative, like brand of toothpaste or colour of hair, would you describe the resulting data as quantitative since it is now expressed in numbers? Or would you continue to describe the data as qualitative,…
nyptop
0 votes, 1 answer

Should I encode the categorical data before making a training validation split?

I am looking at some examples on Kaggle and I'm not sure what the correct approach is. If I split the training data for training and validation and only encode the categorical data in the training part, sometimes there are some unique values that are…
parse5214