As a newcomer, I am having a hard time understanding the drop argument of OneHotEncoder. Does it drop the original column holding the non-numerical values of a categorical variable once the encoding is done?
I am looking at the documentation and I don't get how we end up with a 2 x 3 array at the end.
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
...
>>> drop_enc = OneHotEncoder(drop='first').fit(X)
>>> drop_enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> drop_enc.transform([['Female', 1], ['Male', 2]]).toarray()
array([[0., 0., 0.],
       [1., 1., 0.]])
I was expecting 5 columns:
- 1 column for Male
- 1 column for Female
- 1 column for category 1
- 1 column for category 2
- 1 column for category 3
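
To make the column counting concrete, here is a minimal hand-rolled sketch in plain Python. It is not scikit-learn's actual implementation, and the one_hot helper is a hypothetical name used only for illustration. It assumes that drop='first' removes the first (sorted) category of each feature, so 2 + 3 = 5 indicator columns shrink to (2 - 1) + (3 - 1) = 3, with the dropped category represented by all zeros for that feature.

```python
# Hand-rolled illustration only; this is NOT scikit-learn's implementation.
def one_hot(rows, categories, drop_first=False):
    """Encode each row as 0/1 indicators, one per kept category."""
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            # With drop_first, the first category of each feature gets no column.
            kept = cats[1:] if drop_first else cats
            out.extend(1.0 if value == c else 0.0 for c in kept)
        encoded.append(out)
    return encoded

# Sorted categories per feature, matching drop_enc.categories_ above.
categories = [['Female', 'Male'], [1, 2, 3]]
rows = [['Female', 1], ['Male', 2]]

full = one_hot(rows, categories)                      # 2 + 3 = 5 columns
reduced = one_hot(rows, categories, drop_first=True)  # (2-1) + (3-1) = 3 columns

print(full)
print(reduced)
```

Under that assumption, the three remaining columns would be Male, 2 and 3, which is consistent with the 2 x 3 array in the documentation: ['Female', 1] becomes all zeros because both of its values are the dropped first categories.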