
I want to one-hot encode one column in my data. The column may look like this:

    app   
0   a       
1   b      
2   c      
3   a    

I've performed:

pd.get_dummies(df, columns=['app'])

   app_a  app_b  app_c
0      1      0      0
1      0      1      0
2      0      0      1
3      1      0      0

In reality, though, the app column can also contain the value 'd', which does not appear in my training data. So I want get_dummies to produce an app_d column even though no row in my data contains 'd'.

Is there a way to one-hot encode my sample data above into a predefined set of columns? The result I want looks like this:

   app_a  app_b  app_c  app_d
0      1      0      0      0
1      0      1      0      0
2      0      0      1      0
3      1      0      0      0
funie200
Adiansyah

1 Answer


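Try converting your column to the pandas.Categorical dtype and specifying the categories argument. get_dummies creates one column per declared category, including categories that never occur in the data: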

df['app'] = pd.Categorical(df['app'], categories=['a', 'b', 'c', 'd'])

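# dtype=int keeps the 0/1 output shown below; pandas 2.0+ returns bool columns by default
pd.get_dummies(df['app'], prefix='app', dtype=int)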

[out]

   app_a  app_b  app_c  app_d
0      1      0      0      0
1      0      1      0      0
2      0      0      1      0
3      1      0      0      0
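
One reason to fix the category list up front is that later data containing 'd' then encodes into the same columns as the training data. A minimal sketch of that idea follows; the names APP_CATEGORIES, encode_app, train and new are hypothetical and only serve the illustration, and it uses astype with pd.CategoricalDtype, which is a variation on the pd.Categorical call above:

```python
import pandas as pd

# Hypothetical fixed list of every value the 'app' column may ever take.
APP_CATEGORIES = ["a", "b", "c", "d"]


def encode_app(frame):
    """One-hot encode frame['app'] against the fixed category list."""
    app = frame["app"].astype(pd.CategoricalDtype(categories=APP_CATEGORIES))
    # dtype=int gives 0/1 columns; recent pandas versions default to bool.
    return pd.get_dummies(app, prefix="app", dtype=int)


# Training data without 'd', and hypothetical later data that contains it.
train = pd.DataFrame({"app": ["a", "b", "c", "a"]})
new = pd.DataFrame({"app": ["d", "a"]})

train_encoded = encode_app(train)
new_encoded = encode_app(new)

# Both frames should share the same four columns.
print(train_encoded.columns.tolist())
print(new_encoded.columns.tolist())
```

Keep in mind that a value outside the declared list becomes NaN when cast to the categorical dtype, so such a row ends up with zeros in every column.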

Alternatively you could convert to Categorical type and use the cat.add_categories accessor method to update categories after the fact:

df['app'] = pd.Categorical(df['app'])

# add_categories returns a new Series; its inplace argument was removed in pandas 2.0
df['app'] = df['app'].cat.add_categories(['d'])
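
For completeness, here is a small end-to-end sketch of this second approach, assuming the same four-row sample frame as in the question. It assigns the result of add_categories back to the column and then calls get_dummies, so the extra category shows up as an all-zero column:

```python
import pandas as pd

# Sample frame from the question: no row contains 'd'.
df = pd.DataFrame({"app": ["a", "b", "c", "a"]})

# Categories are inferred from the data first: a, b, c.
df["app"] = pd.Categorical(df["app"])

# Append 'd' afterwards; add_categories returns a new Series.
df["app"] = df["app"].cat.add_categories(["d"])

# dtype=int gives 0/1 columns; recent pandas versions default to bool.
encoded = pd.get_dummies(df["app"], prefix="app", dtype=int)
print(encoded)
```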
Chris Adams