Say I have a pandas DataFrame with three columns that looks as follows:

episode  channel  characters
001      AMC      [   "WW",   "SG" ]
002      AMC      [   "SG",   "WW" ]
003      ABC      [   "JS",   "HR" ]
004      AMC      [   "WW",   "HS" ]

The characters column holds an array in each row, and the goal is to create a boolean (0/1) column for each distinct entry in characters.

For row 1, for example, we want a WW column and an SG column. When these characters reappear in other rows, the same columns should be populated with 0s and 1s as well.

Final Output:

episode  channel  characters          WW  SG  JS  HR  HS
001      AMC      [   "WW",   "SG" ]  1   1   0   0   0
002      AMC      [   "SG",   "WW" ]  1   1   0   0   0
003      ABC      [   "JS",   "HR" ]  0   0   1   1   0
004      AMC      [   "WW",   "HS" ]  1   0   0   0   1

The new columns could also be named characters_WW, characters_SG, etc.

Basically, this is one-hot encoding but with an array. :)
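For reference, here is a minimal sketch of one way this could be done, assuming each cell of characters is an actual Python list (not a string that merely looks like one). The sample frame below is rebuilt by hand from the table above, and the names `exploded`, `dummies`, and `result` are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the table above; assumes real Python lists per cell.
df = pd.DataFrame({
    "episode": ["001", "002", "003", "004"],
    "channel": ["AMC", "AMC", "ABC", "AMC"],
    "characters": [["WW", "SG"], ["SG", "WW"], ["JS", "HR"], ["WW", "HS"]],
})

# One row per (original row, character); the original index is repeated.
exploded = df["characters"].explode()

# One-hot encode each character, then collapse back to one row per original index.
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

# Order the new columns by first appearance instead of alphabetically.
dummies = dummies[exploded.drop_duplicates().tolist()]

# Attach the indicator columns to the original frame.
result = df.join(dummies)
print(result)
```

If the characters_WW style of naming is preferred, `dummies.add_prefix("characters_")` before the join should give that. If the cells turn out to be strings, they would need to be parsed into lists first.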

John Thomas
  • can you provide your dataframe as dictionary (`df.to_dict('list')`) to avoid ambiguity on the types? – mozway May 16 '22 at 14:53
