Say I have a pandas DataFrame with three columns that looks as follows:
episode channel characters
001 AMC [ "WW", "SG" ]
002 AMC [ "SG", "WW" ]
003 ABC [ "JS", "HR" ]
004 AMC [ "WW", "HS" ]
The characters column holds an array (a list) in each row, and the goal is to create a boolean column for each distinct value that appears in characters.
For row 1, for example, that means a WW column and an SG column. When those characters reappear in other rows, the same columns should be populated with 0s and 1s accordingly.
Desired output:
episode channel characters WW SG JS HR HS
001 AMC [ "WW", "SG" ] 1 1 0 0 0
002 AMC [ "SG", "WW" ] 1 1 0 0 0
003 ABC [ "JS", "HR" ] 0 0 1 1 0
004 AMC [ "WW", "HS" ] 1 0 0 0 1
The new columns could also be named characters_WW, characters_SG, etc.
Basically, this is one-hot encoding but with an array. :)
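For reference, here is a minimal sketch of one way this could be done, not necessarily the best one. It assumes the characters column holds real Python lists (not strings that merely look like lists) and that every list is non-empty. The names `exploded`, `dummies`, `order`, `result`, and `prefixed` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame; episode is kept as a string to preserve "001".
df = pd.DataFrame({
    "episode": ["001", "002", "003", "004"],
    "channel": ["AMC", "AMC", "ABC", "AMC"],
    "characters": [["WW", "SG"], ["SG", "WW"], ["JS", "HR"], ["WW", "HS"]],
})

# One row per (original row, character); the original index is repeated.
exploded = df["characters"].explode()

# Indicator columns per character, collapsed back to one row per original index.
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

# get_dummies sorts columns alphabetically; reorder by first appearance instead.
order = list(exploded.dropna().unique())
dummies = dummies[order]

# Attach the indicator columns to the original frame (joined on the index).
result = df.join(dummies)

# Variant with prefixed column names such as characters_WW.
prefixed = df.join(dummies.add_prefix("characters_"))

print(result)
```

The idea is that `explode` turns each list element into its own row, so the usual `get_dummies` one-hot encoding applies, and grouping on the original index folds the rows back together.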