
I would like to split a pandas column that contains a list of elements into one column per unique element, i.e.:

ID     col1     col2     col3
01     ele1     ele2     [P00]
02     ele3     ele4     [P01,P04]
03     ele5     ele6     [P00,P03]
04     ele7     ele8     [P01,P03]

Expected:

ID     col1     col2     P00     P01     P03     P04
01     ele1     ele2     1       0       0       0
02     ele3     ele4     0       1       0       1
03     ele5     ele6     1       0       1       0
04     ele7     ele8     0       1       1       0

I tried the code below, but `P`, `0`, and `[` each end up in separate columns. I want each element (e.g. P00) to become the name of its own column (feature), as shown above.

My code:

Try 1:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(sparse_output=True)
df = df.join(
    pd.DataFrame.sparse.from_spmatrix(
        mlb.fit_transform(df.pop('Product_Holding_B2')),
        index=df.index,
        columns=mlb.classes_))

Try 2: (This worked for the first feature, P00, but after that, values such as P01,P04 ended up together in a single column.)

s = df['Product_Holding_B2'].explode()

df[['col1','col2']].join(pd.crosstab(s.index, s))
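Both symptoms would be consistent with the column holding strings such as "[P01,P04]" rather than Python lists. That is an assumption, not something confirmed above. The sketch below, using made-up sample values, shows how the two cases behave differently: `explode` leaves strings untouched, and iterating over a string yields single characters, which is roughly what a per-element binarizer would see.

```python
import pandas as pd

# Hypothetical sample values; the real column contents are an assumption.
s_str = pd.Series(["[P00]", "[P01,P04]"])        # strings that look like lists
s_list = pd.Series([["P00"], ["P01", "P04"]])    # actual Python lists

# explode() only expands list-likes; strings are treated as scalars.
exploded_str = s_str.explode()      # still 2 rows, values unchanged
exploded_list = s_list.explode()    # 3 rows: P00, P01, P04

# Iterating a string gives characters, hence columns like 'P', '0', '['.
chars = sorted(set(s_str.iloc[1]))

# Quick check of what the column really contains.
print(s_str.map(type).unique())
print(s_list.map(type).unique())
```

If the first `print` style of output (`str`) is what `df['Product_Holding_B2'].map(type).unique()` reports, the values need to be parsed into lists before either attempt can work.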

Can anyone help, please?

  • Does [this](https://stackoverflow.com/questions/45312377/how-to-one-hot-encode-from-a-pandas-column-containing-a-list) help? Starting from the original df, `df1 = df['col3'].explode()`, `df[['ID', 'col1', 'col2']].join(pd.crosstab(df1.index, df1))` is one way to make it work. – amiola Nov 14 '21 at 10:19
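A minimal sketch of the `explode` plus `pd.crosstab` approach from the comment above, for the case where `col3` holds strings like "[P01,P04]" instead of real lists (an assumption based on the described symptoms). The sample frame is reconstructed from the question for illustration; the bracket stripping and comma splitting are only needed under that assumption.

```python
import pandas as pd

# Sample data mirroring the question; col3 as strings is an assumption.
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04"],
    "col1": ["ele1", "ele3", "ele5", "ele7"],
    "col2": ["ele2", "ele4", "ele6", "ele8"],
    "col3": ["[P00]", "[P01,P04]", "[P00,P03]", "[P01,P03]"],
})

# Turn "[P01,P04]" into ["P01", "P04"]: drop the brackets, split on commas.
# Skip this step if the column already contains Python lists.
lists = df["col3"].str.strip("[]").str.split(",")

# One row per element, keeping the original row index; trim stray spaces.
s = lists.explode().str.strip()

# Count each element per original row, giving a 0/1 indicator table.
dummies = pd.crosstab(s.index, s)

# Attach the indicator columns to the remaining columns by index.
result = df.drop(columns="col3").join(dummies)
print(result)
```

If the strings contain quoted elements (for example "['P01', 'P04']"), parsing each value with `ast.literal_eval` may be a better fit than stripping and splitting.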
