
This is similar to reversing one-hot encoding, but I have multiple columns that might be labeled.

I have this:

|col1|col2|
|1   |0   |
|0   |1   |
|1   |1   |

I want this:

|col1|col2|new        |
|1   |0   |'col1'     |
|0   |1   |'col2'     |
|1   |1   |'col1_col2'|

Here is what I tried:

df.idxmax(axis=1)

It returns only the first column that contains a 1, so it does not capture rows that have multiple 1s.
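For reference, here is a minimal sketch that reproduces the behaviour on the sample data above. The DataFrame is rebuilt by hand from the table, and the name first_hit is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"col1": [1, 0, 1], "col2": [0, 1, 1]})

# idxmax picks the first column holding the row maximum,
# so the third row (1, 1) collapses to 'col1' only
first_hit = df.idxmax(axis=1)
print(first_hit.tolist())
```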

def get_cat(row):
    temp = []
    for c in df[codes].columns:
        if row[c]==1:
            return c   

This does the same thing: it only returns the first column name and misses rows with multiple columns having a 1.

NLR

1 Answer


Use this:

def get_cat(row):
    temp = [a for a, b in row.items() if b == 1]

    return '_'.join(temp)

row is a pandas.Series, so row.items() yields (column name, value) pairs.
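A minimal sketch of how this function could be applied to the sample data from the question. The DataFrame is rebuilt by hand, and the value of codes (a name taken from the question's own code) is assumed to be the list of indicator columns.

```python
import pandas as pd

def get_cat(row):
    # Collect every column name whose value in this row is 1
    temp = [a for a, b in row.items() if b == 1]

    return '_'.join(temp)

# Sample data rebuilt from the question
df = pd.DataFrame({"col1": [1, 0, 1], "col2": [0, 1, 1]})

# Assumption: codes lists the indicator columns to combine
codes = ["col1", "col2"]

# axis=1 passes each row to get_cat as a Series
df["new"] = df[codes].apply(get_cat, axis=1)
print(df)
```

Note that a row with no 1s would get an empty string, since joining an empty list yields ''.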

Elmex80s