1

My dataframe has a multi index.There are up to 7 columns C1 to C7, but I have shown only C1 and C2 here. Under the column Correct_Category, it holds the name of the column to get the value from, and also contains NaN

index1  index2  C1      C2     Correct_Category   Matched
m1      a       10      0      C1                 True
        b       20      5      C2                 True
        c       10      10     C1                 True
m2      d       1       15     C2                 True
        e       18      10     NaN                False

My goal is to add a column "Matched", which gives the value based on the column name from Correct_Category.

index1  index2  C1      C2     Correct_Category   Matched  Number
m1      a       10      0      C1                 True     10
        b       20      5      C2                 True     5
        c       10      10     C1                 True     10
m2      d       1       15     C2                 True     15
        e       18      10     NaN                False    0

Everything I've tried so far, using .loc, .where, .apply, .query has returned different errors.

My current line of code:

df['Number']= df['Matched'].apply(lambda x: df[df['Correct_Category'] if x==True else 'No Category')

The df[df['Correct_Category'] is where i want to insert the code that can find the correct value, but this is the critical part that keeps returning different errors.

The current error is "ValueError: cannot index with vector containing NA/ NaN values"

Akshay
  • 43
  • 4
  • I believe the solution provided for the vectorized lookup does not deal with multi index as well as NaN values? – Akshay Jun 17 '19 at 04:01

1 Answers1

0

Use DataFrame.lookup but only for filtered rows:

mask = df['Correct_Category'].isin(df.columns)
#alternative 1
#mask = df['Correct_Category'].notna()
#alternative 2
#mask = df['Matched']

df.loc[mask, 'Number'] = df.loc[mask].lookup(df.index[mask], df.loc[mask, 'Correct_Category'])
print (df)
               C1  C2 Correct_Category  Matched  Number
index1 index2                                          
m1     a       10   0               C1     True    10.0
       b       20   5               C2     True     5.0
       c       10  10               C1     True    10.0
m2     d        1  15               C2     True    15.0
       e       18  10              NaN    False     NaN
jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
  • Thank you, mask = df['Correct_Category'].isin(df.columns) worked for me while alternative 1 and 2 did not. You saved me after 3 hours of headache!! – Akshay Jun 17 '19 at 09:04