Pandas: creating new column based on column value partially matching another column's name

Question

I am trying the create a new column in Pandas. I have an existing column of text and I want this new column to contain the value contained in another column where the row value partially matches the column name. It's similar to this thread but in my case I would only have a partial match:

Python: pandas: match row value to column name/ key's value

My table would look like this as an example

Field   UKA_rf4 UKB UKDdsdf 
UKA     1       4   54
UKB     2       5   7787    
UKD     97      54  765656

And what I would like it to look like is:

Field   UKA_rf4 UKB UKDdsdf      Value
UKA     1       4   54           1
UKB     2       5   7787         5
UKD     97      54  765656       765656

The first row results in a 1 because 'UKA' is contained in the column name 'UKA_rf4'

The second row matches exactly to a column name and the third column is a partial match.

Any suggestions?

jezrael · Answer 1 · 2019-05-23T12:57:06.197

If there is always match between columns and each value of Field use DataFrame.lookup:

s = df['Field']
a = df.columns.str.extract('(' + '|'.join(s) + ')', expand=False)
df['Value'] = df.set_axis(a, axis=1,inplace=False).lookup(df.index, s)
print (df)
  Field  UKA_rf4  UKB  UKDdsdf   Value
0   UKA        1    4       54       1
1   UKB        2    5     7787       5
2   UKD       97   54   765656  765656

Similar solution:

df = df.set_index('Field')
a = df.columns.str.extract('(' + '|'.join(s) + ')', expand=False)
df['Value'] = df.set_axis(a, axis=1,inplace=False).lookup(df.index, df.index)

But problem with lookup is if some values missing, so here is alternative with DataFrame.melt for reshape, extract columns by Series.str.extract with all values of Field column and Series.map:

df1 = df.melt('Field', var_name='var')
df1['cols'] = df1['var'].str.extract('(' + '|'.join(df['Field']) + ')', expand=False)
s = df1[df1['Field'] == df1['cols']].drop_duplicates('Field').set_index('Field')['value']

df['Value'] = df['Field'].map(s)
print (df)
  Field  UKA_rf4  UKB  UKdsdf  Value
0   UKA        1    4      54    1.0
1  UKB1        2    5    7787    NaN
2   UKD       97   54  765656    NaN

This is genius. Thank you! – user2335564 May 23 '19 at 11:07 — user2335564, May 23 '19 at 11:07

score 2 · Answer 2 · answered May 23 '19 at 11:05

You can use list comprehension

df = pd.DataFrame({'Fields':['UKA','UKB','UKD'],'UKA_rf4':[1,2,97],'UKB':[4,5,54],'UKDdsdf':[54,7787,765656]})

df = df.set_index('Fields')
df['Values'] = [df[j][index] for index,i in enumerate(df.index) for j in df.columns if i in j]

print(df)

        UKA_rf4  UKB  UKDdsdf  Values
Fields                               
UKA           1    4       54       1
UKB           2    5     7787       5
UKD          97   54   765656  765656

This is also great as I was looking attempting this via a list comprehension but was stuck — user2335564, May 23 '19 at 11:45

score 2 · Answer 3 · answered May 23 '19 at 11:11

We can still use lookup by cleaning up your column names here:

df2 = df.copy()
df2.columns = ['Field'] + [col[:3] for col in df2.iloc[:, 1:].columns]
df['Value'] = df2.lookup(df2.index, df2['Field'])

  Field  UKA_rf4  UKB  UKDdsdf   Value
0   UKA        1    4       54       1
1   UKB        2    5     7787       5
2   UKD       97   54   765656  765656

Pandas: creating new column based on column value partially matching another column's name

3 Answers3