3

I am trying the create a new column in Pandas. I have an existing column of text and I want this new column to contain the value contained in another column where the row value partially matches the column name. It's similar to this thread but in my case I would only have a partial match:

Python: pandas: match row value to column name/ key's value

My table would look like this as an example

Field   UKA_rf4 UKB UKDdsdf 
UKA     1       4   54
UKB     2       5   7787    
UKD     97      54  765656  

And what I would like it to look like is:

Field   UKA_rf4 UKB UKDdsdf      Value
UKA     1       4   54           1
UKB     2       5   7787         5
UKD     97      54  765656       765656   

The first row results in a 1 because 'UKA' is contained in the column name 'UKA_rf4'

The second row matches exactly to a column name and the third column is a partial match.

Any suggestions?

lczapski
  • 3,838
  • 3
  • 13
  • 30
user2335564
  • 127
  • 1
  • 9

3 Answers3

2

If there is always match between columns and each value of Field use DataFrame.lookup:

s = df['Field']
a = df.columns.str.extract('(' + '|'.join(s) + ')', expand=False)
df['Value'] = df.set_axis(a, axis=1,inplace=False).lookup(df.index, s)
print (df)
  Field  UKA_rf4  UKB  UKDdsdf   Value
0   UKA        1    4       54       1
1   UKB        2    5     7787       5
2   UKD       97   54   765656  765656

Similar solution:

df = df.set_index('Field')
a = df.columns.str.extract('(' + '|'.join(s) + ')', expand=False)
df['Value'] = df.set_axis(a, axis=1,inplace=False).lookup(df.index, df.index)

But problem with lookup is if some values missing, so here is alternative with DataFrame.melt for reshape, extract columns by Series.str.extract with all values of Field column and Series.map:

df1 = df.melt('Field', var_name='var')
df1['cols'] = df1['var'].str.extract('(' + '|'.join(df['Field']) + ')', expand=False)
s = df1[df1['Field'] == df1['cols']].drop_duplicates('Field').set_index('Field')['value']

df['Value'] = df['Field'].map(s)
print (df)
  Field  UKA_rf4  UKB  UKdsdf  Value
0   UKA        1    4      54    1.0
1  UKB1        2    5    7787    NaN
2   UKD       97   54  765656    NaN
jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
2

You can use list comprehension

df = pd.DataFrame({'Fields':['UKA','UKB','UKD'],'UKA_rf4':[1,2,97],'UKB':[4,5,54],'UKDdsdf':[54,7787,765656]})

df = df.set_index('Fields')
df['Values'] = [df[j][index] for index,i in enumerate(df.index) for j in df.columns if i in j]

print(df)

        UKA_rf4  UKB  UKDdsdf  Values
Fields                               
UKA           1    4       54       1
UKB           2    5     7787       5
UKD          97   54   765656  765656
Bugs 404
  • 69
  • 2
2

We can still use lookup by cleaning up your column names here:

df2 = df.copy()
df2.columns = ['Field'] + [col[:3] for col in df2.iloc[:, 1:].columns]
df['Value'] = df2.lookup(df2.index, df2['Field'])

  Field  UKA_rf4  UKB  UKDdsdf   Value
0   UKA        1    4       54       1
1   UKB        2    5     7787       5
2   UKD       97   54   765656  765656
Erfan
  • 36,997
  • 6
  • 53
  • 68