Mapping column values of one DataFrame to another DataFrame using a key with different header names

Question

I have two DataFrames, df1 and df2, which look something like this:

   cat1  cat2  cat3
0    10    25    12
1    11    22    14
2    12    30    15

   all_cats cat_codes
0        10         A
1        11         B
2        12         C
3        25         D
4        22         E
5        30         F
6        14         G

I would like a DataFrame in which each column of df1 has its values replaced by the corresponding cat_codes. The column header names differ between the two DataFrames. I have tried join and merge, but the number of rows comes out inconsistent. I am dealing with a large number of samples (100,000). My output should ideally be this:

  cat1 cat2 cat3
0    A    D    C
1    B    E    Y
2    C    F    Z

The resulting columns should be appended to df1.

score 8 · Accepted Answer · answered Oct 16 '18 at 16:10


You can convert df2 to a dictionary and use that to replace the values in df1:

import pandas as pd

# df1 from the question
cat_1 = [10, 11, 12]
cat_2 = [25, 22, 30]
cat_3 = [12, 14, 15]

df1 = pd.DataFrame({'cat1': cat_1, 'cat2': cat_2, 'cat3': cat_3})

# df2 from the question (14 maps to G)
all_cats = [10, 11, 12, 25, 22, 30, 14]
cat_codes = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

df2 = pd.DataFrame({'all_cats': all_cats, 'cat_codes': cat_codes})

# Build a {value: code} lookup from df2
rename_dict = df2.set_index('all_cats').to_dict()['cat_codes']

# Replace every matching value in df1 with its code
df1 = df1.replace(rename_dict)

If you still have some values that aren't in your dictionary and want to replace them with Z, you can use a regex to replace them.

df1 = df1.astype('str').replace({r'\d+': 'Z'}, regex=True)
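As a possible alternative to the regex fallback, here is a minimal sketch that avoids casting to strings. It relies on Series.map returning NaN for values missing from the dictionary, which fillna then turns into 'Z'. The data is the question's; the name `mapped` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'cat1': [10, 11, 12],
                    'cat2': [25, 22, 30],
                    'cat3': [12, 14, 15]})
df2 = pd.DataFrame({'all_cats': [10, 11, 12, 25, 22, 30, 14],
                    'cat_codes': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})

# {value: code} lookup built from df2
rename_dict = df2.set_index('all_cats')['cat_codes'].to_dict()

# Map each column; values with no entry in the dict become NaN, then 'Z'
mapped = df1.apply(lambda col: col.map(rename_dict).fillna('Z'))
```

Unlike replace, map never leaves an unmapped number in place, so the fallback is explicit rather than pattern-based.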

answered Oct 16 '18 at 16:10

Stephen Witkowski


Thank you for your response. I want to create new columns rather than replace the existing ones, and these DataFrames are wide, which means cat_1, cat_2 and cat_3 are not the only columns in the DataFrame. Of course, I can convert these columns into lists and use your solution, but I am looking for an elegant way of doing this. Do you think 'joins' would help? – Danny Oct 17 '18 at 08:44
Just to be clear, you wouldn't need to convert these columns into lists. You're simply changing df2 into a dictionary and using that to replace values in the data frame. I have a question: do you have other values in this dataframe that you don't want to replace, but take the same value as something in all_cats? For example, do you only want to replace cat_1, cat_2, and cat_3, but want to leave cat_4 alone? If so, is any value in cat_4 equal to any value in all_cats? Let me know if I'm not making sense... – Stephen Witkowski Oct 17 '18 at 12:22
Yes. You are right. I want to leave the other columns alone but the other columns may or may not match the values in all_cats. – Danny Oct 17 '18 at 12:39
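The requirement settled in these comments (map only cat1 to cat3, leave every other column untouched, and append the results as new columns) could be sketched roughly as follows. The `cat4` column, the `_code` suffix and the names `lookup` and `cols_to_map` are assumptions made for illustration; they do not come from the thread.

```python
import pandas as pd

df1 = pd.DataFrame({'cat1': [10, 11, 12],
                    'cat2': [25, 22, 30],
                    'cat3': [12, 14, 15],
                    'cat4': [10, 99, 30]})  # hypothetical column that must stay as-is
df2 = pd.DataFrame({'all_cats': [10, 11, 12, 25, 22, 30, 14],
                    'cat_codes': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})

# Series indexed by all_cats; the index must be unique for map to work
lookup = df2.set_index('all_cats')['cat_codes']

cols_to_map = ['cat1', 'cat2', 'cat3']
for col in cols_to_map:
    # New column next to the original; unmapped values (here 15) become NaN
    df1[col + '_code'] = df1[col].map(lookup)
```

Because only the listed columns are mapped, values in cat4 that happen to equal something in all_cats are not touched, and the row count of df1 cannot change.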

score 6 · Answer 2 · answered Dec 19 '18 at 13:44


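df3 = pd.merge(df1, df2, left_on=['cat' + str(i)], right_on=['all_cats'], how='left')

I would iterate this over i = 1, 2, 3, i.e. for cat1, cat2 and cat3. The key on the df2 side is all_cats, and the matching cat_codes value is what gets brought in. This does not replace the existing column values but appends new columns.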
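A minimal sketch of that iteration, under a few assumptions: the data is the question's, all_cats is unique in df2, and the `_code` suffix for the appended columns is hypothetical. Each pass merges a single column of df1 against df2 and keeps only the resulting cat_codes.

```python
import pandas as pd

df1 = pd.DataFrame({'cat1': [10, 11, 12],
                    'cat2': [25, 22, 30],
                    'cat3': [12, 14, 15]})
df2 = pd.DataFrame({'all_cats': [10, 11, 12, 25, 22, 30, 14],
                    'cat_codes': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})

df3 = df1.copy()
for i in range(1, 4):
    col = 'cat' + str(i)
    # Left merge keeps every row of df1 in order; validate raises if
    # all_cats has duplicates, which would otherwise multiply rows
    merged = pd.merge(df1[[col]], df2, left_on=col, right_on='all_cats',
                      how='left', validate='many_to_one')
    # Assign by position so the merge's fresh index does not matter
    df3[col + '_code'] = merged['cat_codes'].to_numpy()
```

Duplicate keys in df2 are one plausible cause of the inconsistent row counts mentioned in the question, which is why the sketch passes validate='many_to_one'.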

answered Dec 19 '18 at 13:44

Danny


Thanks! This answer helped – Tuhin Mitra Mar 23 '22 at 09:02
