I have two DataFrames, df1 and df2, which look something like this:
df1:
   cat1  cat2  cat3
0    10    25    12
1    11    22    14
2    12    30    15
df2:
   all_cats cat_codes
0        10         A
1        11         B
2        12         C
3        25         D
4        22         E
5        30         F
6        14         G
I would like a DataFrame in which each value in the columns of df1 is replaced with the matching cat_codes value from df2. The column header names differ between the two frames. I have tried join and merge, but the number of rows in the result is inconsistent. I am dealing with a large number of samples (100,000). My output should ideally be this:
  cat1 cat2 cat3
0    A    D    C
1    B    E    Y
2    C    F    Z
The resulting columns should be appended to df1.
df2 into a dictionary and using that to replace values in the data frame. I have a question: do you have other values in this dataframe that you don't want to replace, but take the same value as something in all_cats? For example, do you only want to replace cat_1, cat_2, and cat_3, but want to leave cat_4 alone? If so, is any value in cat_4 equal to any value in all_cats? Let me know if I'm not making sense... – Stephen Witkowski Oct 17 '18 at 12:22
all_cats. – Danny Oct 17 '18 at 12:39
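The dictionary idea from the comment above could look roughly like the sketch below. It is not a definitive answer: the list of columns to map (`cols`) and the `_code` suffix for the appended columns are my own assumptions, and the frames are rebuilt from the sample data in the question. Limiting the mapping to an explicit column list leaves any other column (such as a hypothetical `cat_4`) untouched. Because `Series.map` is a lookup rather than a join, the row count always matches df1. Note that in the sample data 14 maps to G and 15 has no entry in all_cats, so that cell comes out as NaN.

```python
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({"cat1": [10, 11, 12],
                    "cat2": [25, 22, 30],
                    "cat3": [12, 14, 15]})
df2 = pd.DataFrame({"all_cats": [10, 11, 12, 25, 22, 30, 14],
                    "cat_codes": ["A", "B", "C", "D", "E", "F", "G"]})

# Turn df2 into a lookup dictionary: category value -> code
code_map = df2.set_index("all_cats")["cat_codes"].to_dict()

# Assumption: only these columns hold category values that should be mapped
cols = ["cat1", "cat2", "cat3"]

# Map each selected column through the dictionary; values with no entry become NaN
coded = df1[cols].apply(lambda s: s.map(code_map)).add_suffix("_code")

# Append the coded columns to df1 (the "_code" suffix is an assumed naming choice)
result = pd.concat([df1, coded], axis=1)
print(result)
```

If all_cats contained duplicate values, the dictionary would keep only the last code for each, so it may be worth checking `df2["all_cats"].is_unique` first. Duplicates of that kind are also a common reason a merge returns more rows than expected.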