I have two dataframes: one contains the ground-truth city names, and the other is read from arbitrary files.

  import pandas as pd
  ground_truth = pd.DataFrame(['New York', 'Denvor', 'Cleveland'], columns=['cities'])
  random_df = pd.DataFrame(['DenvoR', 'cleveland'], columns=['cities'])

I need to compare the cities column of random_df with the cities column of ground_truth and replace each value with the ground_truth spelling when only the letter case differs. So far I have used a for loop; it works, but it is not elegant. Any suggestions?

newleaf

1 Answer

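Upper-case both columns, build a lookup from the upper-cased ground_truth names back to their original spelling, and map random_df through it:

s1 = ground_truth.cities.str.upper()
s2 = random_df.cities.str.upper()
# Look up each upper-cased random_df name in the ground-truth spellings;
# names with no match map to NaN and fall back to their original value.
random_df['cities'] = s2.map(dict(zip(s1, ground_truth.cities))).fillna(random_df.cities)
random_df
      cities
0     Denvor
1  Cleveland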
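
For the direction the question asks about (rewriting random_df to use the ground_truth spellings), the same lookup idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name canonicalize is hypothetical, the extra 'Boston' row is made up to show what happens to a city that is missing from ground_truth, and stripping surrounding whitespace is an added convenience the question does not mention.

```python
import pandas as pd


def canonicalize(values: pd.Series, reference: pd.Series) -> pd.Series:
    """Replace each value with the reference spelling that matches it,
    ignoring letter case and surrounding whitespace (hypothetical helper)."""
    # Key: stripped, case-folded spelling -> canonical reference spelling.
    lookup = dict(zip(reference.str.strip().str.casefold(), reference))
    keys = values.str.strip().str.casefold()
    # Values with no match map to NaN, so fall back to the original value.
    return keys.map(lookup).fillna(values)


ground_truth = pd.DataFrame(['New York', 'Denvor', 'Cleveland'], columns=['cities'])
# 'Boston' is an illustrative extra row that has no ground-truth match.
random_df = pd.DataFrame(['DenvoR', 'cleveland', 'Boston'], columns=['cities'])

random_df['cities'] = canonicalize(random_df['cities'], ground_truth['cities'])
print(random_df)
```

Matched cities take the ground_truth spelling and unmatched ones are left unchanged. If two ground_truth entries differ only by case, the later one wins in the lookup, so the reference column should be deduplicated first.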
BENY