Compare two pandas DataFrame columns and, where they differ, use the correct value

Question

I have two DataFrames: one contains the ground truth for city names, and the other is read from arbitrary other files.

  import pandas as pd
  ground_truth = pd.DataFrame(['New York', 'Denvor', 'Cleveland'], columns=['cities'])
  random_df = pd.DataFrame(['DenvoR', 'cleveland'], columns=['cities'])

I need to compare the cities column of random_df with the cities column of ground_truth and, where the letter case is messed up, replace the value with the ground_truth spelling. So far I have used a for loop; it works, but it is not elegant. Any suggestions?

My case might not be that complicated. The city names are always the same; only the letter case might be messed up. - newleaf, Jun 04 '20 at 00:44

score 0 · Answer 1 · answered Jun 04 '20 at 00:47

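Upper-case both columns, then map each random_df name back to its ground_truth spelling:

  # Upper-case both columns so the comparison ignores letter case.
  s1 = ground_truth.cities.str.upper()
  s2 = random_df.cities.str.upper()
  # Look up each upper-cased random_df name to recover the ground_truth spelling.
  random_df['cities'] = s2.map(dict(zip(s1, ground_truth.cities)))
  random_df
        cities
  0     Denvor
  1  Cleveland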
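For reference, here is a self-contained sketch of the same idea, normalizing random_df against ground_truth. The helper name `normalize_case` is hypothetical, and using `casefold` instead of `upper` is my own choice. It also assumes that a name with no case-insensitive match should be left as it is. The question says the names always match, so that fallback is only a safeguard.

```python
import pandas as pd

ground_truth = pd.DataFrame(['New York', 'Denvor', 'Cleveland'], columns=['cities'])
random_df = pd.DataFrame(['DenvoR', 'cleveland'], columns=['cities'])


def normalize_case(values: pd.Series, reference: pd.Series) -> pd.Series:
    """Replace each value with the reference spelling it matches case-insensitively.

    Hypothetical helper for illustration; values without a match are kept unchanged.
    """
    # Lookup table: case-folded name -> canonical spelling from the reference.
    lookup = pd.Series(reference.to_numpy(), index=reference.str.casefold())
    # Series.map needs unique keys; keep the first spelling for any repeated name.
    lookup = lookup[~lookup.index.duplicated()]
    # Map through the case-folded key, falling back to the original value.
    return values.str.casefold().map(lookup).fillna(values)


random_df['cities'] = normalize_case(random_df['cities'], ground_truth['cities'])
print(random_df)
```

Because of the `fillna(values)` fallback, a name that does not appear in ground_truth at all passes through untouched instead of becoming NaN.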

BENY

