
I have two DataFrames, df1 and df2:

len(df1)=167
len(df2)=3047

I want to merge df2 into df1, so the length of new_df should be the same as the length of df1.

 new_df = pd.merge(df1, df2,
                  on=['header_name'], how="left")

I used:

 how="left"

I think this is the right way, but I don't know why len(new_df) is 574, which is larger than len(df1).

The output should be:

len(new_df)=len(df1)
William
  • You've misunderstood the `left` join. It only guarantees that all the keys from the left DataFrame will be present in the result. Notice what happens when there are duplicate keys in df2: `pd.merge(pd.DataFrame({'header_name': [1], 'a': [0]}), pd.DataFrame({'header_name': [1, 1], 'a': [1, 2]}), on=['header_name'], how="left")` The condition that every key from the left frame appears in the resulting DataFrame is met, but key `1` appears twice since there are duplicates in `df2`. – Henry Ecker Aug 16 '21 at 23:51
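
To make the comment above concrete, here is a minimal sketch of how duplicate keys on the right side grow a left join, and one possible way to get back to len(df1). The toy data and the columns `x` and `y` are made up for illustration; only `header_name` comes from the question. Keeping the first row per key is an assumption too, and which duplicate to keep depends on the real data.

```python
import pandas as pd

# Hypothetical toy frames; only the column name header_name is from the question.
df1 = pd.DataFrame({"header_name": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"header_name": ["a", "a", "b", "d"], "y": [10, 11, 20, 40]})

# Keys that occur more than once in df2 are the ones that multiply rows.
dup_counts = df2["header_name"].value_counts()
print(dup_counts[dup_counts > 1])

# A plain left join keeps every df1 key but repeats "a" once per match in df2.
grown = pd.merge(df1, df2, on=["header_name"], how="left")
print(len(df1), len(grown))

# Assumption: keeping the first row per key is acceptable for this data.
df2_unique = df2.drop_duplicates(subset="header_name", keep="first")

# validate="many_to_one" makes pandas raise MergeError if df2 keys are not unique.
new_df = pd.merge(df1, df2_unique, on=["header_name"], how="left",
                  validate="many_to_one")
print(len(new_df) == len(df1))
```

If dropping rows loses information, aggregating df2 per `header_name` (for example with `groupby`) before merging is another option.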
