284

I am trying to join two pandas data frames using two columns:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

but got the following error:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: '[B_1, c2]'

Any idea what should be the right way to do this? Thanks!

Edamame
  • 20,574
  • 59
  • 165
  • 291
  • 83
    `left_on` and `right_on` should be a list of strings, not a string that looks like a list. – root Jan 23 '17 at 20:34
  • Simple typo: `left_on=['A_c1','c2']` instead of `'[A_c1,c2]'`. It needs a list of string, as @root said. Similarly `right_on = ['B_c1','c2']`. – smci Aug 24 '21 at 06:14
  • By the way it's a bad practice to have the columns of dataframe `A_df` be named starting with a prefix 'A_', and the columns from `B_df` named `B_...`. It's totally unnecessary, and it makes basic operations like joins, merges, groupbys etc. annoying. – smci Aug 24 '21 at 06:16

4 Answers4

502

Try this

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns

right_on : label or list, or array-like Field names to join on in right DataFrame or vector/list of vectors per left_on docs

Community
  • 1
  • 1
Shijo
  • 8,307
  • 2
  • 16
  • 30
  • 43
    If `left_on` and `right_on` are same `a` and `b`, can we use `on = ['a', 'b']`? – ah bon Dec 13 '19 at 07:11
  • 17
    Yes that is perfectly valid. – user3065757 Dec 15 '19 at 04:02
  • 8
    For those wondering like me, it will merge according to the sequence/ordering of `left_on` and `right_on`, i.e., the _i_-th element of `left_on` will match with the _i_-th of `right_on`. It would be nice if the docs state that more explicitly. – Kardo Paska Dec 02 '20 at 00:58
  • 2
    Note that whenever the join keys have different names, all names will occur as columns in the merged table. E.g., `left_on='[A_c1, c2]', right_on='[B_c1, c2]')` will result in three columns: `A_c1`, `B_c1` and `c2`, where `A_c1` and `B_c1` are identical columns. – DustByte Jul 07 '21 at 15:11
13

the problem here is that by using the apostrophes you are setting the value being passed to be a string, when in fact, as @Shijo stated from the documentation, the function is expecting a label or list, but not a string! If the list contains each of the name of the columns beings passed for both the left and right dataframe, then each column-name must individually be within apostrophes. With what has been stated, we can understand why this is inccorect:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

And this is the correct way of using the function:

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
Celius Stingher
  • 14,458
  • 5
  • 18
  • 47
8

Another way of doing this:

new_df = A_df.merge(B_df, left_on=['A_c1','c2'], right_on = ['B_c1','c2'], how='left')
john ed
  • 161
  • 1
  • 6
0

you can use below which is short and simple to understand:

merged_data= df1.merge(df2, on=["column1","column2"])
Ali karimi
  • 63
  • 5