25

I have issues with the merging of two large Dataframes since the merge returns NaN values though there are fitting values. The two dfs are shaped like:

df1

Motor
2232
1524
2230
2230
2224
1516
1724
2224
1524
1624
1724
2224
2224
1524
1524
1516
1524
2224
1624
1724
1724
2224
2224

df2

Motor   Output Torque (mNm)
0615    0,17
1219    0,72
1516    0,59
1624    2
2230    4,7
2233    5,9
0816    0,7
1016    0,92
1024    1,6
1224    1,7
1319    1,4
1331    3,8
1516    0,97
1524    2,9
1717    2,2
1724    4,5
2224    6,8
2232    10
1336    3,6
1727    4,9
1741    8,8
2237    12
2642    26

I use the code:

MergeDat=MergeDat.merge(Motor,how="left")
print(MergeDat)

where MergeDat= df1 and Motor= df2

As result it returns:

  Motor  Output Torque (mNm)
0      2232                  NaN
1      1524                  NaN
2      2230                  NaN
3      2230                  NaN
4      2224                  NaN
5      1516                  NaN
6      1724                  NaN
7      2224                  NaN
8      1524                  NaN
9      1624                  NaN
10     1724                  NaN
11     2224                  NaN
12     2224                  NaN
13     1524                  NaN
14     1524                  NaN
15     1516                  NaN
16     1524                  NaN
17     2224                  NaN
18     1624                  NaN
19     1724                  NaN
20     1724                  NaN
21     2224                  NaN
22     2224                  NaN
23     1524                  NaN
24     1724                  NaN
25     1841                  NaN
26     2224                  NaN

I have no idea why the Output Torque column is not merged...

Appreciate any help!

2Obe
  • 3,202
  • 5
  • 26
  • 52
  • 1
    Could you add `df.dtypes` to the question? Suspecting if later is string column and former is numeric type. – Zero Oct 15 '17 at 11:17
  • I had differnt dtypes for the "Motor" column. Fixed this issue by applying the answer of [jezrael](https://stackoverflow.com/users/2901002/jezrael) – 2Obe Oct 15 '17 at 11:24

3 Answers3

40

You need same dtype of joined columns:

#convert first or second to str or int
MergeDat['Motor'] = MergeDat['Motor'].astype(str)
#Motor['Motor'] = Motor['Motor'].astype(str)

#MergeDat['Motor'] = MergeDat['Motor'].astype(int)
Motor['Motor'] = Motor['Motor'].astype(int)

#convert first or second to str or int
#MergeDat['Motor'] = MergeDat['Motor'].astype(str)
Motor['Motor'] = Motor['Motor'].astype(str)

MergeDat['Motor'] = MergeDat['Motor'].astype(int)
#Motor['Motor'] = Motor['Motor'].astype(int)


MergeDat=MergeDat.merge(Motor,how="left")
jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
  • 34
    Would be great if pandas would print a warning when dtypes are different. – Soren Apr 13 '18 at 12:10
  • @Sören - Yes, maybe in future version of pandas. Btw, last of improvement of merge is [this](https://pandas.pydata.org/pandas-docs/stable/merging.html#checking-for-duplicate-keys()), but same dtypes still not check :( – jezrael Apr 13 '18 at 12:15
  • I would like to add that I experienced an issue trying to merge columns that were of `object` type together. I had to case them as `str` in order for their join to work. – gr1zzly be4r Sep 26 '18 at 17:59
  • 6
    In pandas 0.25.1, I'm having this exact problem, but `.astype(str)` on both columns (`object` containing DOIs) does not change anything. – Katrin Leinweber Oct 14 '19 at 14:10
  • @KatrinLeinweber - hmmm, object means strings, so try convert to numeric - [link](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas) – jezrael Oct 14 '19 at 14:11
  • Thanks for the link :-) DOIs are unique strings (e.g. `10.1016/j.ribaf.2019.101069`), though, so neither numeric works (`ValueError`) nor are categorical types appropriate. However, when I cast the colum which `NaN`-ed after the merge to `astype("category")` beforehand, the values were preserved. Very strange... – Katrin Leinweber Oct 14 '19 at 14:59
  • There is a chance if you merge `left` and there are NO common values, it'll look like it found matching records, but you'll have NaN's on the right dataframe because it basically did a `pd.concat(axis=1)` since it didn't find anything to merge. – kevin_theinfinityfund Mar 23 '21 at 05:29
3

In my case, it was because I haven't reset the index after splitting the data frame, using df.reset_index(drop=True). Resetting the index of the first data frame enabled merging a second data frame to it.

Pavindu
  • 2,104
  • 5
  • 28
  • 60
0

Having some NaN's in the key column(s) is the usual culprit from my experience. Try at least the 2nd of these 3 lines on both df's (where unique_id is the key column used for merging) and see if it helps:

print(df[unique_id].duplicated().sum())
df.drop_duplicates(subset=unique_id, inplace=True)
assert(df[unique_id].duplicated().sum() == 0)
mirekphd
  • 2,771
  • 21
  • 35