0

I have two dataframe: df1 and df2.

df1 is following:

    name  exist
     a      1
     b      1
     c      1
     d      1
     e      1

df2 (just have one column:name)is following:

     name  
      e      
      f      
      g      
      a     
      h     

I want to merge these two dataframe, and didn't merge repeat names, I mean, if the name in df2 exist in df1, just show one time, else if the name is df2 not exist in df1, set the exist value is 0 or Nan. for example as df1(there is a and e), and df2(there is a and e, just showed a, e one time), I want to be the following df:

     a      1
     b      1
     c      1
     d      1
     e      1
     f      0 
     g      0
     h      0

I used the concat function to do it, my code is following:

import pandas as pd


df1 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                'exist': ['1', '1', '1', '1', '1']})
df2 = pd.DataFrame({'name': ['e', 'f', 'g', 'h', 'a']})
df = pd.concat([df1, df2])
print(df)

but the result is wrong(name a and e is repeated to be showed):

  exist name
 0     1    a
 1     1    b 
 2     1    c
 3     1    d
 4     1    e
 0   NaN    e
 1   NaN    f
 2   NaN    g
 3   NaN    h
 4   NaN    a

please give your hands, thanks in advance!

tktktk0711
  • 1,580
  • 4
  • 26
  • 55

1 Answers1

1

As indicated by your title, you can use merge instead of concat and specify how parameter as outer since you want to keep all records from df1 and df2 which defines an outer join:

import pandas as pd
pd.merge(df1, df2, on = 'name', how = 'outer').fillna(0)

# exist name
# 0   1    a
# 1   1    b
# 2   1    c
# 3   1    d
# 4   1    e
# 5   0    f
# 6   0    g
# 7   0    h
Psidom
  • 195,464
  • 25
  • 298
  • 322
  • thanks @Psidom for your answer, I will accept it. Could you tell me the meaning of how = 'outer' in detail, I mean the difference of left, right, inner, butter. Thanks! – tktktk0711 Aug 30 '16 at 03:34
  • Check this http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html and also this http://stackoverflow.com/questions/448023/what-is-the-difference-between-left-right-outer-and-inner-joins. Roughly, a join is a kind of match of tables based on some key columns; left keeps all records from the first data frame; right keeps all records from the second data frame; inner keeps only records that exists in both data frames and outer keeps all records. – Psidom Aug 30 '16 at 03:38
  • @tktktk0711, here is a visual guide to what is happening with DataFrame `merge` https://youtu.be/9d5-Ti6onew?t=13m7s – pylang Aug 30 '16 at 04:23