Data annotation, dataframe merge

Question

I need to annotate data (a liste of names in a column) with the content of a column of a second dataframe (containing some atributes of these names). Like dataframe 1 :

id      name             col
----------------------------
 29834   Marie Peer
890384   Marie Peach
 30047   Susan Peer

with an other dataframe containing

name       city
---------------------
Marie      Chicago
Marie      Boston 
Harley     Milan 
Josh       London

Questions:

Shall I better use pandas or does it worth to create python dictionaries?
I previously used data = pd.merge(df1, df2, on='column_name', how='left') but how does it work if my second dataframe has multiple times the same row in the column (I would need them to go in the same row of a new column of the first dataframe).

I would need that, if 'Marie' has multiple entries in the second dataframe, that annotate in a new column the multiple entries of 'Marie' from the second df output needed :
```
 29834 Marie Peer      Chicago, Boston
890384 Marie Peach     Chicago, Boston
 30047 Susan Peer
```
(and if there is no entry in the second dataframe, like for 'Susan' here, the field would stay blank)
In general is that the best way to annotate a dataset with data from an other dataset ?

Many thanks in advance (and apologies, i'm a beginner though that might sound very basic questions) !

Hi, and welcome to Stack Overflow. please visit https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Itamar Mushkin, Oct 29 '19 at 06:27

score 0 · Accepted Answer · answered Oct 29 '19 at 06:17

Use DataFrame.groupby with join for second DataFrame, so possible use DataFrame.join:

df = df1.join(df2.groupby('name')['city'].apply(', '.join), on='name')
print (df)
       id   name    col             city
0   29834  Marie   Peer  Chicago, Boston
1  890384  Marie  Peach  Chicago, Boston
2   30047  Susan   Peer              NaN

Or Series.map:

df1['city'] = df1['name'].map(df2.groupby('name')['city'].apply(', '.join))
print (df1)
       id   name    col             city
0   29834  Marie   Peer  Chicago, Boston
1  890384  Marie  Peach  Chicago, Boston
2   30047  Susan   Peer              NaN

Detail:

print (df2.groupby('name')['city'].apply(', '.join))
name
Harley              Milan
Josh               London
Marie     Chicago, Boston
Name: city, dtype: object

Data annotation, dataframe merge

1 Answers1