3

I've been performing a groupby operation on a dataframe I have that aggregates columns together based on the column 'Name':

Name | As | Bs | Cs   |  Note
Mark   3     4     7     Good
Luke   2     1     12    Well
Mark   5     6     8     Ok
John   1     18    3     Great

So in this case, the rows with 'Mark' are aggregated together on columns A,B, and C using this code:

temp_df = temp_df.groupby(['Name'], as_index=False).agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

A thing I need to add in is to do a count on the number of rows that have the same value in 'Name'. This would give me an output like:

Name | As | Bs | Cs   |  Note   | Count
Mark   8     10    15    Good      2
Luke   2     1     12    Well      1
John   1     18    3     Great     1

How do I modify the above line of code to do what I need?

GreenGodot
  • 5,280
  • 9
  • 31
  • 61

2 Answers2

4

Create the group and do your aggregations:

the_group = temp_df.groupby(['Name'], as_index=False)
temp_df = the_group.agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

then compute the size from the_group

temp_df['count'] = the_group.count()['Note']

gives:

   Name  Cs  As  Bs  count
0  John   3   1  18      1
1  Luke  12   2   1      1
2  Mark  15   8  10      2

Edit:

As suggested in the comments, it is safer to use size() in case the data include NaN:

temp_df['count'] = the_group.size().reset_index()[0] 
VinceP
  • 1,808
  • 1
  • 17
  • 29
  • I believe you meant to say 'temp_df['count'] = the_group.count()['Name']'? WHich is what I want to get. THis does not seem to be working though, I'm just getting back the string value for name for each row. – GreenGodot Jul 06 '17 at 16:02
  • Count is used for check non NaN values, need size. – jezrael Jul 06 '17 at 16:27
1

Use first + size and then is necessary rename columns by dict:

temp_df = temp_df.groupby('Name', sort=False) \
                .agg({'As':np.sum,'Bs':np.sum,'Cs':np.sum,'Note':'first','Name':'size'}) \
                .rename(columns={'Name':'Count'}) \
                .reset_index() \
                .reindex_axis(temp_df.columns.tolist() + ['Count'], axis=1) 
print (temp_df)
   Name  As  Bs  Cs   Note  Count
0  Mark   8  10  15   Good      2
1  Luke   2   1  12   Well      1
2  John   1  18   3  Great      1

Dont use count, only size or len.

What is the difference between size and count in pandas?

Graham
  • 7,035
  • 17
  • 57
  • 82
jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
  • 1
    Since pandas 0.25.0+ you can use named aggregation. temp_df.groupby('Name', sort=False).agg(As=('As','sum'),Bs=('Bs','sum'),Cs=('Cs','sum'),Note=('Note','first'),Count=('Name','size')) – Matthew Son Jan 16 '20 at 21:12