254

I have a pandas dataframe in the following format:

df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']

df:

   col1 col2 col3     col4 col5
0   1.1    A  1.1    x/y/z    1
1   1.1    A  1.7      x/y    3
2   1.1    A  2.5  x/y/z/n    3
3   2.6    B  2.6      x/u    2
4   2.5    B  3.3        x    4
5   3.4    B  3.8    x/u/v    2
6   2.6    B    4    x/y/z    5
7   2.6    A  4.2        x    3
8   3.4    B  4.3  x/u/v/b    6
9   3.4    C  4.5        -    3
10  2.6    B  4.6      x/y    5
11  1.1    D  4.7    x/y/z    1
12  1.1    D  4.7        x    1
13  3.3    D  4.8  x/u/v/w    1

Now I want to group this by two columns, like the following:

df.groupby(['col5','col2']).reset_index()

Output:

             index col1 col2 col3     col4 col5
col5 col2                                      
1    A    0      0  1.1    A  1.1    x/y/z    1
     D    0     11  1.1    D  4.7    x/y/z    1
          1     12  1.1    D  4.7        x    1
          2     13  3.3    D  4.8  x/u/v/w    1
2    B    0      3  2.6    B  2.6      x/u    2
          1      5  3.4    B  3.8    x/u/v    2
3    A    0      1  1.1    A  1.7      x/y    3
          1      2  1.1    A  2.5  x/y/z/n    3
          2      7  2.6    A  4.2        x    3
     C    0      9  3.4    C  4.5        -    3
4    B    0      4  2.5    B  3.3        x    4
5    B    0      6  2.6    B    4    x/y/z    5
          1     10  2.6    B  4.6      x/y    5
6    B    0      8  3.4    B  4.3  x/u/v/b    6

I want to get the count of rows in each group, like the following. Expected output:

col5 col2 count
1    A      1
     D      3
2    B      2
etc...

How do I get my expected output? And how do I find the largest count for each 'col2' value?

Nilani Algiriyage
  • A very similar question just came up yesterday.. see [here](http://stackoverflow.com/questions/17666075/python-pandas-groupby-result/17666287#17666287). – bdiamante Jul 16 '13 at 14:29
  • Note on performance, including alternatives: [Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series](https://stackoverflow.com/questions/50328246/pandas-groupby-size-vs-series-value-counts-vs-collections-counter-with-multiple) – jpp Jun 25 '18 at 14:01
  • btw: How do you produce the GroubBy-output? – buhtz May 26 '21 at 08:43
  • @buhtz says get vaccinated: your question is not clear? – Nilani Algiriyage Jun 01 '21 at 09:51

7 Answers

217

You are looking for size:

In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64

To get the same answer as waitingkuo (for the "second question"), but slightly cleaner, group by the level:

In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A       3
B       2
C       1
D       3
dtype: int64
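
If you also want to know which col5 value gives that maximum for each col2, one possible sketch (not from the original answer; counts is just an illustrative name) is to take idxmax of the grouped sizes. It returns the (col5, col2) index label with the largest count, breaking ties by first occurrence:

counts = df.groupby(['col5', 'col2']).size()
counts.groupby(level=1).idxmax()   # index label (col5, col2) of the largest count per col2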
Andy Hayden
  • I don't know why I forgot this :O. Anyway, what about my second question? Find the largest count for each "col2" value and get the corresponding "col5" value? – Nilani Algiriyage Jul 16 '13 at 14:40
165

Following @Andy's answer, you can do the following to solve your second question:

In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]: 
      0
col2   
A     3
B     2
C     1
D     3
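
As a small sketch (not part of the original answer), you can give the size column a proper name while resetting the index, which avoids the unnamed 0 column:

df.groupby(['col5', 'col2']).size().reset_index(name='count').groupby('col2')['count'].max()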
waitingkuo
43

Idiomatic solution that uses only a single groupby

(df.groupby(['col5', 'col2']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='count') 
   .drop_duplicates(subset='col2'))

  col5 col2  count
0    3    A      3
1    1    D      3
2    5    B      2
6    3    C      1

Explanation

The result of the groupby size method is a Series with col5 and col2 in the index. From here, you could use another groupby to find the maximum count for each value in col2, but that is not necessary (a sketch of that alternative is shown below). You can simply sort all the values in descending order and then keep only the rows with the first occurrence of each col2 value using the drop_duplicates method.
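
For illustration, here is a sketch of that alternative (sizes is just an illustrative name, and the 'count' label comes from reset_index(name='count')):

# second groupby: pick, for each col2, the row with the largest count
sizes = df.groupby(['col5', 'col2']).size().reset_index(name='count')
sizes.loc[sizes.groupby('col2')['count'].idxmax()]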

cs95
Ted Petrou
  • There is no param called `name` in `reset_index()` in the current version of pandas: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html – mmBs Nov 25 '18 at 21:27
  • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reset_index.html – Ted Petrou Nov 26 '18 at 11:20
  • Ok, my bad. I used it when working with `DataFrame` not `Series`. Thanks for the link. – mmBs Nov 26 '18 at 11:42
  • Note: `.drop_duplicates(subset='col2'))` is only to answer the second question: `And I want to find the largest count for each 'col2' value?` and, thus, will remove duplicates with smaller count values. – momo Mar 17 '21 at 14:16
30

Insert data into a pandas dataframe and provide the column names:

import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = [['Alphabet','Words']]
print(df)   #printing dataframe.

This is our printed data:

[image: printed dataframe]

To group the dataframe in pandas and count, you need to provide one more column that counts the grouping; let's call that column "COUNTER".

Like this:

df['COUNTER'] = 1       # initially, set that counter to 1
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum()  # sum the COUNTER column within each group
print(group_data)

OUTPUT:

[image: grouped sums]

Vaibhav Mule
The Gr8 Adakron
  • How can I get the Alphabet column (e.g. A) to repeat below and not leave gaps in the first column? – blissweb Jan 06 '18 at 06:54
  • How to access the value of each group, which is the sum based on the alphabet and word? – Rahul Goyal Mar 05 '20 at 02:52
  • the column names should be assigned with a flat list: `df.columns = ['Alphabet','Words']` – Navid Aug 10 '20 at 08:20
  • to not get the gaps in the Alphabet column just add .reset_index() to the end, so `group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum().reset_index()` – JCM Sep 01 '20 at 18:12
5

If you want to add a new column (say 'count_column') containing the group counts to the dataframe:

df['count_column'] = df.groupby(['col5','col2'])['col5'].transform('count')

(I picked 'col5' as it contains no NaN values)
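
A possible variant (just a sketch, not from the original answer) is transform('size'), which counts rows rather than non-NA values, so the choice of column does not matter:

# 'size' counts every row in the group, regardless of NaN values
df['count_column'] = df.groupby(['col5', 'col2'])['col5'].transform('size')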

Tom
1

Since pandas 1.1.0, you can use value_counts on a DataFrame:

out = df[['col5','col2']].value_counts().sort_index()

Output:

col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64
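
As a side note (not part of the original answer), value_counts sorts by count in descending order by default; sort_index above only restores the index order of the other answers. For the second part of the question, one possible sketch is to take the maximum per col2 level:

# largest count for each col2 value; 'col2' is the name of an index level here
df[['col5', 'col2']].value_counts().groupby(level='col2').max()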
-4

You can just use the built-in count function after the groupby:

df.groupby(['col5','col2']).count()
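
Note (not part of the original answer) that count returns the number of non-NA values in every remaining column, so it produces a DataFrame and only matches size when there are no missing values. To get a single Series, select one column first; a sketch:

# non-NA count of col1 per (col5, col2) group; equals .size() here because
# this frame has no missing values
df.groupby(['col5', 'col2'])['col1'].count()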