Python: get a frequency count based on two columns (variables) in pandas dataframe (how many times each row appears)

Question

Hello, I have the following dataframe:

    Group           Size

    Short          Small
    Short          Small
    Moderate       Medium
    Moderate       Small
    Tall           Large

I want to count how many times each row appears in the dataframe:

    Group           Size      Time

    Short          Small        2
    Moderate       Medium       1 
    Moderate       Small        1
    Tall           Large        1

Note on performance, including alternatives: [Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series](https://stackoverflow.com/questions/50328246/pandas-groupby-size-vs-series-value-counts-vs-collections-counter-with-multiple) – jpp Jun 25 '18 at 14:02

score 179 · Accepted Answer · answered Oct 22 '15 at 00:44

You can use groupby's size:

In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1
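
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. The frame is rebuilt from the rows in the question; the final `sort_values` step is an optional addition, not part of the snippet above, and simply puts the most frequent pairs first.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# size() counts rows per (Group, Size) pair and returns a Series with a
# MultiIndex; reset_index(name=...) flattens it into a DataFrame whose
# count column is called "Time".
counts = df.groupby(["Group", "Size"]).size().reset_index(name="Time")

# Optional: most frequent combinations first.
counts_sorted = counts.sort_values("Time", ascending=False, ignore_index=True)
print(counts_sorted)
```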

Andy Hayden

Thanks. One minor addition to pick the top k (=20) values based on the frequency ("Time"): `df.groupby(["Group", "Size"]).size().reset_index(name="Time").sort_values(by='Time', ascending=False).head(20)` – Dileep Kumar Patchigolla Dec 18 '17 at 12:40
Just note that `.size()` returns a Series, while `.size().reset_index(name="Time")` returns a DataFrame. Thanks Andy. – alemol Nov 23 '19 at 17:14
or you could also do `df.groupby(by=["Group", "Size"], as_index=False).size()` simply – Naveen Reddy Marthala Apr 25 '20 at 04:18

score 79 · Answer 2 · edited Oct 14 '20 at 13:15

Update: since pandas 1.1, value_counts accepts multiple columns:

df.value_counts(["Group", "Size"])
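
A minimal runnable sketch of the `value_counts` route, assuming pandas 1.1 or later and the example frame from the question. The `reset_index(name="Time")` step is an addition here, used to get a flat DataFrame with the column name the question asks for.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# DataFrame.value_counts returns a Series indexed by (Group, Size),
# sorted by count in descending order by default.
counts = df.value_counts(["Group", "Size"])

# Flatten into a DataFrame and name the count column "Time".
result = counts.reset_index(name="Time")
print(result)
```

Unlike the `groupby` version, the rows come back ordered by frequency rather than by group label.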

You can also try pd.crosstab()

Group           Size

Short          Small
Short          Small
Moderate       Medium
Moderate       Small
Tall           Large

pd.crosstab(df.Group,df.Size)


Size      Large  Medium  Small
Group                         
Moderate      0       1      1
Short         0       0      2
Tall          1       0      0

EDIT: To get your desired output:

pd.crosstab(df.Group,df.Size).replace(0,np.nan).\
     stack().reset_index().rename(columns={0:'Time'})
Out[591]: 
      Group    Size  Time
0  Moderate  Medium   1.0
1  Moderate   Small   1.0
2     Short   Small   2.0
3      Tall   Large   1.0
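
Note that `replace(0, np.nan)` is what turns the counts into floats (`1.0`, `2.0`). A possible variant, sketched here on the assumption that integer counts are wanted, stacks the crosstab first and then filters out the zero cells instead of replacing them with NaN:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# crosstab builds a Group x Size table of integer counts,
# with 0 for combinations that never occur.
table = pd.crosstab(df["Group"], df["Size"])

# stack() moves the Size columns into the index, giving one count per pair;
# keep only the pairs that actually appear, then flatten.
stacked = table.stack()
result = stacked[stacked > 0].reset_index(name="Time")
print(result)
```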

answered May 05 '17 at 21:39

BENY

nice. you can even add `margins=True` to get the marginal counts! – Matt Hancock Jun 27 '17 at 19:43
Also df.value_counts(["Group", "Size"]).reset_index() will turn it into a dataframe – Joe Rivera Mar 01 '22 at 20:23
As you count all columns, you can use `df.value_counts()`. – Mykola Zotko May 26 '22 at 18:05

score 3 · Answer 3 · answered Aug 06 '20 at 17:03

Another possibility is to use .pivot_table() with aggfunc='size':

df_solution = df.pivot_table(index=['Group','Size'], aggfunc='size')
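
A small sketch of how that result could be turned into the layout from the question. It assumes that `pivot_table` with `aggfunc='size'` and no value columns returns a Series indexed by `(Group, Size)`, as `groupby(...).size()` does; the `reset_index(name="Time")` call is an addition and relies on that assumption.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Count rows per (Group, Size) pair via pivot_table.
df_solution = df.pivot_table(index=["Group", "Size"], aggfunc="size")

# Assumption: df_solution is a Series here, so name= is accepted.
result = df_solution.reset_index(name="Time")
print(result)
```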

asantz96
