Hello, I have the following dataframe:

    Group           Size

    Short          Small
    Short          Small
    Moderate       Medium
    Moderate       Small
    Tall           Large

I want to count how many times each identical row appears in the dataframe, like this:

    Group           Size      Time

    Short          Small        2
    Moderate       Medium       1 
    Moderate       Small        1
    Tall           Large        1
emax
    Note on performance, including alternatives: [Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series](https://stackoverflow.com/questions/50328246/pandas-groupby-size-vs-series-value-counts-vs-collections-counter-with-multiple) – jpp Jun 25 '18 at 14:02
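The comparison linked in the comment above also covers `collections.Counter`. As a minimal sketch (the `df` construction just retypes the question's data, and `counts`/`result` are illustrative names), counting `(Group, Size)` pairs with `Counter` and converting them back into a DataFrame could look like this:

```python
from collections import Counter

import pandas as pd

# The question's data, retyped for a self-contained example
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Count each (Group, Size) pair by zipping the two columns row by row
counts = Counter(zip(df["Group"], df["Size"]))

# Turn the Counter back into a DataFrame with a "Time" column;
# rows appear in first-seen order because Counter is a dict subclass
result = pd.DataFrame(
    [(group, size, n) for (group, size), n in counts.items()],
    columns=["Group", "Size", "Time"],
)
print(result)
```

Unlike the groupby-based answers, this keeps the order in which combinations first appear rather than sorting by key.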

3 Answers


You can use groupby's size:

In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1
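For a self-contained version of the session above, here is a minimal sketch. It retypes the question's data into `df`, and `counts`, `result` and `top` are illustrative names. It also includes the optional sort-and-head step suggested in the comments below:

```python
import pandas as pd

# The question's data, retyped for a self-contained example
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# size() counts the rows in each (Group, Size) group and returns a Series
counts = df.groupby(["Group", "Size"]).size()

# reset_index(name=...) turns that Series into a DataFrame with a "Time" column
result = counts.reset_index(name="Time")

# Optional: most frequent combinations first, keeping at most 20 rows
top = result.sort_values(by="Time", ascending=False).head(20)
print(top)
```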
Andy Hayden
    Thanks. One minor addition to pick the top k (=20) values based on the frequency ("Time"): df.groupby(["Group", "Size"]).size().reset_index(name="Time").sort_values(by='Time',ascending=False).head(20); – Dileep Kumar Patchigolla Dec 18 '17 at 12:40
    Just note that `.size()` returns a Series, while `.size().reset_index(name="Time")` returns a DataFrame. Thanks Andy. – alemol Nov 23 '19 at 17:14
    Or you could simply do `df.groupby(by=["Group", "Size"], as_index=False).size()` – Naveen Reddy Marthala Apr 25 '20 at 04:18
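A minimal sketch of the `as_index=False` variant from the last comment, assuming pandas 1.1 or later. In those versions this call returns a DataFrame whose count column is named `size`, so it is renamed to `Time` here. The `df` construction retypes the question's data, and `result` is an illustrative name:

```python
import pandas as pd

# The question's data, retyped for a self-contained example
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# as_index=False keeps Group and Size as ordinary columns (assumes pandas >= 1.1,
# where size() then returns a DataFrame with a "size" column)
result = (
    df.groupby(["Group", "Size"], as_index=False)
    .size()
    .rename(columns={"size": "Time"})
)
print(result)
```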

Update: since pandas 1.1, DataFrame.value_counts accepts multiple columns:

df.value_counts(["Group", "Size"])
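To get the question's exact layout from `value_counts`, the resulting Series can be reset with a named column. A minimal sketch, assuming pandas 1.1 or later. The `df` construction retypes the question's data, and `counts`/`result` are illustrative names:

```python
import pandas as pd

# The question's data, retyped for a self-contained example
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Counts unique (Group, Size) rows; the result is a Series indexed by the
# two columns and sorted by count, highest first
counts = df.value_counts(["Group", "Size"])

# Name the count column "Time" to match the desired output
result = counts.reset_index(name="Time")
print(result)
```

Unlike `groupby(...).size()`, the rows come out ordered by frequency rather than by key.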

You can also try pd.crosstab()

Group           Size

Short          Small
Short          Small
Moderate       Medium
Moderate       Small
Tall           Large

pd.crosstab(df.Group, df.Size)


Size      Large  Medium  Small
Group                         
Moderate      0       1      1
Short         0       0      2
Tall          1       0      0

EDIT: To get your desired output:

import numpy as np
pd.crosstab(df.Group, df.Size).replace(0, np.nan).\
     stack().reset_index().rename(columns={0: 'Time'})
Out[591]: 
      Group    Size  Time
0  Moderate  Medium   1.0
1  Moderate   Small   1.0
2     Short   Small   2.0
3      Tall   Large   1.0
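Replacing zeros with NaN is why `Time` comes out as floats above. As an alternative sketch (not the answerer's code), you can filter the stacked counts instead, which keeps them as integers. The `df` construction retypes the question's data, and `stacked`/`result` are illustrative names:

```python
import pandas as pd

# The question's data, retyped for a self-contained example
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Cross-tabulate, then stack back into long form: one entry per (Group, Size) cell
stacked = pd.crosstab(df["Group"], df["Size"]).stack()

# Keep only combinations that actually occur; filtering on > 0 instead of
# replacing 0 with NaN leaves the counts as integers
result = stacked[stacked > 0].reset_index(name="Time")
print(result)
```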
BENY

Another possibility is to use .pivot_table() with aggfunc='size':

df_solution = df.pivot_table(index=['Group','Size'], aggfunc='size')
asantz96