13

Let's say we have the following pandas DataFrame:

In [1]:
import pandas as pd
import numpy as np

df = pd.DataFrame([0, 1, 0, 0, 1, 1, 0, 1, 1, 1], columns=['in'])
df
Out[1]: 
   in
0   0
1   1
2   0
3   0
4   1
5   1
6   0
7   1
8   1
9   1

How to count the number of consecutive ones in a vectorized way in pandas? I would like to have a result like this:

   in  out
0   0    0
1   1    1
2   0    0
3   0    0
4   1    1
5   1    2
6   0    0
7   1    1
8   1    2
9   1    3

Something like a vectorized cumsum operation that resets on a specific condition.

Puggie
  • 3,707
  • 2
  • 29
  • 39

1 Answers1

15

You can do something like this(credit goes to: how to emulate itertools.groupby with a series/dataframe?):

>>> df['in'].groupby((df['in'] != df['in'].shift()).cumsum()).cumsum()
0    0
1    1
2    0
3    0
4    1
5    2
6    0
7    1
8    2
9    3
dtype: int64
Ashwini Chaudhary
  • 232,417
  • 55
  • 437
  • 487
  • 1
    Fair play. Been banging my head against this for 40 minutes :P – Ffisegydd Mar 19 '15 at 11:26
  • @Ffisegydd Same here :-), but I had a feeling someone would have asked for `itertools.groupby` equivalent groupy in Pandas somewhere. – Ashwini Chaudhary Mar 19 '15 at 11:28
  • Yeah I knew I wanted it, just never bothered to Google it >_ – Ffisegydd Mar 19 '15 at 11:35
  • 1
    @AshwiniChaudhary would you do a pull-request to add this to the cookbook, somewhere in this section: http://pandas.pydata.org/pandas-docs/stable/cookbook.html#grouping – Jeff Mar 19 '15 at 12:00