1

I have the foll. pandas dataframe with datetime index:

datetime       VAL
2000-01-01   -283.0000
2000-01-02   -283.0000
2000-01-03    -10.6710
2000-01-04    -12.2700
2000-01-05    -10.7855
2001-01-06     -9.1480
2001-01-07     -9.5300
2001-01-08    -10.4675
2001-01-09    -10.9205
2001-01-10    -11.5715

I would like to compute cumulative values for each year and replace the VAL column by the cumulative values. E.g, It will look something like this:

datetime       VAL
2000-01-01   -283.0000
2000-01-02   -283.0000 + -283.0000
2000-01-03    -10.6710 + -283.0000 + -283.0000
2000-01-04    -12.2700 + -10.6710 + -283.0000 + -283.0000
2000-01-05    -10.7855 + -12.2700 + -10.6710 + -283.0000 + -283.0000
2001-01-06     -9.1480
2001-01-07     -9.5300 + -9.5300
2001-01-08    -10.4675 + -10.4675
2001-01-09    -10.9205 + -10.9205
2001-01-10    -11.5715 + -11.5715

I haven't done the actual calculations which is why you see -283.000 + -283.000 instead of -566.0000

Not sure how to proceed with this, I could do a groupby and then?

user308827
  • 21,018
  • 70
  • 229
  • 377

2 Answers2

4

You can access the year via .year on a DateTimeIndex, and pass that to groupby:

>>> df["cumulative_VAL"] = df.groupby(df.index.year)["VAL"].cumsum()
>>> df
                 VAL  cumulative_VAL
datetime                            
2000-01-01 -283.0000       -283.0000
2000-01-02 -283.0000       -566.0000
2000-01-03  -10.6710       -576.6710
2000-01-04  -12.2700       -588.9410
2000-01-05  -10.7855       -599.7265
2001-01-06   -9.1480         -9.1480
2001-01-07   -9.5300        -18.6780
2001-01-08  -10.4675        -29.1455
2001-01-09  -10.9205        -40.0660
2001-01-10  -11.5715        -51.6375
DSM
  • 319,184
  • 61
  • 566
  • 472
  • thanks @DSM, is there a difference between doing `df.groupby(df.index.year)["VAL"].cumsum()` and `df['VAL'].groupby(df.index.year).cumsum()`? – user308827 Jan 27 '16 at 05:46
  • 1
    @user308827: not here, or least not that I can think of offhand. In both cases you've selected the column VAL before you apply any operations. – DSM Jan 27 '16 at 05:50
1

Use numpy.cumsum()

>>> a = np.array([[1,2,3], [4,5,6]])
>>> a array([[1, 2, 3],
       [4, 5, 6]])
>>> np.cumsum(a) array([ 1,  3,  6, 10, 15, 21])
>>> np.cumsum(a, dtype=float)     # specifies type of output value(s) array([  1.,   3.,   6.,  10.,  15.,  21.])

http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.cumsum.html

To groupby year, you can use:

data.groupby(data['datetime'].map(lambda x: x.year))

How to group pandas DataFrame entries by date in a non-unique column

Community
  • 1
  • 1
drum
  • 4,946
  • 7
  • 55
  • 86