93

Are there single functions in pandas that perform the equivalents of Excel's SUMIF, which sums values that meet a specific condition, and COUNTIF, which counts values that meet a specific condition?

I know that there are many multi-step approaches that can be used for this.

For example, for SUMIF I can use df.map(lambda x: condition) or df.size(), and then call .sum().

For COUNTIF I can use groupby functions and look for my answer, or apply a filter and then call .count().

Is there a simple one-step process for these functions, where you enter the condition and the data frame and get back the summed or counted result?

petezurich
user3084006

3 Answers

120

You can first make a conditional selection, and sum up the results of the selection using the sum function.

>>> df = pd.DataFrame({'a': [1, 2, 3]})
>>> df[df.a > 1].sum()
a    5
dtype: int64

Having more than one condition:

>>> df[(df.a > 1) & (df.a < 3)].sum()
a    2
dtype: int64

If you want to do COUNTIF, just replace sum() with count(). Note that count() counts only the non-null values in each column.
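As a minimal sketch of both operations restricted to a single column: the column `b` and its missing value are made up for illustration, and the variable names are hypothetical. It shows that summing the boolean mask counts matching rows, while count() on the selection skips nulls, so the two can differ.

```python
import numpy as np
import pandas as pd

# Illustrative data: column "b" and its NaN are assumptions, not from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, np.nan, 30.0]})

# Boolean mask: True where the condition holds.
mask = df["a"] > 1

# SUMIF: sum column "a" over the rows where the mask is True.
sumif_a = df.loc[mask, "a"].sum()

# COUNTIF on rows: True counts as 1, so summing the mask counts matching rows.
countif_rows = int(mask.sum())

# count() ignores nulls, so this counts only non-null "b" values in matching rows.
countif_b = int(df.loc[mask, "b"].count())

print(sumif_a, countif_rows, countif_b)
```

Selecting with `df.loc[mask, column]` avoids summing columns that are not of interest.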

Liora Haydont
Jimmy C
  • 5
    What would you do if you have two or more different columns and you want more than one condition? – user3084006 Jan 08 '14 at 12:23
  • Just change one of the selected columns in the second example to another column name. – Jimmy C Jan 08 '14 at 12:25
  • To count each unique value across multiple columns, use `df.aggregate(['value_counts'])`. This is ideal when the columns share the same values. You can also restrict it to selected columns with `df[list_of_columns].aggregate(['value_counts'])`. – raummensch Mar 15 '21 at 02:52
51

You didn't mention the fancy indexing capabilities of dataframes, e.g.:

>>> df = pd.DataFrame({"class":[1,1,1,2,2], "value":[1,2,3,4,5]})
>>> df[df["class"]==1].sum()
class    3
value    6
dtype: int64
>>> df[df["class"]==1].sum()["value"]
6
>>> df[df["class"]==1].count()["value"]
3

You could replace df["class"]==1 with another condition.
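A hedged sketch of how conditions on two different columns might be combined, reusing the data frame above. The specific thresholds and variable names are chosen only for illustration; each comparison needs its own parentheses because `&` and `|` bind more tightly than `==` or `>=`.

```python
import pandas as pd

df = pd.DataFrame({"class": [1, 1, 1, 2, 2], "value": [1, 2, 3, 4, 5]})

# AND: rows where class is 1 and value is at least 2 (illustrative thresholds).
both = (df["class"] == 1) & (df["value"] >= 2)
sumifs = df.loc[both, "value"].sum()   # sums the values 2 and 3
countifs = int(both.sum())             # number of rows matching both conditions

# OR: rows where class is 2 or value is 1.
either = (df["class"] == 2) | (df["value"] == 1)
count_either = int(either.sum())

print(sumifs, countifs, count_either)
```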

Serenity
Thorsten Kranz
  • I have this in my code too, but what if you have multiple conditions? For example, what if I wanted `df[df["class"]==1].count()["value"]` and `df[df["value"]==2].count()["class"]`? – user3084006 Jan 08 '14 at 12:14
  • 1
    Combination of more than one condition was proposed by Jimmy C, so I won't repeat it in my post. Is there anything else missing for you? – Thorsten Kranz Jan 08 '14 at 14:53
  • 4
    An easier way to get the count would be `len(df[df["class"]==1])` – beldaz May 14 '18 at 05:42
13

I usually use numpy sum over the logical condition column:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Age' : [20,24,18,5,78]})
>>> np.sum(df['Age'] > 20)
2

This seems slightly shorter to me than the solutions presented above.
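For comparison, a small sketch assuming the same `Age` data: the boolean Series has its own `.sum()` method, which gives the same count without calling numpy directly, and the same mask can be reused to select the values for a conditional sum. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [20, 24, 18, 5, 78]})

# Boolean mask for the condition.
mask = df["Age"] > 20

# COUNTIF two ways: numpy's sum and the Series method agree.
count_np = int(np.sum(mask))
count_method = int(mask.sum())

# SUMIF: reuse the same mask to pick the ages to add up.
sum_if = int(df.loc[mask, "Age"].sum())

print(count_np, count_method, sum_if)
```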

dan12345
  • Actually, one does not have to use numpy here. You can just use `sum(df['Age'] > 20)`. The argument is iterable, so the built-in function can handle it. – drsealks Jan 19 '21 at 08:34