0

I have a DataFrame like this:

df = pd.DataFrame({"Year": [2000, 2000, 2000, 2001, 2001, 2001], "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"], "Count": [20, 500, 1000, 30, 50, 66]})

How can I calculate how many children were born in each year? For instance, according to the DataFrame above, in the year 2000 we had 20 + 500 + 1000 = 1520 new children.

Magofoco
dingaro

3 Answers

2

You can try:

my_final = df.groupby("Year")["Count"].sum()

print(my_final)

This will calculate the number of children per year.
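
As a self-contained sketch of the line above, using the sample data from the question (the variable name `births_per_year` is only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

# Group the rows by year, then add up the "Count" column within each group.
births_per_year = df.groupby("Year")["Count"].sum()

print(births_per_year)
# Year
# 2000    1520
# 2001     146
# Name: Count, dtype: int64
```

The result is a Series indexed by year, so a single year can be looked up with `births_per_year.loc[2000]`.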

Magofoco
  • Perfect, thank you. How can I check which year had the largest number of new children? – dingaro Nov 23 '19 at 17:05
  • `df.groupby("Year")["Count"].sum().sort_values(ascending=False)` the top result is the one with the largest number. – Magofoco Nov 23 '19 at 17:08
  • Perfect. One last question, if you have a second: how can I plot, for instance, the top 5 years with the largest number of new children as a bar graph? – dingaro Nov 23 '19 at 17:21
  • It is best if you try it yourself first. Check: matplotlib – Magofoco Nov 23 '19 at 17:22
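
For the follow-up questions in the comments, one possible sketch is shown here. It assumes the sample data from the question and that matplotlib is installed. The names `totals`, `top_years`, and `plot_top_years` are hypothetical helpers introduced for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

totals = df.groupby("Year")["Count"].sum()

# nlargest keeps at most the 5 largest totals, sorted in descending order.
# With only two years in the sample data, both are returned.
top_years = totals.nlargest(5)


def plot_top_years(series):
    """Draw a bar chart of a per-year Series (hypothetical helper)."""
    # Imported here so the rest of the sketch runs without matplotlib.
    import matplotlib.pyplot as plt

    ax = series.plot.bar()
    ax.set_xlabel("Year")
    ax.set_ylabel("Children born")
    plt.tight_layout()
    plt.show()
    return ax
```

Calling `plot_top_years(top_years)` should then open a bar chart with one bar per year.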
0

This will calculate the number of new children in the year 2000:

df[df["Year"]==2000]["Count"].sum()
Josua
  • This only computes one year instead of each year group. On top of that, using chained indexing is a poor choice. – cs95 Nov 23 '19 at 18:26
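
Regarding the chained-indexing remark, a minimal sketch of the same single-year total using one `.loc` call might look like this (assuming the sample data from the question; `total_2000` is an illustrative name):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

# The boolean mask selects the rows and "Count" selects the column,
# both in a single .loc call instead of two chained [] lookups.
total_2000 = df.loc[df["Year"] == 2000, "Count"].sum()

print(total_2000)  # 1520
```

This still covers only one year; the `groupby` approach is what produces a total for every year at once.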
0

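To find the year with the highest number of children (as per the OP's comments under one of the other answers), try the following. It sums Count per year and returns the year with the largest total. Note that calling .max() on the grouped DataFrame would take the maximum of each column independently, so it would not pair the year with its own total.

df.groupby('Year')['Count'].sum().idxmax()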
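
As a sketch of how to get both the year and its total from the per-year sums, assuming the sample data from the question (`best_year` and `best_total` are illustrative names):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

totals = df.groupby("Year")["Count"].sum()

best_year = totals.idxmax()  # index label (the year) of the largest total
best_total = totals.max()    # the largest total itself

print(best_year, best_total)  # 2000 1520
```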
RavinderSingh13