
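How do I organise this three-column dataset by removing the repeated elements, i.e. merging rows that share the same Country and Year into a single row with their average temperature?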

Country       Year      Temperature
US            1990      25
US            1990      27
US            1990      24
US            1991      26
Canada        1990      20
.             .         .

Into

Country       Year      AvgTemp
US            1990      25.33
US            1991      26
Canada        1990      20

I can use `groupby` to do this when only the 'Year' and 'Temperature' columns are involved, but how do I do it when all three columns are involved?

(P.S. I am new to pandas.)
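A minimal sketch of why both key columns matter, assuming the visible sample rows are loaded into a DataFrame named `df` (the elided `.` rows are left out; `by_year` and `by_country_year` are illustrative names). Grouping by `Year` alone averages US and Canada together for 1990, while grouping by `['Country', 'Year']` keeps each pair separate:

```python
import pandas as pd

# Assumption: only the visible sample rows from the question (elided rows omitted)
df = pd.DataFrame({
    "Country": ["US", "US", "US", "US", "Canada"],
    "Year": [1990, 1990, 1990, 1991, 1990],
    "Temperature": [25, 27, 24, 26, 20],
})

# Grouping by Year alone mixes countries: US and Canada 1990 are averaged together
by_year = df.groupby("Year")["Temperature"].mean()
print(by_year)

# Grouping by both key columns keeps each Country/Year pair separate
by_country_year = df.groupby(["Country", "Year"])["Temperature"].mean()
print(by_country_year)
```

With these five rows, `by_year` gives 24.0 for 1990 (US and Canada mixed) and 26.0 for 1991, while `by_country_year` gives 20.0 for Canada/1990, about 25.33 for US/1990 and 26.0 for US/1991.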

    This is just: `df.groupby(['Country', 'Year'])['Temperature'].mean()` – Erfan Jun 14 '20 at 16:32
  • To match your expected output with the new column name, use named aggregations instead: `df.groupby(['Country', 'Year']).agg(AvgTemp=('Temperature', 'mean')).reset_index()` – Erfan Jun 14 '20 at 16:35
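A self-contained sketch of the named-aggregation approach from the comment above, assuming the visible sample rows as `df` (the real data has more rows). The `.round({"AvgTemp": 2})` step is an addition, only there to match the 25.33 shown in the expected output:

```python
import pandas as pd

# Assumption: only the visible sample rows from the question
df = pd.DataFrame({
    "Country": ["US", "US", "US", "US", "Canada"],
    "Year": [1990, 1990, 1990, 1991, 1990],
    "Temperature": [25, 27, 24, 26, 20],
})

# Named aggregation: new_column_name=(input_column, aggregation_function)
result = (
    df.groupby(["Country", "Year"])
      .agg(AvgTemp=("Temperature", "mean"))
      .reset_index()              # turn the Country/Year index back into columns
      .round({"AvgTemp": 2})      # optional: round only AvgTemp, to match 25.33
)
print(result)
```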

2 Answers


You can pass multiple columns to `groupby()` as a list, like this:

df.groupby(['Country','Year'])['Temperature'].mean().reset_index()
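A runnable sketch of this answer using the question's column name `Temperature`, assuming the visible sample rows as `df`. Because `['Temperature'].mean()` returns a Series indexed by `(Country, Year)`, `Series.reset_index(name=...)` can also give the value column the `AvgTemp` name from the expected output in one step:

```python
import pandas as pd

# Assumption: only the visible sample rows from the question
df = pd.DataFrame({
    "Country": ["US", "US", "US", "US", "Canada"],
    "Year": [1990, 1990, 1990, 1991, 1990],
    "Temperature": [25, 27, 24, 26, 20],
})

# The mean is a Series with a (Country, Year) MultiIndex;
# reset_index(name=...) moves the index into columns and names the values column
result = (
    df.groupby(["Country", "Year"])["Temperature"]
      .mean()
      .reset_index(name="AvgTemp")
)
print(result)
```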
Ch3steR
DataVizPyR
df.groupby(['Country', 'Year']).mean().reset_index().rename(columns={'Temperature':'AvgTemp'})
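A sketch of a variant of this answer, assuming the same sample rows as `df`. Passing `as_index=False` keeps `Country` and `Year` as ordinary columns, so no `reset_index()` is needed, and selecting `['Temperature']` before `.mean()` avoids averaging any other columns the real data might have. (In recent pandas versions, calling `.mean()` on all columns may raise a `TypeError` if a non-numeric, non-key column is present.)

```python
import pandas as pd

# Assumption: only the visible sample rows from the question
df = pd.DataFrame({
    "Country": ["US", "US", "US", "US", "Canada"],
    "Year": [1990, 1990, 1990, 1991, 1990],
    "Temperature": [25, 27, 24, 26, 20],
})

# as_index=False returns a DataFrame with Country and Year as regular columns
result = (
    df.groupby(["Country", "Year"], as_index=False)["Temperature"]
      .mean()
      .rename(columns={"Temperature": "AvgTemp"})
)
print(result)
```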
warped