I have a DataFrame and would like to drop certain rows for each group. Here is the data:

import pandas as pd

data = {'GROUP': ['A','A','A','B','B','B','B','C','C','C','C','C'],
        'DATE': ['202101','202102','202103','201907','201908','201909',
                 '201910','202003','202004','202005','202006','202007']}
df = pd.DataFrame(data, columns=['GROUP','DATE'])
         
   GROUP    DATE
0      A  202101
1      A  202102
2      A  202103
3      B  201907
4      B  201908
5      B  201909
6      B  201910
7      C  202003
8      C  202004
9      C  202005
10     C  202006
11     C  202007

I would like to drop all the rows after the second date in each group. In other words, I would like to produce this:

  GROUP    DATE
0     A  202101
1     A  202102
3     B  201907
4     B  201908
7     C  202003
8     C  202004
mozway
wild west

1 Answer

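Use GroupBy.head, which keeps the first n rows of each group and preserves the original index. It works on row order, so this assumes the rows are already sorted by DATE within each group, as they are here: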

df.groupby('GROUP').head(2)

Output:

  GROUP    DATE
0     A  202101
1     A  202102
3     B  201907
4     B  201908
7     C  202003
8     C  202004
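
If the rows are not guaranteed to arrive in date order, one option is to sort before taking the head of each group. The following is a minimal sketch under that assumption, not part of the original answer: the shuffle via `sample` only simulates unsorted input, and the names `shuffled`, `ordered`, `mask`, `result` and `result_mask` are made up for illustration. It also shows what should be an equivalent selection built from `GroupBy.cumcount`.

```python
import pandas as pd

data = {
    'GROUP': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'DATE': ['202101', '202102', '202103', '201907', '201908', '201909',
             '201910', '202003', '202004', '202005', '202006', '202007'],
}
df = pd.DataFrame(data, columns=['GROUP', 'DATE'])

# Shuffle the rows to simulate unsorted input (illustrative only).
shuffled = df.sample(frac=1, random_state=0)

# Sort by group and date, then keep the first two rows of each group.
# DATE holds YYYYMM strings, which sort chronologically as plain strings.
ordered = shuffled.sort_values(['GROUP', 'DATE'])
result = ordered.groupby('GROUP').head(2)

# Alternative: number the rows within each group and keep positions 0 and 1.
mask = ordered.groupby('GROUP').cumcount() < 2
result_mask = ordered[mask]

print(result)
```

The `cumcount` form is handy when the cutoff needs to be combined with other conditions, since it yields an ordinary boolean mask.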
ThePyGuy
jezrael