
I have a dataset with binary flag values for each sinid, like this:

>>> import pandas as pd
>>> df = pd.DataFrame({'sinid':['abc','def','ghi','abc','ghi'],'flag1':[1,1,0,0,1],'flag2':[1,0,1,0,0]})
>>> df
  sinid  flag1  flag2
0   abc      1      1
1   def      1      0
2   ghi      0      1
3   abc      0      0
4   ghi      1      0

I want to sum the flag values for each sinid. I think I need groupby, but I'm not sure how to use it.

This is the expected result:

  sinid  flag1  flag2
0   abc      1      1
1   def      1      0
2   ghi      1      1
Soufiane Sabiri

3 Answers


Group by sinid, sum each group, then reset the index so that sinid becomes a regular column again.

df = df.groupby(['sinid']).sum().reset_index()
df

Result:

  sinid  flag1  flag2
0   abc      1      1
1   def      1      0
2   ghi      1      1
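A self-contained version of this answer, as a sketch assuming only that pandas is installed. It rebuilds the question's DataFrame and applies the same groupby/sum/reset_index chain:

```python
import pandas as pd

df = pd.DataFrame({'sinid': ['abc', 'def', 'ghi', 'abc', 'ghi'],
                   'flag1': [1, 1, 0, 0, 1],
                   'flag2': [1, 0, 1, 0, 0]})

# Sum every flag column within each sinid group, then move the
# group keys out of the index and back into a 'sinid' column.
result = df.groupby(['sinid']).sum().reset_index()
print(result)
```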
jose_bacoy

Just sum the grouped DataFrame (sinid becomes the index of the result):

df.groupby('sinid').sum()

       flag1  flag2
sinid              
abc        1      1
def        1      0
ghi        1      1
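If you want sinid back as a column without calling reset_index(), groupby also accepts as_index=False. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'sinid': ['abc', 'def', 'ghi', 'abc', 'ghi'],
                   'flag1': [1, 1, 0, 0, 1],
                   'flag2': [1, 0, 1, 0, 0]})

# as_index=False keeps the grouping key as an ordinary column,
# so the result already has a default 0..n-1 index.
result = df.groupby('sinid', as_index=False).sum()
print(result)
```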
vurmux

This works:

df.groupby(['sinid'])[['flag1', 'flag2']].sum().reset_index()

  sinid  flag1  flag2
0   abc      1      1
1   def      1      0
2   ghi      1      1
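One caveat, since the flags are binary: sum() only gives 0/1 here because no sinid has the same flag set to 1 more than once. If that can happen and you want the result to stay binary (a logical OR per group), max() is one option. The extra 'abc' row below is a hypothetical addition to the question's data, used only to show the difference:

```python
import pandas as pd

# Question's data plus one hypothetical extra row: 'abc' gets flag1 = 1 a second time.
df = pd.DataFrame({'sinid': ['abc', 'def', 'ghi', 'abc', 'ghi', 'abc'],
                   'flag1': [1, 1, 0, 0, 1, 1],
                   'flag2': [1, 0, 1, 0, 0, 0]})

# sum() counts the 1s, so 'abc' ends up with flag1 == 2.
summed = df.groupby(['sinid'])[['flag1', 'flag2']].sum().reset_index()

# max() is 1 if any row in the group has the flag set, else 0.
flags = df.groupby(['sinid'])[['flag1', 'flag2']].max().reset_index()
print(summed)
print(flags)
```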
Adarsh Chavakula