
I have a pandas DataFrame with rows like this:

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    150         0         0         0
1     33     22    300         1         0         1

I want to combine all rows that have the same values in the 'Same1' and 'Same2' columns, summing the remaining columns:

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    450         1         0         1

What would be the cleanest way to achieve this using pandas?

Executable python code: https://trinket.io/python3/1da371fd04

mre

2 Answers


You can try grouping on the two key columns and summing the rest:

# Sum every other column within each (Same1, Same2) group, then restore the keys as columns
out = df.groupby(['Same1', 'Same2']).agg('sum').reset_index()
print(out)

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    450         1         0         1
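
For a self-contained check, here is a minimal sketch that rebuilds the two sample rows from the question and applies the groupby with the string 'sum'. How `df` is constructed is an assumption on my part; the question only shows the printed frame.

```python
import pandas as pd

# Sample rows copied from the question; building df this way is an assumption.
df = pd.DataFrame({
    'Same1': [33, 33],
    'Same2': [22, 22],
    'Diff3': [150, 300],
    'Encoded1': [0, 1],
    'Encoded2': [0, 0],
    'Encoded3': [0, 1],
})

# Group on the two key columns, sum everything else,
# then turn the group keys back into ordinary columns.
out = df.groupby(['Same1', 'Same2']).agg('sum').reset_index()
print(out)
```

Without `reset_index()`, 'Same1' and 'Same2' would stay in the index of the result rather than appearing as columns.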
Ynjxsjmh

You can use a groupby with as_index=False to get the expected result directly:

df.groupby(['Same1', 'Same2'], as_index=False).sum()

Output:

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    450         1         0         1
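
To check that `as_index=False` gives the same frame as calling `reset_index()` afterwards, and that rows with different keys stay in separate groups, here is a small sketch. The third row (Same1 = 44) is hypothetical and added only for illustration; it is not in the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'Same1': [33, 33, 44],   # the 44 row is hypothetical, not from the question
    'Same2': [22, 22, 22],
    'Diff3': [150, 300, 10],
    'Encoded1': [0, 1, 1],
    'Encoded2': [0, 0, 0],
    'Encoded3': [0, 1, 0],
})

# Keys kept as columns directly by the groupby
flat = df.groupby(['Same1', 'Same2'], as_index=False).sum()

# Keys moved into the index by the groupby, then restored as columns
via_reset = df.groupby(['Same1', 'Same2']).sum().reset_index()

print(flat.equals(via_reset))
print(flat)
```

Which form to use is mostly a matter of taste; `as_index=False` just saves the extra call.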
tlentali