I am trying to perform a groupby on a df and not have the other columns values shuffle. I have simplified my df for convenience. For the sub and study days, I would like to see the best test grade.
import pandas as pd
grades = pd.DataFrame({"sub": ["math", "math", "writing", "math", "writing", "reading", "reading", "reading"],
"study_days": [120, 120, 90, 90, 120, 60, 60, 30], "test": ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"],
"grade": [100,99,90,90,88,85,84,80], "a": [1,2,3,4,5,6,7,8]})
#this is my grades df
sub study_days test grade a
0 math 120 a1 100 1
1 math 120 b2 99 2
2 writing 90 c3 90 3
3 math 90 d4 90 4
4 writing 120 e5 88 5
5 reading 60 f6 85 6
6 reading 60 g7 84 7
7 reading 30 h8 80 8
#this is my desired outcome
desired = grades.iloc[[0,2,3,4,5,7],:]
sub study_days test grade a
0 math 120 a1 100 1
2 writing 90 c3 90 3
3 math 90 d4 90 4
4 writing 120 e5 88 5
5 reading 60 f6 85 6
7 reading 30 h8 80 8
#here is my attempt at a groupby
df = grades.groupby(by=['sub', 'study_days']).max("grade")
#here is the outcome
grade a
sub study_days
math 90 90 4
120 100 2
reading 30 80 8
60 85 7
writing 90 90 3
120 88 5
As you can see, my groupby has shifted some of the values in the 'a' column. Thank you for any help.