0

I am trying to perform a groupby on a df and not have the other columns values shuffle. I have simplified my df for convenience. For the sub and study days, I would like to see the best test grade.

import pandas as pd
grades = pd.DataFrame({"sub": ["math", "math", "writing", "math", "writing", "reading", "reading", "reading"],
                       "study_days": [120, 120, 90, 90, 120, 60, 60, 30], "test": ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"],
                        "grade": [100,99,90,90,88,85,84,80], "a": [1,2,3,4,5,6,7,8]})
#this is my grades df   
    sub study_days  test    grade   a
0   math    120      a1      100    1
1   math    120      b2       99    2
2   writing 90       c3       90    3
3   math    90       d4       90    4
4   writing 120      e5       88    5
5   reading 60       f6       85    6
6   reading 60       g7       84    7
7   reading 30       h8       80    8

#this is my desired outcome
desired = grades.iloc[[0,2,3,4,5,7],:]

    sub study_days      test    grade   a
0   math    120          a1       100   1
2   writing 90           c3        90   3
3   math    90           d4        90   4
4   writing 120          e5        88   5
5   reading 60           f6        85   6
7   reading 30           h8        80   8

#here is my attempt at a groupby
df = grades.groupby(by=['sub', 'study_days']).max("grade")

#here is the outcome
               grade    a
sub study_days      
math    90       90     4
        120     100     2
reading 30       80     8
        60       85     7
writing 90       90     3
        120      88     5

As you can see, my groupby has shifted some of the values in the 'a' column. Thank you for any help.

joe
  • 1
  • 1

0 Answers0