0

I have a DataFrame of IMDb ratings for a selection of TV shows, with the following columns:

date, ep_no, episode, show_title, season, rating

I need to select the lowest rated episode of each show, but I am having trouble displaying all of the columns I want.

I can successfully select the correct data using:

df.groupby('show_title')['rating'].min()

But this only displays the show title and the rating of the lowest rated episode for that show.

I need it to display: show_title, ep_no, episode, rating

I have tried various tweaks to the code, from the simple to the complex, but I guess I'm just not experienced enough to crack this particular puzzle right now.

Any ideas?

3 Answers

1

If I understand what you want, this question is similar to an earlier one, and the following code should do the trick.

df[df.groupby('show_title')['rating'].transform('min') == df['rating']]
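To make the idea concrete, here is a minimal sketch on made-up data. Only the column names come from the question; the rows and the variable names `group_min` and `lowest` are assumptions for illustration. `transform('min')` returns each show's minimum rating aligned to the original rows, so comparing it with `rating` gives a boolean mask, and `.loc` can then pick the columns to display.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'show_title': ['A', 'A', 'A', 'B', 'B'],
    'ep_no': [1, 2, 3, 1, 2],
    'episode': ['Pilot', 'Second', 'Third', 'Opening', 'Closing'],
    'season': [1, 1, 1, 1, 1],
    'rating': [8.1, 6.5, 7.9, 9.0, 7.2],
})

# Minimum rating of each row's own show, aligned with df's index.
group_min = df.groupby('show_title')['rating'].transform('min')

# Keep rows whose rating equals that minimum, then select the wanted columns.
lowest = df.loc[df['rating'] == group_min,
                ['show_title', 'ep_no', 'episode', 'rating']]
print(lowest)
```

Note that if two episodes of a show tie for the lowest rating, this mask keeps both rows.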
Ryan Sandridge
0

One approach is to sort the DataFrame by rating, then drop duplicate shows, keeping the first occurrence of each:

df.sort_values(by='rating').drop_duplicates(['show_title'], keep='first')
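A small sketch of how this behaves when ratings tie, on made-up data (the rows and the name `one_per_show` are assumptions, not from the question). Unlike a mask on the minimum, this always returns exactly one row per show. Adding `ep_no` as a second sort key is my own addition, so that which tied episode survives does not depend on the sort algorithm.

```python
import pandas as pd

# Hypothetical sample data; show A has two episodes tied for the lowest rating.
df = pd.DataFrame({
    'show_title': ['A', 'A', 'B', 'B'],
    'ep_no': [1, 2, 1, 2],
    'episode': ['Pilot', 'Second', 'Opening', 'Closing'],
    'rating': [6.5, 6.5, 9.0, 7.2],
})

one_per_show = (
    df.sort_values(by=['rating', 'ep_no'])           # lowest rating first, ties by ep_no
      .drop_duplicates('show_title', keep='first')   # first row seen per show
      [['show_title', 'ep_no', 'episode', 'rating']]
)
print(one_per_show)
```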
Peter Leimbigler
0
# Sort by show_title, then rating, before grouping.

df = df.sort_values(by=['show_title', 'rating'])

# Now group by show and take the first row of each group;
# after the sort, that row holds the show's minimum rating.
# head(1) keeps whole rows, whereas first() takes the first non-null value per column.
df1 = df.groupby('show_title').head(1)
Abhishek Sharma