In R, retrieve the n most frequent items from several categories of item

Question

Background

I'm using dplyr to do some data analysis. I've got this dataframe d:

d <- data.frame(ID = c("ab24","c56","kl90","ab24","c56","kl90","ab24","c56","kl90","ab24","c56","kl90","ab24","c56","kl90"),
                category = c("drivetrain","drivetrain","drivetrain","drivetrain","drivetrain","drivetrain","drivetrain","drivetrain","engine","engine","engine","engine","engine","engine","engine"),
                part = c("gear oil","gear oil","gear oil","cv axle","cv axle","transfer case fluid","cv intermediate shaft","differential seal","oil","exhaust valve","oil","oil","exhaust valve","piston","oil pan"),
                stringsAsFactors=FALSE)

As you can see, it's about car parts: it's got some customer IDs, some part categories, and some individual part names.

The Problem

I'd like to get a dataframe (or any table) that displays the top n -- let's say 2 -- most frequent individual parts per part category, and arrange them in descending order within part category. I have in mind something like this:

| Category   | Part            |    n      |
|------------|-----------------|-----------|
| drivetrain | gear oil        |    3      |
| drivetrain | cv axle         |    2      |   
| engine     | oil             |    3      |
| engine     | exhaust valve   |    2      |

What I've tried

I can get something going with group_by and count, like so:

d %>%
  group_by(category, part) %>%
  count(part) %>%
  as.data.frame() %>%
  na.omit 

    category                part   n
drivetrain               cv axle   2
drivetrain cv intermediate shaft   1
drivetrain     differential seal   1
drivetrain              gear oil   3
drivetrain   transfer case fluid   1
    engine         exhaust valve   2
    engine                   oil   3
    engine               oil pan   1
    engine                piston   1

But when I try adding a variant of slice to the pipeline to get the 2 most frequent parts per category, I get errors like `Error in slice_max(): ! argument order_by is missing, with no default`. I'm not sure where to supply order_by.
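
For reference, a minimal sketch of where `order_by` goes. The failing call is not shown above, so this assumes it was something like `slice_max(n = 2)`: there the only argument is matched by name to the row count `n`, and nothing is left to fill `order_by`. The name `top_parts` is just illustrative, and `d` is rebuilt compactly with `rep()` so the snippet runs on its own.

```r
library(dplyr)

# The data frame from the question, rebuilt compactly with rep().
d <- data.frame(
  ID = rep(c("ab24", "c56", "kl90"), times = 5),
  category = rep(c("drivetrain", "engine"), times = c(8, 7)),
  part = c("gear oil", "gear oil", "gear oil", "cv axle", "cv axle",
           "transfer case fluid", "cv intermediate shaft", "differential seal",
           "oil", "exhaust valve", "oil", "oil", "exhaust valve",
           "piston", "oil pan"),
  stringsAsFactors = FALSE
)

top_parts <- d %>%
  count(category, part) %>%          # one row per category/part pair, count in column n
  group_by(category) %>%             # rank within category only, not within category + part
  slice_max(order_by = n, n = 2) %>% # order_by is the count column n; n = 2 rows per group
  arrange(category, desc(n)) %>%     # make the descending order within category explicit
  ungroup()

top_parts
```

Note that `slice_max()` keeps ties by default (`with_ties = TRUE`), so a category can return more than 2 rows if several parts share the cutoff count. That does not happen with this data.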

`d %>% group_by(category, part) %>% tally() %>% slice_max(n, n = 2)` — Ritchie Sacramento, May 27 '22 at 04:09
Yeah this works great -- I didn't really know about `tally`. Replacing `count` with that changes the whole game; glad I know about it now. Write that up as an answer if you like and I'll mark it answered. Thanks! — logjammin, May 27 '22 at 04:18
@onyambu - it's not exactly the same - `tally()` strips off the last group and `count()` leaves the data fully grouped. — Ritchie Sacramento, May 27 '22 at 04:21
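
The grouping difference described in the last comment can be checked directly. This is a small sketch rather than an official answer: `by_tally` and `by_count` are illustrative names, and `d` is the question's data rebuilt with `rep()` so the snippet is self-contained.

```r
library(dplyr)

# The data frame from the question, rebuilt compactly with rep().
d <- data.frame(
  ID = rep(c("ab24", "c56", "kl90"), times = 5),
  category = rep(c("drivetrain", "engine"), times = c(8, 7)),
  part = c("gear oil", "gear oil", "gear oil", "cv axle", "cv axle",
           "transfer case fluid", "cv intermediate shaft", "differential seal",
           "oil", "exhaust valve", "oil", "oil", "exhaust valve",
           "piston", "oil pan"),
  stringsAsFactors = FALSE
)

by_tally <- d %>% group_by(category, part) %>% tally()
by_count <- d %>% group_by(category, part) %>% count()

group_vars(by_tally)  # "category"         (last grouping level dropped)
group_vars(by_count)  # "category" "part"  (still fully grouped)

# slice_max() works per group, so the remaining grouping decides the result.
nrow(slice_max(by_tally, n, n = 2))  # 4: top 2 parts in each of 2 categories
nrow(slice_max(by_count, n, n = 2))  # 9: every group holds one row, so all rows survive
```

So after `count()` on fully grouped data, `slice_max()` has only one row per group to choose from, which is why it appears to do nothing there.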
