In case of duplicated value in variable, keep row with lowest value based on other variable

Question

Let's say I have

There are unique and duplicated values in df$ID. If a value in df$ID is duplicated, I want to keep the row with the lowest df$tti value.

Expected output

> df
   ID  tti
1 118  1.4
2   9  2.7
3  12  1.2

I am looking for a solution in dplyr

Data

df <- structure(list(ID = c(118L, 118L, 118L, 9L, 12L, 12L), tti = c(11.2, 
1.4, 9.2, 2.7, 1.2, 82.8)), class = "data.frame", row.names = c(NA, 
-6L))

akrun · Accepted Answer · 2021-03-16T22:50:57.280

We can use slice_min after grouping by 'ID'

library(dplyr)
df %>%
    group_by(ID) %>%
    slice_min(tti) %>%
    ungroup

-output

# A tibble: 3 x 2
#     ID   tti
#  <int> <dbl>
#1     9   2.7
#2    12   1.2
#3   118   1.4

Or with collapse

library(collapse)
df %>%
    fgroup_by(ID) %>%
    fsummarise(tti = fmin(tti))

#   ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4

Or another option is roworder (which is faster than arrange from dplyr) with funique

roworder(df, ID, tti) %>%
     funique(cols = 1)
#    ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4

score 1 · Answer 2 · answered Mar 16 '21 at 20:41

By default distinct keeps the first row of duplicated ID. So if the data are sorted first by tti within ID then the first row will also be the smallest value of tti. By default arrange orders data in ascending order.

library(dplyr)
df %>% 
  dplyr::arrange(ID, tti) %>% 
  dplyr::distinct(ID, .keep_all = T)

Output

   ID tti
1   9 2.7
2  12 1.2
3 118 1.4

In case of duplicated value in variable, keep row with lowest value based on other variable

2 Answers2