1

Let's say I have

> df
   ID  tti
1 118 11.2
2 118  1.4
3 118  9.2
4   9  2.7
5  12  1.2
6  12 82.8

There are unique and duplicated values in df$ID. If a value in df$ID is duplicated, I want to keep the row with the lowest df$tti value.

Expected output

> df
   ID  tti
1 118  1.4
2   9  2.7
3  12  1.2

I am looking for a solution in dplyr

Data

df <- structure(list(ID = c(118L, 118L, 118L, 9L, 12L, 12L), tti = c(11.2, 
1.4, 9.2, 2.7, 1.2, 82.8)), class = "data.frame", row.names = c(NA, 
-6L)) 
cmirian
  • 2,218
  • 2
  • 14
  • 40

2 Answers2

1

We can use slice_min after grouping by 'ID'

library(dplyr)
df %>%
    group_by(ID) %>%
    slice_min(tti) %>%
    ungroup

-output

# A tibble: 3 x 2
#     ID   tti
#  <int> <dbl>
#1     9   2.7
#2    12   1.2
#3   118   1.4

Or with collapse

library(collapse)
df %>%
    fgroup_by(ID) %>%
    fsummarise(tti = fmin(tti))

#   ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4

Or another option is roworder (which is faster than arrange from dplyr) with funique

roworder(df, ID, tti) %>%
     funique(cols = 1)
#    ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4
akrun
  • 789,025
  • 32
  • 460
  • 575
1

By default distinct keeps the first row of duplicated ID. So if the data are sorted first by tti within ID then the first row will also be the smallest value of tti. By default arrange orders data in ascending order.

library(dplyr)
df %>% 
  dplyr::arrange(ID, tti) %>% 
  dplyr::distinct(ID, .keep_all = T)

Output

   ID tti
1   9 2.7
2  12 1.2
3 118 1.4
LMc
  • 8,572
  • 3
  • 21
  • 35