Extract the unique rows with maximum value in another column in R dataframe

Question

I have this data frame called mydf. There are duplicated samples in the Sample column. I want to extract the unique sample rows with the maximum total_reads and get the result.

mydf<-structure(list(Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", 
"AOGC-02-0191", "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"
), total_reads = c(27392583, 19206920, 34462563, 53669483, 24731988, 
43419826, 68151814), Lane = c("4", "5", "4", "4;5", "5", "4", 
"4;5")), .Names = c("Sample", "total_reads", "Lane"), row.names = c("166", 
"169", "170", "171", "173", "174", "175"), class = "data.frame")

result

  Sample        total_reads  Lane
 AOGC-02-0188    27392583    4
 AOGC-02-0191    53669483  4;5
 AOGC-02-0194    68151814  4;5

Possible duplicate of [Aggregate a dataframe on a given column and display another column](http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column) — Bulat, May 22 '16 at 08:15

Sotos · Accepted Answer · 2016-05-22T08:13:31.580

4

You can aggregate and then merge,

merge(aggregate(total_reads ~ Sample, mydf, max), mydf)
#        Sample total_reads Lane
#1 AOGC-02-0188    27392583    4
#2 AOGC-02-0191    53669483  4;5
#3 AOGC-02-0194    68151814  4;5

edited May 22 '16 at 08:13

answered May 22 '16 at 08:02

Sotos

47,396
5
31
61

akrun · Answer 2 · 2016-05-22T10:45:28.503

2

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(mydf)), grouped by "Sample", order the 'total_reads' desendingly and subset the first observation with head.

library(data.table)
setDT(mydf)[order(-total_reads), head(.SD, 1) , Sample]

edited May 22 '16 at 10:45

answered May 22 '16 at 10:37

akrun

789,025
32
460
575

score 1 · Answer 3 · answered May 22 '16 at 08:01

1

Using the dplyr package, you could do that like this:

mydf %>%
    group_by(Sample) %>% # for each unique sample
    arrange(-total_reads) %>% # order by total_reads DESC
    slice(1) # select the first row, i.e. with highest total_reads

answered May 22 '16 at 08:01

Jasper

545
2
12

Extract the unique rows with maximum value in another column in R dataframe

3 Answers3