
I have a data.frame like `data` (built in the code below). In the column named `value`, the same value appears more than once (in more than one row). I would like to match the rows that share a value in order to find their ids. In other words, I would like the result to show that ids "P1", "P3" and "P4" share the value 24.7386760, and that ids "P2" and "P6" share the value 21.9178082.

I have used the `duplicated` function to spot the duplicated values and then dplyr's `filter` function to keep the rows with one exact value. This is the code I have tried:

id <- c("P1", "P2", "P3", "P4", "P5", "P6")
value <- c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082)
data <- as.data.frame(cbind(id, value))  # note: cbind() builds a character matrix, so value is stored as text

duplicates <- data$value[duplicated(data$value) | duplicated(data$value, fromLast=TRUE)]
View(duplicates)

library(dplyr)
# one hard-coded filter per repeated value; inside filter() columns are referenced by name
cat1 <- filter(data, value == 24.7386760)
cat2 <- filter(data, value == 21.9178082)

This works for a small number of distinct values, but it does not scale to the many distinct values in my real data.

Any ideas on this? Thank you
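For reference, a minimal base R sketch of how the `duplicated()` plus subsetting idea could be generalised without hard-coding each value: loop over every value that occurs more than once and collect its ids. It assumes `data` is built exactly as in the question (so `value` holds text); `dup_values` and `ids_by_value` are illustrative names.

```r
id <- c("P1", "P2", "P3", "P4", "P5", "P6")
value <- c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082)
data <- as.data.frame(cbind(id, value))  # as in the question: value is stored as text

# every value that appears in more than one row, each listed once
dup_values <- unique(data$value[duplicated(data$value)])

# for each repeated value, the ids of the rows that carry it
ids_by_value <- lapply(dup_values, function(v) data$id[data$value == v])
names(ids_by_value) <- dup_values
ids_by_value
```

The list is named by value, so `ids_by_value[["24.738676"]]` gives the ids for that value, however many repeated values the real data contains.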

2 Answers


Are you looking to group rows that share a value?

split(data, data$value)

$`20.744186`
  id     value
5 P5 20.744186

$`21.9178082`
  id      value
2 P2 21.9178082
6 P6 21.9178082

$`24.738676`
  id     value
1 P1 24.738676
3 P3 24.738676
4 P4 24.738676
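If only the repeated values matter, the groups returned by `split()` can be filtered by row count. A base R sketch, rebuilding `data` as in the question; `groups`, `repeated` and `ids_list` are illustrative names:

```r
id <- c("P1", "P2", "P3", "P4", "P5", "P6")
value <- c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082)
data <- as.data.frame(cbind(id, value))

# one data frame per distinct value
groups <- split(data, data$value)

# keep only the groups with more than one row, i.e. the repeated values
repeated <- Filter(function(d) nrow(d) > 1, groups)

# just the ids, still named by value
ids_list <- lapply(repeated, `[[`, "id")
ids_list
```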

Or maybe you prefer this output:

aggregate(id ~ value, data, paste)

       value         id
1  20.744186         P5
2 21.9178082     P2, P6
3  24.738676 P1, P3, P4

Or, with `aggregate`, keeping only the values that occur more than once (this reuses `duplicates` from the question):

aggregate(id ~ value, data[data$value %in% duplicates,], paste)

       value         id
1 21.9178082     P2, P6
2  24.738676 P1, P3, P4
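A self-contained variant, offered as a sketch, counts the occurrences with `table()` instead of relying on the `duplicates` vector. Passing `collapse = ", "` through `aggregate()` to `paste()` also makes `id` a plain character column rather than the list column that plain `paste` produces here. `counts`, `keep` and `result` are illustrative names.

```r
id <- c("P1", "P2", "P3", "P4", "P5", "P6")
value <- c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082)
data <- as.data.frame(cbind(id, value))

# how many rows carry each distinct value
counts <- table(data$value)

# rows whose value occurs more than once
keep <- data$value %in% names(counts)[counts > 1]

# one row per repeated value; collapse = ", " is forwarded to paste()
result <- aggregate(id ~ value, data[keep, ], paste, collapse = ", ")
result
```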
Daniel O

A tidyverse solution that does not need to identify the duplicates separately:


library(dplyr)
library(stringr)

data %>% 
  group_by(value) %>%
  summarise(ids = paste(id, collapse = ", ")) %>% 
  filter(str_detect(ids, ","))

#> # A tibble: 2 x 2
#>   value      ids       
#>   <chr>      <chr>     
#> 1 21.9178082 P2, P6    
#> 2 24.738676  P1, P3, P4

Created on 2020-07-08 by the reprex package (v0.3.0)
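A variant in the same dplyr style, sketched here as an alternative, counts the rows per group with `n()` instead of searching the pasted string for a comma, so it does not depend on the ids themselves being comma-free. It rebuilds `data` as in the question and needs only dplyr; `result` is an illustrative name.

```r
library(dplyr)

id <- c("P1", "P2", "P3", "P4", "P5", "P6")
value <- c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082)
data <- as.data.frame(cbind(id, value))

result <- data %>%
  group_by(value) %>%
  filter(n() > 1) %>%                              # keep values that occur in more than one row
  summarise(ids = paste(id, collapse = ", ")) %>%  # one row per repeated value
  ungroup()
result
```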

Peter
  • Thank you very much. Should I install the `tidyverse` package? – Rea Kalampaliki Jul 08 '20 at 18:11
  • No, you only need the packages listed in the answer. They just belong to the `tidyverse` ecosystem, as opposed to base R or data.table, which seem to be the other general approaches to much R programming. – Peter Jul 08 '20 at 18:14