
I'm trying to remove every row whose values are duplicated across multiple variables, using dplyr. Here's how I do it without dplyr:

dat = data.frame(id=c(1,1,2),date=c(1,1,1))
dat = dat[!(duplicated(dat[c('id','date')]) | duplicated(dat[c('id','date')],fromLast=TRUE)),]

It should return only the row with id 2.

NelsonGon
spazznolo

1 Answer


This can be done with a group_by/filter operation in the tidyverse: group by the columns of interest, then keep only the groups that contain a single row. Here group_by_all is used because every column in the dataset is part of the key; group_by_at can be used instead when only a subset of the columns is needed.

library(dplyr)
dat %>% 
   group_by_all() %>%
   filter(n()==1)

Or simply use group_by with the column names

dat %>% 
   group_by(id, date) %>%
   filter(n() == 1)
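
In dplyr 1.0.0 and later the scoped verbs (group_by_all, group_by_at) are superseded by across(). A minimal sketch of the same idea, assuming dplyr >= 1.0.0; key_cols is a hypothetical name, not from the question, for the columns that define a duplicate:

```r
library(dplyr)

dat <- data.frame(id = c(1, 1, 2), date = c(1, 1, 1))

# Hypothetical variable: the columns that together define a duplicate
key_cols <- c("id", "date")

res <- dat %>%
  group_by(across(all_of(key_cols))) %>%  # one group per id/date combination
  filter(n() == 1) %>%                    # keep combinations that occur exactly once
  ungroup()                               # drop the grouping from the result

res
```

The ungroup() call is optional; it only stops later verbs from operating per group.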

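If the OP intended to use the duplicated function, apply it to the id/date combination rather than to each column separately (a per-column filter_at/any_vars test would drop rows whose combination is unique whenever each individual value repeats elsewhere)

dat %>%
  # data.frame(id, date) rebuilds the key columns so duplicated() compares whole rows
  filter(!(duplicated(data.frame(id, date)) |
           duplicated(data.frame(id, date), fromLast = TRUE)))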
# id date
#1  2    1
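
For reference, a small sketch of why duplicated() is called twice. On its own it flags only the second and later copies of a row, so the first copy would survive; the fromLast = TRUE pass flags the earlier copies. The names key, fwd, bwd and out are introduced here only for illustration.

```r
dat <- data.frame(id = c(1, 1, 2), date = c(1, 1, 1))

key <- dat[c("id", "date")]              # columns that define a duplicate

fwd <- duplicated(key)                   # flags later copies:   FALSE TRUE FALSE
bwd <- duplicated(key, fromLast = TRUE)  # flags earlier copies: TRUE FALSE FALSE

# A row is kept only if neither pass flags it
out <- dat[!(fwd | bwd), ]
out
```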
akrun