I have a data frame (call it df) of accidents. Each row records an accident number (x), an identifier for a person involved (y), and the type of accident (z). It looks something like this:

x             y          z
accident #1   person A   accident type #1
accident #1   person A   accident type #2
accident #2   person A   accident type #1
accident #2   person B   accident type #2
accident #2   person B   accident type #3
accident #3   person C   accident type #1

In the above case, person A was involved in two accidents; the first of these had two accident 'types' recorded for person A. Person B was involved in only one accident (accident #2, together with person A), with two accident types. Person C was also involved in only one accident.
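To make the example reproducible, here is a minimal sketch of how the table above could be built in R. The column names x, y and z come from the table; storing the values as plain character strings is an assumption, since the real data may well use numeric IDs or factors.

```r
# Rebuild the example table; character columns are an assumption,
# the real data may use numeric IDs or factors instead.
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

# One row per (accident, person, accident type) combination
print(df)
```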

I want to collect the subset of people who have only been involved in one accident. However, I want to include all of their accident types. So using the above example, I would want this:

x             y          z
accident #2   person B   accident type #2
accident #2   person B   accident type #3
accident #3   person C   accident type #1

How might I do this in R?

Jaap

2 Answers

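You can do this with the dplyr package, using group_by, filter, and n_distinct. Group the rows by person (y), then keep only the groups in which the number of distinct accidents (x) is exactly one; all rows for those people, and hence all their accident types, are retained: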

library(dplyr)
df %>%
  group_by(y) %>%
  filter(n_distinct(x) == 1) %>%
  ungroup()
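
If dplyr is not available, the same per-person filter can be sketched in base R. This is an alternative formulation rather than part of the dplyr approach above; the rebuilt df and the helper names accidents_per_person, one_accident_people and result are assumptions made for illustration.

```r
# Example data rebuilt from the question (character columns are an assumption)
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

# Count the distinct accidents each person appears in
accidents_per_person <- tapply(df$x, df$y, function(v) length(unique(v)))

# People who appear in exactly one accident
one_accident_people <- names(accidents_per_person)[accidents_per_person == 1]

# Keep every row (and so every accident type) for those people
result <- df[df$y %in% one_accident_people, ]
print(result)
```

Because the filter works on people rather than on rows, person A's row for accident #2 is dropped along with the accident #1 rows.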
David Robinson

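We can use data.table. Convert df to a data.table with setDT, keep, within each y group, the rows of .SD where uniqueN(x) is 1, and then restore the original column order with setcolorder: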

library(data.table)
setcolorder(setDT(df)[, .SD[uniqueN(x)==1] , y], names(df))[]
#            x        y                z
#1: accident #2 person B accident type #2
#2: accident #2 person B accident type #3
#3: accident #3 person C accident type #1
akrun