Removing rows from a dataframe based on a conditional statement?

Question

I have a data frame (call it df) of accidents. Each row has an accident number, an identifier for the person involved, and the type of accident. It looks something like this:

x               y                    z
accident #1   person A    accident type #1
accident #1   person A    accident type #2
accident #2   person A    accident type #1
accident #2   person B    accident type #2
accident #2   person B    accident type #3
accident #3   person C    accident type #1

In the above case, person A was involved in two accidents. In the first accident, person A was involved in two accident 'types'. Person B was involved in accident #2 along with person A, but was involved in only that one accident, with two accident types. Person C was also involved in only one accident.

I want to collect the subset of people who have only been involved in one accident. However, I want to include all of their accident types. So using the above example, I would want this:

x               y                    z
accident #2   person B    accident type #2
accident #2   person B    accident type #3
accident #3   person C    accident type #1

How might I do this in R?

Indent four spaces to make code blocks, or highlight and press CTRL+K — Frank, May 01 '17 at 17:42
@MichaelChirico I'm new to R and am unsure what exactly to Google. Nothing I've found matches my specific case. — Sydney Maples, May 01 '17 at 17:50
Then explicate what you've found, and why it doesn't apply -- demonstrating effort goes a long way — MichaelChirico, May 01 '17 at 17:51
http://stackoverflow.com/questions/17421776/how-to-add-count-of-unique-values-by-group-to-r-data-frame — jogo, May 01 '17 at 18:00

score 3 · Accepted Answer · answered May 01 '17 at 17:50

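You can do this with the dplyr package, using group_by, filter, and n_distinct. Group the rows by person (y), then keep only the groups where the number of distinct accidents (x) is 1: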

library(dplyr)
df %>%
  group_by(y) %>%
  filter(n_distinct(x) == 1) %>%
  ungroup()
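
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example data from the question and applies the same pipeline. The column values are copied from the question; the name one_accident is just an illustrative choice, not something from the original post.

```r
library(dplyr)

# Example data copied from the question (x = accident, y = person, z = type)
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

# Keep every row belonging to a person who appears in exactly one accident
one_accident <- df %>%
  group_by(y) %>%                  # one group per person
  filter(n_distinct(x) == 1) %>%   # the condition is evaluated once per group
  ungroup()

print(one_accident)
```

Because the condition inside filter() is a single TRUE or FALSE per group, it keeps or drops all of a person's rows together, which is why both accident types for person B survive.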


David Robinson


score 0 · Answer 2 · answered May 02 '17 at 03:11

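We can use data.table. Group by person (y) and keep the subset of rows (.SD) only for groups where the number of unique accidents, uniqueN(x), is 1. Grouping moves y to the first column, so setcolorder restores the original column order. Note that setDT converts df to a data.table by reference.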

library(data.table)
setcolorder(setDT(df)[, .SD[uniqueN(x) == 1], by = y], names(df))[]
#            x        y                z
#1: accident #2 person B accident type #2
#2: accident #2 person B accident type #3
#3: accident #3 person C accident type #1
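
As a self-contained variant, the sketch below builds the example data directly as a data.table, so the caller's df is not converted in place. The data values come from the question; the names dt and one_accident are illustrative assumptions.

```r
library(data.table)

# Example data copied from the question (x = accident, y = person, z = type)
dt <- data.table(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1")
)

# Per person, keep all rows of .SD when that person has exactly one
# distinct accident; otherwise the group contributes no rows
one_accident <- dt[, .SD[uniqueN(x) == 1], by = y]

# Grouping puts y first, so restore the original x, y, z order
setcolorder(one_accident, c("x", "y", "z"))

print(one_accident)
```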
