
I have data similar to this:

dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)), .Names = c("fct", "X"), class = "data.frame", row.names = c(NA, -15L))

I want to select rows from this data frame based on the values in the fct variable. For example, if I wish to select rows containing either "a" or "c" I can do this:

dt[dt$fct == 'a' | dt$fct == 'c', ]

which yields

   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

as expected. But my actual data is more complex, and I want to select rows based on the values in a vector such as

vc <- c('a', 'c')

So I tried

dt[dt$fct == vc, ]

but of course that doesn't work, because == compares dt$fct element by element against vc recycled to the same length. I know I could write a loop that goes through the vector, pulls out the rows needed and appends them to a new data frame, but I was hoping there was a more elegant way.
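To see why dt[dt$fct == vc, ] goes wrong, here is a small sketch that rebuilds the question's data (same values as the dput output above). == recycles the shorter vc to c("a", "c", "a", "c", ...), so odd rows are compared only against "a" and even rows only against "c"; R also warns because 15 is not a multiple of 2. %in% instead checks each element against the whole of vc.

```r
# Rebuild the example data from the question
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c",
                 "a", "b", "c", "b", "c", "d")),
  X   = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

# `==` is element-wise: vc is recycled, so row i is compared against a
# single value (the length-mismatch warning is suppressed here)
hits_eq <- suppressWarnings(dt$fct == vc)
which(hits_eq)   # only rows 1, 7, 12, 14

# `%in%` compares every element of dt$fct against all of vc
hits_in <- dt$fct %in% vc
which(hits_in)   # rows 1, 3, 5, 7, 9, 10, 12, 14
```

So the == version silently returns 4 of the 8 wanted rows.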

So how can I filter/subset my data based on the contents of the vector vc?

Henrik
Joe King

3 Answers


Have a look at ?"%in%".

dt[dt$fct %in% vc,]
   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

You could also use ?is.element:

dt[is.element(dt$fct, vc),]
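Both functions are built on match(): the base R help page ?match defines x %in% table as match(x, table, nomatch = 0) > 0, and states that is.element(x, y) is identical to x %in% y. A quick check on the question's data (rebuilt here so the snippet runs on its own) confirms the three forms select the same rows:

```r
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c",
                 "a", "b", "c", "b", "c", "d")),
  X   = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

by_in    <- dt[dt$fct %in% vc, ]
by_elem  <- dt[is.element(dt$fct, vc), ]
# The definition of %in% given in ?match, written out explicitly
by_match <- dt[match(dt$fct, vc, nomatch = 0) > 0, ]

identical(by_in, by_elem)   # TRUE
identical(by_in, by_match)  # TRUE
```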
johannes

Similar to the above, using filter from dplyr:

library(dplyr)
filter(dt, fct %in% vc)
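If dplyr is not available, base R's subset() accepts the same condition and also evaluates it with the data frame's columns in scope. This is a swapped-in base R equivalent, not part of the original answer; the data is the question's, rebuilt so the sketch is self-contained.

```r
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c",
                 "a", "b", "c", "b", "c", "d")),
  X   = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

# fct is looked up inside dt, vc in the calling environment,
# so no dt$ prefix is needed (same idea as dplyr::filter)
res <- subset(dt, fct %in% vc)
res
```

Unlike the dplyr version, subset() keeps the original row names (1, 3, 5, ...).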
Andrew Haynes

Another option would be to use a keyed data.table:

library(data.table)
setDT(dt, key = 'fct')[J(vc)]  # or: setDT(dt, key = 'fct')[.(vc)]

which results in:

   fct X
1:   a 2
2:   a 7
3:   a 1
4:   c 3
5:   c 5
6:   c 9
7:   c 2
8:   c 4

What this does:

  • setDT(dt, key = 'fct') transforms the data.frame to a data.table (which is an enhanced form of a data.frame) with the fct column set as key.
  • Next, you can subset with the vc vector using [J(vc)], which joins vc against the key column.
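The two steps can be seen on a tiny table. This is a sketch with a hypothetical data.frame named small (not the question's data): setDT() converts it by reference and sorts it by the key, and J() then returns the matching rows for each value in the vector, in the order of that vector.

```r
library(data.table)

# Hypothetical example data
small <- data.frame(fct = factor(c("c", "a", "b", "a")), X = 1:4)

# Convert in place (no copy) and set fct as key; rows get sorted by fct
setDT(small, key = "fct")
is.data.table(small)  # TRUE
key(small)            # "fct"

# Keyed join: character values are matched against the factor key
small[J(c("a", "c"))]
```

After the setDT() call, small itself is reordered to fct a, a, b, c, which is why keyed results come back grouped by key value.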

NOTE: when the key is a factor/character variable, you can also use setDT(dt, key = 'fct')[vc], but that won't work when vc is a numeric vector. When vc is numeric and not wrapped in J() or .(), it is treated as a vector of row indices instead of values to join on.
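The difference is easy to see with an integer key. The table num and the vector ids below are hypothetical; the point is that the bare vector picks rows by position, while J() or .() joins on the key values.

```r
library(data.table)

# Hypothetical table with an integer key column
num <- data.table(id = c(2L, 4L, 2L, 3L), val = c("p", "q", "r", "s"))
setkey(num, id)   # sorted by id: (2,p) (2,r) (3,s) (4,q)
ids <- c(2L, 4L)

num[ids]      # row positions 2 and 4: (2,r) (4,q)
num[J(ids)]   # join on id:            (2,p) (2,r) (4,q)
num[.(ids)]   # .() is an alias for J() inside [ ]
```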

A more detailed explanation of keys and subsetting can be found in the data.table vignette "Keys and fast binary search based subset".

An alternative suggested by @Frank in the comments, which joins with on= instead of setting a key:

setDT(dt)[J(vc), on=.(fct)]

When vc contains values that are not present in dt, the join returns a row with NA in the other columns for each of them; add nomatch = 0 to drop those rows:

setDT(dt, key = 'fct')[J(vc), nomatch = 0]

or:

setDT(dt)[J(vc), on=.(fct), nomatch = 0]
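A small sketch of the effect of nomatch, using a hypothetical unkeyed table tab and a lookup vector wanted in which "z" does not occur:

```r
library(data.table)

# Hypothetical data; no key is set, the join uses on= instead
tab    <- data.table(fct = c("a", "b", "c", "a"), X = 1:4)
wanted <- c("a", "z")

# Default: "z" has no match and yields one row with X = NA
with_na <- tab[J(wanted), on = .(fct)]

# nomatch = 0: unmatched values in wanted are dropped
no_na <- tab[J(wanted), on = .(fct), nomatch = 0]
```

Because no key is set, tab itself is not reordered; the result rows follow the order of wanted.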
David Arenburg
Jaap
  • I couldn't get it to work when the vector and the variable in the data.table are numeric. Any ideas? – Gaurav Singhal May 10 '17 at 06:41
  • @GauravSinghal I updated the answer; the method in the previous version worked only for character/factor columns, while the updated method also works for integer/numeric columns – Jaap May 10 '17 at 07:24