subset data.table based on separate data.table

Question

I have a primary table

library(data.table); set.seed(42)
D1 <- data.table(id=rep(1:3,each=2), name=rep(c("a","b")), val=runif(6,0,1))
> D1
   id name       val
1:  1    a 0.9148060
2:  1    b 0.9370754
3:  2    a 0.2861395
4:  2    b 0.8304476
5:  3    a 0.6417455
6:  3    b 0.5190959

giving two values for each id, named a and b. The value I want to select is determined in a secondary table

D2 <- data.table(id=1:3, name=c("a","a","b"))

and I would like to subset D1 using D2. I could do this ID by ID in a loop but given the dimensions of my data I hope that there is a more efficient solution, perhaps using a join.

akrun · Accepted Answer · 2019-12-25T16:54:16.413

0

We can use a join and it would be faster as we are already having data.table objects

D1[D2, on = .(id, name)]
#  id name       val
#1:  1    a 0.9148060
#2:  2    a 0.2861395
#3:  3    b 0.5190959

Or with inner_join from dplyr

library(dplyr)
inner_join(D1, D2)

Or using match in base R

D2$val <- D1$val[match(paste(D2$id, D2$name), paste(D1$id, D1$name))]

edited Dec 25 '19 at 16:54

answered Dec 25 '19 at 15:56

akrun

789,025
32
460
575

score 0 · Answer 2 · answered Dec 25 '19 at 16:52

0

A base R solution is via using merge(), i.e.,

> merge(D2,D1)
   id name       val
1:  1    a 0.9148060
2:  2    a 0.2861395
3:  3    b 0.5190959

answered Dec 25 '19 at 16:52

ThomasIsCoding

80,151
7
17
65

subset data.table based on separate data.table

2 Answers2