5

Say I have the following

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))

car_list = list(cars1, cars2)
class(car_list) <- "dd"

`[.dd` <- function(x,...) {
  code = rlang::enquos(...)
  cars1 = x[[1]]
  rlang::eval_tidy(quo(cars1[!!!code]))
}

car_list[,.N, by = speed]

so I wished to perform arbitrary operations on cars1 and cars2 by defining the [.dd function so that whatever I put into ... get executed by cars1 and cars2 using the [ data.table syntax e.g.

car_list[,.N, by = speed] should perform the following

cars1[,.N, by = speed]
cars2[,.N, by = speed]

also I want

car_list[,speed*2]

to do

cars1[,speed*2]
cars2[,speed*2]

Basically, ... in [.dd has to accept arbitrary code.

somehow I need to capture the ... so I tried to do code = rlang::enquos(...) and then rlang::eval_tidy(quo(cars1[!!!code])) doesn't work and gives error

Error in [.data.table(cars1, ~, ~.N, by = ~speed) : argument "i" is missing, with no default

xiaodai
  • 13,509
  • 17
  • 70
  • 120
  • I don't use rlang but the tildes in the error message shouldn't be there. data.table subsetting doesn't expect formulas. – Roland Jul 20 '19 at 08:51
  • @Roland I don't need to use rlang if I don't have to, but i just don't know how to achieve what I want. Hence the question – xiaodai Jul 20 '19 at 08:52
  • I'd just use base eval after substituting the expression into the subset expression ( with base functionality). – Roland Jul 20 '19 at 08:57
  • @Roland I would accept the answer if you just post an example as the answer. Thanks – xiaodai Jul 20 '19 at 08:58
  • Maybe later this weekend if someone else doesn't do it. No R on my phone. – Roland Jul 20 '19 at 08:59
  • 1
    Try using `rlang::enexprs` instead of `enquos`, and remove the call to `quo` inside `eval_tidy`, it's not needed. – Alexis Jul 20 '19 at 09:22
  • 2
    Just to clarify, you *can’t* use tidyeval here, because tidyeval needs to be supported by the callee, and the data.table subsetting operator doesn’t support tidyeval. As a consequence, this is unfortunately a lot more complicated to achieve. – Konrad Rudolph Jul 20 '19 at 12:16
  • related qn: https://stackoverflow.com/questions/9705488/using-data-table-i-and-j-arguments-in-functions – chinsoon12 Jul 22 '19 at 03:31
  • Is there a reason why you would like to do this? Just curious – marbel Jul 22 '19 at 15:22
  • @marble it's for my package disk.frame https://github.com/xiaodaigh/disk.frame – xiaodai Jul 23 '19 at 07:09

3 Answers3

5

While not under rlang type of mantra, this approach seems to work pretty well: lapply(dt_list, '[', ...) The code would be more readable to me as it is explicit about what method is being used. If I saw car_list[, .N, by = speed] I would expect the default data.table methods.

Making it as a function allows you to have the best of both worlds:

class(car_list) <- "dd"

`[.dd` <- function(x,...) {
 lapply(x, '[', ...)
}

car_list[, .N, speed]
car_list[, speed * 2]
car_list[, .(.N, max(dist)), speed]
car_list[, `:=` (more_speed = speed+5)]

Here are some examples of the approach:

car_list[, .N, speed]
# lapply(car_list, '[', j = .N, by = speed)
# or
# lapply(car_list, '[', , .N, speed)
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
car_list[, speed * 2]
# lapply(car_list, '[', j = speed*2)
# or
# lapply(car_list, '[', , speed*2)
[[1]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

[[2]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

car_list[, .(.N, max(dist)), speed]
# lapply(car_list, '[', j = list(.N, max(dist)), by = speed)
# or 
# lapply(car_list, '[', ,.(.N, max(dist)), speed)

[[1]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

[[2]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

This works with the := operator:

car_list[, `:=` (more_speed = speed+5)]
# or
# lapply(car_list, '[', , `:=` (more_speed = speed+5))

car_list
[[1]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13
...

[[2]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13
Cole
  • 10,670
  • 1
  • 8
  • 22
4

First base R option is substitute(...()) followed by do.call:

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))
cars2[, speed := sort(speed, decreasing = TRUE)]

car_list = list(cars1, cars2)
class(car_list) <- "dd"

`[.dd` <- function(x,...) {
  a <- substitute(...()) #this is an alist
  expr <- quote(x[[i]])
  expr <- c(expr, a)
  res <- list()
  for (i in seq_along(x)) {
    res[[i]] <- do.call(data.table:::`[.data.table`, expr)
  }
  res
}

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE

Second base R option is match.call, modify the call and then evaluate (you find this approach in lm):

`[.dd` <- function(x,...) {
  thecall <- match.call()
  thecall[[1]] <- quote(`[`)
  thecall[[2]] <- quote(x[[i]])
  res <- list()
  for (i in seq_along(x)) {
    res[[i]] <- eval(thecall)
  }
  res
}

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE

I haven't tested if these approaches will make a deep copy if you use :=.

Roland
  • 122,144
  • 10
  • 182
  • 276
3

The suggestion in my comment wasn't complete. You can indeed use rlang to support tidy evaluation, but since data.table itself doesn't support it directly, you're better off using expressions instead of quosures, and you need to build the complete final expression before calling eval_tidy:

`[.dd` <- function(x, ...) {
  code <- rlang::enexprs(...)
  lapply(x, function(dt) {
    ex <- rlang::expr(dt[!!!code])
    rlang::eval_tidy(ex)
  })
}

car_list[, .N, by = speed]
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1

[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1
Alexis
  • 4,647
  • 1
  • 17
  • 34
  • It's a great solution, but it couldn't handle `car_list[, abc := speed*3]`. – xiaodai Jul 23 '19 at 07:21
  • 1
    @xiaodai If you want to use `:=`, you'd need to pass `.unquote_names = FALSE` to `enexprs`. `rlang` also uses `:=` for its own purposes, so interaction between it and `data.table` requires special considerations. – Alexis Jul 23 '19 at 07:44