31

raw is a data.table and the following code works:

raw[,r_responseTime] #Returns the whole column
raw[,c_filesetSize]  #Same as above, returns column
plot(raw[,r_responseTime]~raw[,c_filesetSize]) #draws something

Now I want to specify these columns from a string, so for example:

col1="r_responseTime"
col2="c_filesetSize"

How can I now achieve the same as above while referencing the columns by the string?

raw[,col1] #Returns the whole column
raw[,col2]  #Same as above, returns column
plot(raw[,col1]~raw[,col2]) #draws something

This does not work, of course, because I need some kind of dereferencing. I didn't know what to search for in the help or on the internet, so sorry for the basic question.

Matt Dowle
theomega
  • In addition to the answers, try `with=FALSE`. Also, see FAQs 1.5, 1.6 and 1.7. – Matt Dowle Mar 26 '12 at 11:06
  • `with=FALSE` does not seem to work with the `by` argument, any solution for that? – tlamadon Feb 26 '13 at 01:01
  • Well, actually, a vector of strings works out of the box in the `by` argument. – tlamadon Feb 26 '13 at 01:11
  • Man, this is a *really* annoying part of data.table... If you write it one way, it works with dataframes, and if you fix it for data.table, it fails for dataframes. Is there no general solution? – naught101 Sep 10 '14 at 06:17
  • @naught101 I use standard base R `raw[[col1]]` for selecting a single column as a vector from a data.table where `col1` contains which one. I don't see why people are trying to use data.table `[...]` for that. The NEWS items explicitly recommend `[[` and `$` on data.table where whole columns are required as vectors. Maybe this advice needs to be added to `?data.table`. – Matt Dowle Feb 07 '17 at 23:56
  • @naught101 More annoying was that `DT[,1]`, `DT[,3:10]` and `DT[,colP:colW]` didn't work before. They all work now in recent versions to alleviate that annoyance. Without losing the convenience and power that `j` can be expressions of column names directly. – Matt Dowle Feb 08 '17 at 00:04
  • @Frank how is this a duplicate of a question asked later? – theomega Mar 07 '19 at 14:47
  • The dupe target has an answer by the package's author that is better maintained (eg, includes the `..x` notation). Looking around on meta, it seems like this sort of closure is regarded as okay https://meta.stackexchange.com/a/147651 – Frank Mar 07 '19 at 15:08
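The `[[` approach recommended in the comments, and the remark that `by` accepts strings directly, can be sketched as follows. This is a minimal illustration, not the original poster's data: the contents of `raw`, the extra `host` column, and the variable `grp` are invented assumptions.

```r
library(data.table)

# Hypothetical stand-in for `raw`; the values and the `host` column are invented.
raw <- data.table(
  r_responseTime = c(12.5, 30.1, 18.7, 45.2),
  c_filesetSize  = c(10, 50, 20, 100),
  host           = c("a", "a", "b", "b")
)
col1 <- "r_responseTime"
col2 <- "c_filesetSize"

# `[[` takes the column name as a string and returns a plain vector.
resp <- raw[[col1]]
size <- raw[[col2]]

# The same call works unchanged on a data.frame.
raw_df <- as.data.frame(raw)
same_as_df <- identical(raw_df[[col1]], resp)

# `by` accepts a character vector of column names directly.
grp <- "host"
counts <- raw[, .N, by = grp]
```

Because `[[` is base R, code written this way does not need to know whether it was handed a data.frame or a data.table.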

4 Answers

34

It would have helped if you had provided a reproducible example, or at least shown the column names of `raw` and what `r_responseTime` and `c_filesetSize` contain. That said, `get` is the function for dereferencing, so give these a try:

raw[, get(col1)]
raw[, get(col2)]
plot(raw[, get(col1)] ~ raw[, get(col2)])
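Since the question has no reproducible data, here is a small self-contained sketch of the `get` approach. The contents of `raw` are invented for illustration; only the column names come from the question.

```r
library(data.table)

# Hypothetical stand-in for `raw`; the values are invented.
raw <- data.table(
  r_responseTime = c(12.5, 30.1, 18.7, 45.2),
  c_filesetSize  = c(10, 50, 20, 100)
)
col1 <- "r_responseTime"
col2 <- "c_filesetSize"

# Inside `j`, get() looks up the object whose name is stored in the string,
# which here is a column of `raw`; the result is a plain vector.
resp <- raw[, get(col1)]
size <- raw[, get(col2)]

# Plot response time against fileset size, as in the question.
plot(resp ~ size)
```

Extracting the two vectors first keeps the `plot` call short and makes it easy to inspect what is being plotted.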
flodel
10

A modern approach is to use the `..` prefix, which returns a one-column data.table:

raw[ , ..col1]

`..` "looks up a level" to find `col1` in the calling scope instead of among the columns of `raw`.

An older, less preferred alternative is to use the `match()` function or the `%in%` operator together with `with=FALSE`:

raw[, match(col1, names(raw)), with=FALSE]
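A short sketch comparing the `..` prefix with `with = FALSE`, assuming a reasonably recent data.table version that supports `..`. The contents of `raw` are invented for illustration.

```r
library(data.table)

# Hypothetical stand-in for `raw`; the values are invented.
raw <- data.table(
  r_responseTime = c(12.5, 30.1, 18.7, 45.2),
  c_filesetSize  = c(10, 50, 20, 100)
)
col1 <- "r_responseTime"

# Both forms treat `col1` as a variable holding a column name
# and return a one-column data.table, not a vector.
one_col_dots <- raw[, ..col1]
one_col_with <- raw[, col1, with = FALSE]

# To get the column as a plain vector, extract it from the result.
as_vector <- raw[, ..col1][[1]]
```

If a plain vector is all that is needed, `raw[[col1]]` reaches the same result in one step.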
MichaelChirico
Etienne Low-Décarie
6

If you have a vector of strings, you can use `mget`:

cols = c("r_responseTime", "c_filesetSize")
raw[, mget(cols)]
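A self-contained sketch of the `mget` approach, with `.SDcols` shown alongside as another way to select several columns by name. The contents of `raw` and the extra `host` column are invented for illustration.

```r
library(data.table)

# Hypothetical stand-in for `raw`; the values and `host` are invented.
raw <- data.table(
  r_responseTime = c(12.5, 30.1, 18.7, 45.2),
  c_filesetSize  = c(10, 50, 20, 100),
  host           = c("a", "a", "b", "b")
)
cols <- c("r_responseTime", "c_filesetSize")

# mget() returns a named list of the columns, which data.table
# turns into a data.table with exactly those columns.
via_mget <- raw[, mget(cols)]

# .SDcols restricts .SD to the named columns and gives the same selection.
via_sdcols <- raw[, .SD, .SDcols = cols]
```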
tadejsv
0

Unfortunately, `get` can be problematic when the variable holding the column name has the same name as a column. See the example below:

library(data.table)

m = 100
x1 = sample(c("cat", "dog", "horse"), m, replace=TRUE)
y1 = rnorm(m)
fill1 = sample(c("me", "myself", "dude"), m, replace=TRUE)
df = data.frame("x"=x1, "y"=y1, "fill"=fill1)
dt = data.table(df)

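`get` does not work here, because inside `j` the name `y` resolves to the column `y` rather than to the string variable: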

y = "y"
dt[ , get(y)]

`get` works when the variable name does not clash with a column name:

yCol = "y"
dt[ , get(yCol)]

Building the call as text always works, but it is not pretty:

eval(parse(text = paste0("values = dt[ ,",  y, "]")))
eval(parse(text = paste0("values = dt[ ,",  yCol, "]")))
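As a hedged alternative to `eval(parse(...))`, the sketch below shows two lookups that are not affected by the name clash, because neither evaluates the variable among the columns. The data is a smaller invented version of the example above.

```r
library(data.table)

set.seed(1)
# Smaller invented version of the example data.
dt <- data.table(
  x = sample(c("cat", "dog", "horse"), 10, replace = TRUE),
  y = rnorm(10)
)

# Deliberately the same name as a column.
y <- "y"

# `[[` is evaluated outside of `j`, so `y` is the string variable.
v1 <- dt[[y]]

# The `..` prefix looks `y` up in the calling scope, not among the columns.
v2 <- dt[, ..y][[1]]
```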
Suraj Rao
rz1317