1

I'm trying to put together several files and need to do a bunch of merges on column names that are created inside a loop. I can do this fine using data.frame() but am having issues using similar code with a data.table():

library(data.table)

df1 <- data.frame(id = 1:20, col1 =  runif(20))
df2 <- data.frame(id = 1:20, col1 =  runif(20))

newColNum <- 5
newColName <- paste('col',newColNum ,sep='')

df1[,newColName] <- runif(20)

df2 <- merge(df2, df1[,c('id',newColName)], by = 'id', all.x = T) # Works fine
######################

dt1 <- data.table(id = 1:20, col1 =  runif(20))
dt2 <- data.table(id = 1:20, col1 =  runif(20))

newColNum <- 5
newColName <- paste('col',newColNum ,sep='')

dt1[,newColName] <- runif(20)

dt2 <- merge(dt2, dt1[,c('id',newColName)], by = 'id', all.x = T) # Doesn't work

Any suggestions?

screechOwl
  • 25,740
  • 58
  • 153
  • 258
  • a few suggestions - in `data.table`, assign by reference using `:=` instead of what you did, avoid using character strings and use column names directly, and use the `data.table` join instead of `merge` – eddi Nov 05 '13 at 19:54

2 Answers2

1

This really has nothing to do with merge(), and everything to do with how the j (i.e. column) index is, by default, interpreted by [.data.table().

You can make the whole statement work by setting with=FALSE, which causes the j index to be interpreted as it would be in a data.frame:

dt2 <- merge(dt2, dt1[,c('id',newColName), with=FALSE], by = 'id', all.x = T)
head(dt2, 3)
#    id      col1       col5
# 1:  1 0.4954940 0.07779748
# 2:  2 0.1498613 0.12707070
# 3:  3 0.8969374 0.66894157

More precisely, from ?data.table:

with: By default 'with=TRUE' and 'j' is evaluated within the frame of 'x'. The column names can be used as variables. When 'with=FALSE', 'j' is a vector of names or positions to select.

Note that this could be avoided by storing the columns in a variable like so:

cols = c('id', newColName)
dt1[ , ..cols]

.. signals to "look up one level"

MichaelChirico
  • 32,615
  • 13
  • 106
  • 186
Josh O'Brien
  • 154,425
  • 26
  • 353
  • 447
0

Try dt1[,list(id,get(newColName))] in your merge.

eddi
  • 48,392
  • 6
  • 100
  • 153
TheComeOnMan
  • 11,925
  • 7
  • 36
  • 53
  • fyi, for many columns this is much worse than the `with=FALSE` solution, as the mere mention of "get", whips `data.table` into a frenzy and it constructs *all* the columns internally – eddi Nov 06 '13 at 06:24