
Can someone tell me why the names I assigned to "idVars" change after I add a column to my data.table (without reassigning them)? How can I make the assignment persist so that it stores only the first two column names?

Thanks!

library(data.table)

DT <- data.table(a=1:10, b=1:10)
idVars <- names(DT)
print(idVars)
# [1] "a" "b"

DT[, "c" := 1:10]
print(idVars)
# [1] "a" "b" "c"


# devtools::session_info()                
# data.table * 1.11.6  2018-09-19 CRAN (R 3.5.1)
ismirsehregal

1 Answer


The names change because names(DT) and 'idVars' share the same memory location:

tracemem(names(DT))
#[1] "<0x7f9d74c99600>"
tracemem(idVars)
#[1] "<0x7f9d74c99600>"

So, instead create a copy of the names

idVars <- copy(names(DT))
tracemem(idVars)
#[1] "<0x7f9d7d2b97c8>"

and it won't change after the column is added by reference

DT[, "c" := 1:10]
idVars
#[1] "a" "b"

According to ?copy:

A copy() may be required when doing dt_names = names(DT). Due to R's copy-on-modify, dt_names still points to the same location in memory as names(DT). Therefore modifying DT by reference now, say by adding a new column, dt_names will also get updated. To avoid this, one has to explicitly copy: dt_names <- copy(names(DT)).
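
As a minimal sketch of the point made in ?copy, the following compares a plain assignment with an explicit copy. The variable name aliasVars is hypothetical and only used for illustration; address() is data.table's helper for printing an object's memory location. The addresses printed will differ from machine to machine, and whether the plain assignment is affected may depend on the data.table and R versions in use, so only the explicit copy is relied upon here.

```r
library(data.table)

DT <- data.table(a = 1:10, b = 1:10)

# Plain assignment: may still point at the names attribute of DT
aliasVars <- names(DT)   # hypothetical name, for comparison only

# Explicit copy: a separate character vector
idVars <- copy(names(DT))

# Inspect the memory locations of both vectors
print(address(aliasVars))
print(address(idVars))

# Add a column by reference
DT[, "c" := 1:10]

print(names(DT))
# [1] "a" "b" "c"
print(idVars)
# [1] "a" "b"

# Print the plain assignment to see whether it picked up the new column
print(aliasVars)
```

If the goal is to keep the whole table unchanged rather than just its names, copy(DT) can be used in the same way before calling `:=`.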

Henrik
akrun
  • Great thanks! Was this behaviour changed lately? I'm wondering why I haven't stumbled over this earlier. – ismirsehregal Oct 18 '18 at 16:56
  • Big upvote for the explanation. Didn't know that R uses a pointer in that case. – Roman Oct 18 '18 at 16:57
  • @ismirsehregal Not sure about the changes in this case, but usually, before using `:=`, I create a copy of the initial object if I want to keep it separate – akrun Oct 18 '18 at 16:59