Strange behaviour from the data.table package. Try the code below: why does the ordering of x change?

#R version 3.1.0 (2014-04-10)
#data.table_1.9.2 same error for (data.table_1.9.4)

require(data.table)

#dummy data
dat <- fread("A,B
6,7
4,5
1,2
3,4
0,2")

#get x and y
x <- dat$A
y <- dat[,A]

#compare - x and y, same.
x # [1] 6 4 1 3 0
y # [1] 6 4 1 3 0
all(x==y) # [1] TRUE

#Set key on column A
setkey(dat,A)

#compare - x is not same as y anymore!
x # [1] 0 1 3 4 6
y # [1] 6 4 1 3 0
all(x==y) # [1] FALSE
zx8754

1 Answer

To expand on my comments:

After doing:

require(data.table)
dat <- fread("A,B
6,7
4,5
1,2
3,4
0,2")

# get x and y
x <- dat$A
y <- dat[,A]

If you do:

.Internal(inspect(x))
# @7fa677439e40 13 INTSXP g0c3 [NAM(2)] (len=5, tl=5) 6,4,1,3,0
.Internal(inspect(dat$A))
# @7fa677439e40 13 INTSXP g0c3 [NAM(2)] (len=5, tl=5) 6,4,1,3,0

As you can see, the address @7fa677439e40 is identical in both cases (the actual value will differ on your machine). This is because R doesn't copy the data when we use the $ operator to extract an entire column and assign it to a variable; x and dat$A simply refer to the same vector. R copies only when it's absolutely essential.
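If you'd rather not read the raw .Internal(inspect()) output, the same comparison can be sketched with address(), which data.table exports. This is only a minimal illustration: it builds the dummy data with data.table() instead of fread(), and the variable names same_x, same_bracket and same_y are made up for the example.

```r
library(data.table)

# Same dummy data as in the question, built directly instead of via fread()
dat <- data.table(A = c(6L, 4L, 1L, 3L, 0L), B = c(7L, 5L, 2L, 4L, 2L))

x <- dat$A       # no copy: x refers to the column vector itself
y <- dat[, A]    # data.table subset: returns a new vector

# address() returns the memory address of an object as a string
same_x       <- address(x) == address(dat$A)
same_bracket <- address(dat[["A"]]) == address(dat$A)
same_y       <- address(y) == address(dat$A)

print(c(x = same_x, double_bracket = same_bracket, y = same_y))
```

The actual addresses will differ from run to run; only whether they match is of interest here.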

Doing the same for the second case:

.Internal(inspect(y))
# @7fa677455248 13 INTSXP g0c3 [NAM(2)] (len=5, tl=5) 6,4,1,3,0
.Internal(inspect(dat)) # pasting the first 3 lines of output here
# @7fa674a0be00 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=2, tl=100)
#   @7fa677439e40 13 INTSXP g0c3 [NAM(2)] (len=5, tl=5) 6,4,1,3,0 <~~~~~~~ 
#   @7fa677439e88 13 INTSXP g0c3 [NAM(2)] (len=5, tl=5) 7,5,2,4,2

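The address of y and that of column A inside dat (see arrow mark) are not identical. This is because the data.table subset dat[, A] already made a copy. In R, neither dat$A nor dat[["A"]] makes a copy under these circumstances (also good to know when you want to avoid unnecessary copies!). Since setkey() reorders the columns of dat in place, by reference, x, which still points to the same memory as column A, gets reordered as well, while the copy y keeps the original order.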
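If what you want is a snapshot of the column that stays put, one option is to take an explicit copy with data.table's copy() before setting the key. A minimal sketch, again using data.table() for the dummy data; x_ref and x_safe are just illustrative names:

```r
library(data.table)

dat <- data.table(A = c(6L, 4L, 1L, 3L, 0L), B = c(7L, 5L, 2L, 4L, 2L))

x_ref  <- dat$A          # shares memory with column A
x_safe <- copy(dat$A)    # explicit copy, independent of dat

setkey(dat, A)           # sorts dat by A in place (by reference)

print(x_ref)    # follows the reordered column
print(x_safe)   # still in the original order
```

The copy costs extra memory, so it is only worth doing when you really need the original order after modifying dat by reference.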

Please write back if you have more questions.

HTH

More info on copy-on-modify.

Arun