2

My data.table dt0 is :

ANSWER1 ANSWER2 ANSWER3 ANSWER4 ANSWER5 ANSCOUNT   SCORE100
 21      37      16      14      11     200784       61
 20      37      16      15      11     177165       60

ANSWERx are percentages. SCORE100 is computed as SCORE100 := (100*ANSWER1*ANSCOUNT + 75*ANSWER2*ANSCOUNT + 50*ANSWER3*ANSCOUNT + 25*ANSWER4*ANSCOUNT) / (ANSWER1*ANSCOUNT + ANSWER2*ANSCOUNT + ANSWER3*ANSCOUNT +ANSWER4*ANSCOUNT + ANSWER5*ANSCOUNT)]

In order to recompute SCORE100 for all lines in my data-set, I need to automatically (in a loop) multiply all ANSWERx by ANSCOUNT in each row, then add all obtained multiplications and then divide by sum of all ANSCOUNT.

Can you help please,something along the lines below:

    for (c in paste0("ANSWER",1:5)) {
      dt0[, as.name(paste0(c,"_t")):= as.name(c)*"ANSCOUNT", , with=T]
    }

Thank you.

Here's the error. How to understand it and how to do it right?

 Error in `[.data.table`(dt0, , `:=`(as.name(paste0(c, "_t")), as.name(c) *  : 
 LHS of := must be a symbol, or an atomic vector (column names or positions).
IVIM
  • 1,703
  • 1
  • 12
  • 32
  • A sub-question: What code will create automatically N new columns `ANSWER1_t`,...., `ANSWERN_t`, which are equal to `ANSWER1*ANSCOUNT`, ... `ANSWERN*ANSCOUNT` ? – IVIM May 14 '19 at 14:22
  • I still want to know better how/when to use `as.name(x)` or `get(x)` with `data.table`, where x is passed as parameter (of a loop or function). Can you give me examples of good use of these, please? It would be great if 'data.table' help/manual would include these examples too... – IVIM May 14 '19 at 14:30
  • `as.name` only fits if you are building an expression and applying `eval`, I think. Here you could do `for (c in paste0("ANSWER",1:5)) dt0[, paste0(c,"_t") := get(..c)*ANSCOUNT]` The dot-dot notation is fairly new, used to clarify that c refers to a variable "one level up" from the table – Frank May 14 '19 at 14:37

3 Answers3

7

To answer the overall question - "When and how to use as.name() vs.get() in data.table?".
In my experience, it is rarely needed when you can use existing R programming methods:

A functional approach similar to the other (now removed) answer:

dt0[, 
  newcol := Reduce(`+`, Map(`*`, c(100,75,50,25,0), .SD)) / Reduce(`+`, .SD),
  .SDcols = ANSWER1:ANSWER5
]

#   ANSWER1 ANSWER2 ANSWER3 ANSWER4 ANSWER5 ANSCOUNT SCORE100   newcol
#1:      21      37      16      14      11   200784       61 60.85859
#2:      20      37      16      15      11   177165       60 60.10101

Explanation:

Map the weighting factor from 100-to-0 to the corresponding ANSWER1:ANSWER5 columns, then Reduce them together using a + function. In other words, 100*ANSWER1 + 75*ANSWER2 + 50*ANSWER3 ... etc.

Then use the same Reduce and + to form the denominator of the equation.

I think the multiplication by ANSCOUNT is pointless as it is in both the numerator and denominator.

thelatemail
  • 85,757
  • 12
  • 122
  • 177
  • 1
    A more convoluted way to use Reduce: `DT[, Reduce(\`+\`, Reduce(\`+\`, .SD[,-length(.SD), with=FALSE], accumulate=TRUE))/Reduce(\`+\`, .SD)*25, .SDcols=ANSWER1:ANSWER5]` – Frank May 14 '19 at 14:28
4

This might not be able to compete speed wise with data.table but a base R option using sweep and rowSums

rowSums(sweep(dt0[,1:5],2, seq(100,0,-25), FUN = "*"))/rowSums(dt0[, 1:5])
#[1] 60.859 60.101

Or without using sweep

rowSums(t(t(dt0[,1:5]) * seq(100,0,-25)))/rowSums(dt0[, 1:5])

Both the approaches above implicitly converts the data.table to matrix and performs the calculation. We can have a slight improvement in the speed by using matrix multiplication and by avoiding the subset operation twice. (thanks to @nicola).

m <- as.matrix(dt0[,.SD,.SDcols=ANSWER1:ANSWER5])
m %*% seq(100,0,-25) /rowSums(m)
Ronak Shah
  • 355,584
  • 18
  • 123
  • 178
1

I just found at Assigning/Referencing a column name in data.table dynamically (in i, j and by) the inter-relationship between the two:

 eval(as.name(strFactor)) 

is the same

get(strFactor)

with only difference that former is "safer" than the latter (specifically, get doesnot work in some older versions ns some packages (e.g. in data.table older versions)

So, with this caveat in mind, either of these two can be used - when looping through columns!

IVIM
  • 1,703
  • 1
  • 12
  • 32