Why is Python so much faster at editing column properties than R?

Question

I have been working with Spotfire and noticed that my Python code edits column properties much faster than my R code. The R code takes about 24 seconds, while the Python code takes about 4 seconds to do the same thing. Is my R code just poorly written?

Here is an example of my python code:

import time

start=time.time()
count=0
names=[]
for i in olddt.Columns: #getting columns from old data table
    names.append(i)

for i in dt.Columns: #assigning new values
    if count<=4:
        i.Properties["Limits.Whatif.Upper"]=1.0
        i.Properties["Limits.Whatif.Lower"]=1.0
        i.Properties["Limits.Prod.Upper"]=1.0
        i.Properties["Limits.Prod.Lower"]=1.0
        count=count+1
    else:
        i.Properties["Limits.Whatif.Upper"]=float(count-4)+26.0
        i.Properties["Limits.Whatif.Lower"]=float(count-4)-39.0
        i.Properties["Limits.Prod.Upper"]=names[count-4].Properties["Limits.Whatif.Upper"]+5.0
        i.Properties["Limits.Prod.Lower"]=names[count-4].Properties["Limits.Whatif.Lower"]-4.0
        count=count+1

print time.time()-start

Here is my R code:

for(col in 1:ncol(temp2)){
    if (col<=4){
        attributes(temp2[,col])$SpotfireColumnMetaData$upper=Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower=-1*Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2=Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2=-1*Inf
    }
    else{
        names(attributes(dt[,col-4])$SpotfireColumnMetaData)<- lapply( names( attributes(dt[ ,col-4] )$SpotfireColumnMetaData), tolower)
        attributes(temp2[,col])$SpotfireColumnMetaData$upper=2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower=1
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2=attributes(dt[,col-4])$SpotfireColumnMetaData$upper
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2=attributes(dt[,col-4])$SpotfireColumnMetaData$lower
    }
}

I also tried a version using lapply, shown here:

applyLimits <- function(col){
    if (count<4){
        attributes(temp2[,col])$SpotfireColumnMetaData$upper<<-Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower<<- (-1*Inf)
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2<<-Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2<<- (-1*Inf)
        count<<-count+1
    }
    else{
        attributes(temp2[,col])$SpotfireColumnMetaData$upper<<-2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower<<-1
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2<<-attributes(dt[,col-4])$SpotfireColumnMetaData$upper2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2<<-attributes(dt[,col-4])$SpotfireColumnMetaData$lower2
        count<<-count+1
    }
}

lapply(1:ncol(temp),applyLimits)

If there is some way to improve my R code, please tell me, but I haven't seen a better way of adjusting the properties. According to some research I have done, temp2 and dt should both be data.frames.

For loops in R are [notoriously inefficient](https://privefl.github.io/blog/why-loops-are-slow-in-r/). If you wish to speed up your R code, take a look at the answer [here](https://stackoverflow.com/questions/14006832/how-to-vectorize-a-for-loop-in-r). — sempervent, Oct 30 '19 at 14:18
I think a lot of performance can be gained using the `data.table` package. It can alter attributes (among other things) by reference, using `data.table::setattr()`. This saves memory and should speed things up considerably. — Wimpel, Oct 30 '19 at 14:19
I am not familiar with R, but you should look at how its time complexity differs from the Python equivalent. For example, try to check how things work under the hood. A for loop over a list in Python has a complexity of O(n), and setting a value has a complexity of O(1), which makes O(n) in total. It seems the loop you are using in R has a higher time complexity. — Nergon, Oct 30 '19 at 14:23
Adding some sample data of `temp2` could improve your chances on a good answer. — Wimpel, Oct 30 '19 at 14:26
@Wimpel do you have any examples of using data.table with data tables from Spotfire? I will look into it though. — Nikita Belooussov, Oct 30 '19 at 14:34
@JoshuaGrant I have tried using lapply. I will see if I can find the code again, but it made it slower. I think that was because I had to repeatedly go outside of the function to set a value on the property. — Nikita Belooussov, Oct 30 '19 at 14:37
@NikitaBelooussov I have never heard of (or used) Spotfire, sorry. — Wimpel, Oct 30 '19 at 14:46
@Wimpel, so I looked into it a bit. temp2 should be a data.frame. But I can't seem to find a simple explanation or example of how to use setattr(). — Nikita Belooussov, Oct 30 '19 at 15:36
@NikitaBelooussov, can you provide `dput(temp2)`, if it's super large just subset it? — sempervent, Oct 30 '19 at 16:00
@JoshuaGrant I tried using dput, but it does not seem to work in Spotfire. — Nikita Belooussov, Oct 31 '19 at 07:53

score 0 · Answer 1 · answered Oct 30 '19 at 17:15

Remember that R is a vectorised language, and your lapply function is not vectorised. To get good performance, you need lapply to return a vector and update the whole vector in one go. Your function updates one row and one column at a time, which is why you are getting poor performance.

The vectorised approach would be four lapply calls, each updating one whole column. It should look a little like this:

applyLimits1 <- function(col){
  count <<- count+1
  if (count<4) Inf else 2 
}
applyLimits2 <- function(col){
  count <<- count+1
  if (count<4) (-1*Inf) else 1 
}
applyLimits3 <- function(col){
  count <<- count+1
  if (count<4) Inf else attributes(dt[,col-4])$SpotfireColumnMetaData$upper2
}
applyLimits4 <- function(col){
  count <<- count+1
  if (count<4) (-1*Inf) else attributes(dt[,col-4])$SpotfireColumnMetaData$lower2
}

count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$upper <- lapply(1:ncol(temp),applyLimits1)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$lower <- lapply(1:ncol(temp),applyLimits2)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$upper2 <- lapply(1:ncol(temp),applyLimits3)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$lower2 <- lapply(1:ncol(temp),applyLimits4)

I don't have the data to test this; I've just rearranged your code. You may be better off with sapply or vapply. And of course, some languages are better for certain tasks than others...
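
A minimal sketch of the "compute whole vectors first, then assign" idea with vapply, under stated assumptions: `dt` and `temp2` here are toy data frames standing in for the Spotfire tables, `dt_meta` is a hypothetical helper, and the field names `upper` and `lower` follow the for loop in the question. Each column's metadata is written with a single assignment rather than four nested replacement calls. Note that it replaces the whole `SpotfireColumnMetaData` list on each column, and it has not been tested inside Spotfire.

```r
# Toy stand-ins for the Spotfire tables (hypothetical data, for illustration only).
dt <- data.frame(a = 1:3, b = 4:6)
attr(dt[["a"]], "SpotfireColumnMetaData") <- list(upper = 10, lower = -10)
attr(dt[["b"]], "SpotfireColumnMetaData") <- list(upper = 20, lower = -20)

# temp2 has four leading columns followed by the columns that map onto dt.
temp2 <- data.frame(k1 = 1:3, k2 = 1:3, k3 = 1:3, k4 = 1:3, a = 1:3, b = 4:6)

idx <- seq_len(ncol(temp2))

# Hypothetical helper: read one metadata field from column j of dt.
dt_meta <- function(j, field) attr(dt[[j]], "SpotfireColumnMetaData")[[field]]

# Step 1: compute each property for all columns as one numeric vector.
upper  <- ifelse(idx <= 4, Inf, 2)
lower  <- ifelse(idx <= 4, -Inf, 1)
upper2 <- vapply(idx, function(j) if (j <= 4) Inf else dt_meta(j - 4, "upper"),
                 numeric(1))
lower2 <- vapply(idx, function(j) if (j <= 4) -Inf else dt_meta(j - 4, "lower"),
                 numeric(1))

# Step 2: write each column's metadata once, so temp2 is modified
# one time per column instead of four times.
for (j in idx) {
  attr(temp2[[j]], "SpotfireColumnMetaData") <- list(
    upper = upper[j], lower = lower[j],
    upper2 = upper2[j], lower2 = lower2[j]
  )
}
```

Here `idx` plays the role of the `col` variable, so the column index changes on every iteration. Whether this closes the gap with the Python script will depend on how Spotfire passes the tables to R, so it is worth timing on the real data.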

I don't think this code would work. I need the col variable to increment as well. So that, `attributes(temp2[,5])$SpotfireColumnMetaData$upper` — Nikita Belooussov, Oct 31 '19 at 08:14
