I have a data frame with only one row. I then add rows to it using rbind:

df  # my data frame with only one row
for (i in 1:20000) {
    df <- rbind(df, newrow)
}

This gets very slow as i grows. Why is that, and how can I make this type of code faster?

smci
Mark

2 Answers

You are in the 2nd circle of hell, namely failing to pre-allocate data structures.

Growing objects in this fashion is a Very Very Bad Thing in R. Either pre-allocate and insert:

df <- data.frame(x = rep(NA,20000),y = rep(NA,20000))

or restructure your code to avoid this sort of incremental addition of rows. As discussed at the link I cite, the reason for the slowness is that each time you add a row, R needs to find a new contiguous block of memory to fit the data frame in. Lots o' copying.
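
One way to act on the pre-allocation advice is sketched below. Note that it fills pre-allocated column vectors rather than data frame rows and builds the data frame once at the end; that swap, the numeric column types, and the helper `make_row()` are all assumptions made for illustration, not something taken from the question.

```r
n <- 20000

# Hypothetical stand-in for however each new row is actually computed
make_row <- function(i) c(x = i, y = i^2)

# Pre-allocate one typed vector per column instead of growing a data frame
x <- numeric(n)
y <- numeric(n)

for (i in seq_len(n)) {
  row <- make_row(i)
  x[i] <- row[["x"]]   # assignment into an existing slot, no rbind
  y[i] <- row[["y"]]
}

# Build the data frame a single time, after the loop
df <- data.frame(x = x, y = y)
```

If the columns have different types, each one gets its own vector of the matching type (`character(n)`, `logical(n)`, and so on).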

joran
  • great. Thanks for the tip. – Mark Feb 04 '13 at 19:33
  • so I reallocated the dataframe and started inserting 1row dataframes into it (df[j – Mark Feb 04 '13 at 20:05
  • @Mark Yeah, like I said, this sort of thing is rather un-R-like. Modifying objects will still require a certain amount of copying of the entire object. Do you really want to copy the entire data frame each time you add a row? Probably not. Generate a list of each row using `lapply` and then stitch them together using `do.call(rbind,...)`. But beyond that, the solution requires more refactoring than I can help with given the information you've provided. – joran Feb 04 '13 at 20:08
  • Kudos. Thanks a lot. I have thought a lot about using apply here but the problem has such a weird shape my brain is incapable of comprehending a proper functional form for it :) thanks for the help – Mark Feb 04 '13 at 20:10
  • I'm a little surprised that the `df[j,] – Ben Bolker Feb 04 '13 at 22:05
  • @BenBolker Me too, but it's basically impossible to know what might be going on without more complete code. – joran Feb 04 '13 at 22:13
  • @BenBolker I changed my code to use df[j,] – user1892410 Jun 07 '16 at 21:26
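
The `lapply` plus `do.call(rbind,...)` suggestion from the comments could look roughly like this. It is a minimal sketch: `make_row()` is a hypothetical function that returns a one-row data frame, and it assumes each row can be computed independently of the others.

```r
n <- 1000

# Hypothetical: returns a one-row data frame for index i
make_row <- function(i) data.frame(x = i, y = sqrt(i))

# Build a list of n one-row data frames without touching a growing object
rows <- lapply(seq_len(n), make_row)

# Stitch the list together with a single rbind call
df <- do.call(rbind, rows)
```

The point of this shape is that `rbind` is called once on the full list instead of once per row inside a loop.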

I tried an example. For what it's worth, it agrees with the user's assertion that inserting rows into the data frame is also really slow. I don't quite understand what's going on, as I would have expected the allocation problem to trump the speed of copying. Can anyone either replicate this, or explain why the results below (rbind < appending < insertion) would be true in general, or explain why this is not a representative example (e.g. data frame too small)?

edit: the first time around I forgot to initialize the object in hell2fun to a data frame, so the code was doing matrix operations rather than data frame operations, which are much faster. If I get a chance I'll extend the comparison to data frame vs. matrix. The qualitative assertions in the first paragraph hold, though.

N <- 1000
set.seed(101)
r <- matrix(runif(2*N),ncol=2)

## second circle of hell
hell2fun <- function() {
    df <- as.data.frame(rbind(r[1,])) ## initialize
    for (i in 2:N) {
        df <- rbind(df,r[i,])
    }
}

insertfun <- function() {
    df <- data.frame(x=rep(NA,N),y=rep(NA,N))
    for (i in 1:N) {
        df[i,] <- r[i,]
    }
}

rsplit <- as.list(as.data.frame(t(r)))
rbindfun <-  function() {
    do.call(rbind,rsplit)
}

library(rbenchmark)
benchmark(hell2fun(),insertfun(),rbindfun())

##          test replications elapsed relative user.self 
## 1  hell2fun()          100  32.439  484.164    31.778 
## 2 insertfun()          100  45.486  678.896    42.978 
## 3  rbindfun()          100   0.067    1.000     0.076 
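
The edit note above says the accidental matrix version was much faster than the data frame version. A sketch of that variant, for anyone who wants to add it to the comparison, might look like the following. The name `matinsertfun` and the column names are assumptions, and no timings are claimed for it here.

```r
N <- 1000
set.seed(101)
r <- matrix(runif(2 * N), ncol = 2)

# Hypothetical extra case: insert rows into a pre-allocated matrix,
# then convert to a data frame once at the end
matinsertfun <- function() {
  m <- matrix(NA_real_, nrow = N, ncol = 2,
              dimnames = list(NULL, c("x", "y")))
  for (i in seq_len(N)) {
    m[i, ] <- r[i, ]   # row assignment into a matrix, not a data frame
  }
  as.data.frame(m)
}

df <- matinsertfun()
```

It can be passed to `benchmark()` alongside the other three functions. This only works as written when all columns share one type, since a matrix cannot mix types.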
Ben Bolker
  • I'd suggest using `rsplit – mnel Feb 04 '13 at 22:38
  • will do when I get a chance (I don't understand the first sentence yet ... if I understand correctly, I don't think the matrix-splitting should be charged to the function -- I assumed that the rows would become available one by one within some sort of iterative procedure) – Ben Bolker Feb 04 '13 at 22:45
  • assign the result `(df – mnel Feb 04 '13 at 22:47