
I am using the aggregate function to sum a column by group:

aggregate(x = df$time, by = list(df$id), FUN = sum)

My table has 100 million records and it takes hours to return the results. How can I reduce the time of this process? Any help is appreciated.
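The call above can be reproduced on a small synthetic table like this (the column names `id` and `time` are assumed from the call in the question):

```r
# Minimal reproducible sketch of the slow approach.
set.seed(1)
df <- data.frame(
  id   = sample(1:5, 100, replace = TRUE),
  time = runif(100)
)

# One row per id, summing 'time' -- this is the step that
# becomes very slow at 100 million rows.
res <- aggregate(x = df$time, by = list(id = df$id), FUN = sum)
```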

Sotos
RKR

1 Answer

Have you tried loading your initial table with the data.table package? fread alone will save a significant amount of time just reading 100m rows.

library(data.table)
DT <- fread("path/to/file.csv")

Then you can aggregate fairly quickly with:

# adds the per-id sum as a new column, computed by reference
DT[ , AggColumn := sum(time), by = id]
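Note that `:=` keeps all rows and attaches the group sum as a new column. To get one row per id, the same shape `aggregate` returns, a grouped summary like this should work (column names are assumed from the question):

```r
library(data.table)

# Toy table standing in for the 100M-row file (assumed columns id, time).
DT <- data.table(id   = c(1, 1, 2, 2, 2),
                 time = c(5, 3, 1, 4, 2))

# One row per id, equivalent to aggregate(..., FUN = sum) but much faster.
agg <- DT[ , .(total_time = sum(time)), by = id]
# agg: id 1 -> total_time 8, id 2 -> total_time 7
```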
Oliver Frost