21

I have a vector and I need to sum every n numbers and return the results. This is how I currently plan to do it. Is there a better way?

v = 1:100
n = 10
sidx = seq.int(from=1, to=length(v), by=n)
eidx = c((sidx-1)[2:length(sidx)], length(v))
thesum = sapply(1:length(sidx), function(i) sum(v[sidx[i]:eidx[i]]))

This gives:

thesum
 [1]  55 155 255 355 455 555 655 755 855 955
zx8754
Alex

9 Answers

32
unname(tapply(v, (seq_along(v)-1) %/% n, sum))
# [1] 55 155 255 355 455 555 655 755 855 955 
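
The integer-division grouping also copes with a trailing chunk of fewer than n elements; a quick check (my own, not part of the original answer), with the same n = 10:

unname(tapply(1:95, (seq_along(1:95)-1) %/% 10, sum))
# [1]  55 155 255 355 455 555 655 755 855 465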
Josh O'Brien
20

UPDATE:

If you want to sum every n consecutive numbers, use colSums.
If you want to sum every nth number, use rowSums.

As per Josh's comment, this will only work if n divides length(v) evenly.

rowSums(matrix(v, nrow=n))
 [1] 460 470 480 490 500 510 520 530 540 550

colSums(matrix(v, nrow=n))
 [1]  55 155 255 355 455 555 655 755 855 955
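
When n does not divide length(v), one possible workaround (a sketch, not part of the original answer) is to pad the vector with NA and pass na.rm=TRUE to colSums:

v2 <- 1:92
length(v2) <- ceiling(length(v2) / n) * n   # extend v2 with NA up to a multiple of n
colSums(matrix(v2, nrow=n), na.rm=TRUE)
# [1]  55 155 255 355 455 555 655 755 855 183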

Ricardo Saporta
13

Update

The old version didn't work. Here is a new answer that uses rep to create the grouping factor; there is no need to use cut:

n <- 5 
vv <- sample(1:1000,100)
seqs <- seq_along(vv)
tapply(vv,rep(seqs,each=n)[seqs],FUN=sum)
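
Because the grouping vector is truncated back to length(vv) by [seqs], a final group of fewer than n elements is still summed. A quick sketch (not from the original answer) with a length that is not a multiple of n:

tapply(1:12, rep(seq_along(1:12), each=5)[seq_along(1:12)], FUN=sum)
#  1  2  3 
# 15 40 23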

You can use tapply

tapply(1:100,cut(1:100,10),FUN=sum)

or to get a list

by(1:100,cut(1:100,10),FUN=sum)

EDIT

In case you have 1:92, you can replace the cut by this:

cut(1:92,seq(1,92,10),include.lowest=T)
agstudy
  • I can see why you like the answer, but this would not work for a random vector of numbers where you want to sum up every n elements, would it? – Max M Nov 01 '16 at 18:35
  • @MaxM You are right. I will update my answer to include a new version. – agstudy Nov 01 '16 at 21:15
7

One way is to convert your vector to a matrix and then take the column sums:

colSums(matrix(v, nrow=n))
[1]  55 155 255 355 455 555 655 755 855 955

Just be careful: this implicitly assumes that length(v) is a multiple of n, so that your vector can in fact be reshaped into a matrix. If it isn't, R will recycle elements of your vector to complete the matrix.
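
If you would rather get an error than silent recycling, a minimal guard (my own suggestion, not part of the original answer) is:

stopifnot(length(v) %% n == 0)   # fail loudly if v cannot be reshaped cleanly
colSums(matrix(v, nrow=n))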

Andrie
4
v <- 1:100

n <- 10

cutpoints <- seq( 1 , length( v ) , by = n )

categories <- findInterval( 1:length( v ) , cutpoints )

tapply( v , categories , sum )
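
Here findInterval() maps each position to its most recent cutpoint, so categories plays the same role as the integer-division grouping above, and tapply() then sums within each group.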
Anthony Damico
3

I will add one more way of doing it without any function from the apply family:

v <- 1:100
n <- 10

diff(c(0, cumsum(v)[slice.index(v, 1)%%n == 0]))
##  [1]  55 155 255 355 455 555 655 755 855 955
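
For a plain vector slice.index(v, 1) is just seq_along(v), so the same idea can be written a little more directly (a minor simplification, not from the original answer):

diff(c(0, cumsum(v)[seq_along(v) %% n == 0]))
##  [1]  55 155 255 355 455 555 655 755 855 955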
CHP
2

Here are some of the main variants offered so far

f0 <- function(v, n) {
    sidx = seq.int(from=1, to=length(v), by=n)
    eidx = c((sidx-1)[2:length(sidx)], length(v))
    sapply(1:length(sidx), function(i) sum(v[sidx[i]:eidx[i]]))
}

f1 <- function(v, n, na.rm=TRUE) {    # 'tapply'
    unname(tapply(v, (seq_along(v)-1) %/% n, sum, na.rm=na.rm))
}

f2 <- function(v, n, na.rm=TRUE) {    # 'matrix'
    nv <- length(v)
    if (nv %% n)
        v[ceiling(nv / n) * n] <- NA
    colSums(matrix(v, n), na.rm=na.rm)
}

f3 <- function(v, n) {                # 'cumsum'
    nv = length(v)
    i <- c(seq_len(nv %/% n) * n, if (nv %% n) nv else NULL)
    diff(c(0L, cumsum(v)[i]))
}

Basic test cases might be

v = list(1:4, 1:5, c(NA, 2:4), integer())
n = 2

f0 fails with the final test, but this could probably be fixed

> f0(integer(), n)
Error in sidx[i]:eidx[i] : NA/NaN argument

The cumsum approach f3 is subject to rounding error, and the presence of an NA early in v 'poisons' later results

> f3(c(NA, 2:4), n)
[1] NA NA
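
By contrast, the tapply and matrix variants simply drop the NA via na.rm (a quick check, assuming the definitions above):

> f1(c(NA, 2:4), n)
[1] 2 7
> f2(c(NA, 2:4), n)
[1] 2 7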

In terms of performance, the original solution is not bad

> library(rbenchmark)
> cols <- c("test", "elapsed", "relative")
> v <- 1:100; n <- 10
> benchmark(f0(v, n), f1(v, n), f2(v, n), f3(v, n),
+           columns=cols)
      test elapsed relative
1 f0(v, n)   0.012     3.00
2 f1(v, n)   0.065    16.25
3 f2(v, n)   0.004     1.00
4 f3(v, n)   0.004     1.00

but the matrix solution f2 seems to be both fast and flexible (e.g., adjusting the handling of that trailing chunk of fewer than n elements)

> v <- runif(1e6); n <- 10
> benchmark(f0(v, n), f2(v, n), f3(v, n), columns=cols, replications=10)
      test elapsed relative
1 f0(v, n)   5.804   34.141
2 f2(v, n)   0.170    1.000
3 f3(v, n)   0.251    1.476
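
As an illustration of that flexibility (a sketch, not from the original answer), a variant of f2 that drops an incomplete trailing chunk instead of summing it:

f2b <- function(v, n, na.rm=TRUE) {
    v <- v[seq_len(length(v) %/% n * n)]   # discard the trailing partial chunk
    colSums(matrix(v, n), na.rm=na.rm)
}
f2b(1:95, 10)
# [1]  55 155 255 355 455 555 655 755 855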
Martin Morgan
2

One way is to use rollapply from zoo:

library(zoo)
rollapply(v, width=n, FUN=sum, by=n)
# [1]  55 155 255 355 455 555 655 755 855 955

And in case length(v) is not a multiple of n:

v <- 1:92

rollapply(v, width=n, FUN=sum, by=n, partial=T, align="left")
# [1]  55 155 255 355 455 555 655 755 855 183
Scarabee
2

A little late to the party, but I don't see a rowsum() answer yet. rowsum() is known to be more efficient than tapply(), and I think it would also be very efficient relative to a few of the other answers here.

rowsum(v, rep(seq_len(length(v)/n), each=n))[,1]
#  1   2   3   4   5   6   7   8   9  10 
# 55 155 255 355 455 555 655 755 855 955

Using @Josh O'Brien's grouping technique would likely improve efficiency even more.

rowsum(v, (seq_along(v)-1) %/% n)[,1]
#  0   1   2   3   4   5   6   7   8   9 
# 55 155 255 355 455 555 655 755 855 955 

Simply wrap in unname() to drop the group names.
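
For example, applied to the second call above:

unname(rowsum(v, (seq_along(v)-1) %/% n)[,1])
# [1]  55 155 255 355 455 555 655 755 855 955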

Rich Scriven