Say that I have the following vector:

dat <- c(1,0,-1,1,0,-1,1,0,1)

I want a vector that counts the occurrences of 1, 0 and -1 in dat as a running tally. The result would look like this:

tally <- c(1,1,1,2,2,2,3,3,4)

So essentially my new vector has an ongoing tally of 1, 0 and -1 from dat. I am looking for a way to do this calculation in R so I can use it on a much larger set.

Chris95
  • This isn't well defined. Are the values in `dat` always to be taken in chunks of three? How do you know which values in `tally` are counting which values? Or do you mean that `tally` should be read positionally? – joran Dec 20 '17 at 20:48
  • The very first answer here works fine with vectors: [Numbering rows within groups in a data frame](https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame) – Henrik Dec 20 '17 at 20:51
  • @joran Apologies, the only possible values in dat are {1, 0, -1}, but they will not be bunched in threes (i.e., the distribution of {1, 0, -1} in dat is random). And yes, I mean that tally should be read positionally. – Chris95 Dec 20 '17 at 21:00

2 Answers

Here is a fairly simple approach:

> dat <- c(1,0,-1,1,0,-1,1,0,1)
> tally <- ave(dat, factor(dat), FUN=seq_along)
> tally
[1] 1 1 1 2 2 2 3 3 4

The ave function splits dat into groups by its unique values (-1, 0, and 1 in this case). Within each group, seq_along numbers the elements 1, 2, 3, ..., which is a quick way to get the running tally for that value. ave then puts the per-group counts back together in the order of the original data.
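To make those three steps visible, the same result can be built by hand with split, lapply and unsplit. This is only a sketch of the idea, not a claim about how ave is implemented internally; the names `groups`, `counts` and `tally_manual` are made up for illustration.

```r
dat <- c(1, 0, -1, 1, 0, -1, 1, 0, 1)

# Step 1: split dat into one piece per distinct value (-1, 0, 1)
groups <- split(dat, factor(dat))

# Step 2: number the elements within each piece: 1, 2, 3, ...
counts <- lapply(groups, seq_along)

# Step 3: put the per-group counts back in the original order of dat
tally_manual <- unsplit(counts, factor(dat))
tally_manual
# [1] 1 1 1 2 2 2 3 3 4
```

In practice the one-line ave call is simpler; the expanded form is mainly useful for seeing where each number in the tally comes from.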

Greg Snow
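dat <- c(1,0,-1,1,0,-1,1,0,1)

count_this <- function(vec) {
    # Preallocate the result inside the function rather than relying on a global
    new_vec <- integer(length(vec))
    # seq_along() is safe for empty input, unlike 1:length(vec)
    for (i in seq_along(vec)) {
        this_elem <- vec[i]
        # All elements up to and including position i
        before_vec <- vec[1:i]
        # Count how many of them equal the current element
        new_vec[i] <- sum(before_vec == this_elem)
    }
    return(new_vec)
}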

Use like this:

count_this(dat)

1 1 1 2 2 2 3 3 4

But definitely use Greg's much more efficient approach:

dat_long <- round(rnorm(10000), 0)

start.time <- Sys.time()
res_a <- count_this(dat_long)
end.time <- Sys.time()
# Fix the unit to seconds so both timings are directly comparable
time.taken <- difftime(end.time, start.time, units = "secs")
p_1 <- as.numeric(time.taken)

start.time <- Sys.time()
res_b <- ave(dat_long, factor(dat_long), FUN=seq_along)
end.time <- Sys.time()
time.taken <- difftime(end.time, start.time, units = "secs")
p_2 <- as.numeric(time.taken)

final <- data.frame(For_Loop = p_1, Vectorized = p_2)
mp <- barplot(as.matrix(final), col='steelblue', beside=TRUE, main='Runtimes for Tally Algorithm')
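Before comparing runtimes it is worth confirming that the two approaches actually return the same tally. The check below is a self-contained sketch: the seed, the vector length and the names `x`, `loop_tally` and `ave_tally` are arbitrary choices for illustration, and the loop is written with vapply rather than the for loop above.

```r
set.seed(1)  # arbitrary seed, only so the check is reproducible
x <- round(rnorm(1000), 0)

# Loop-style tally: for each position, count matches up to and including it
loop_tally <- vapply(seq_along(x), function(i) sum(x[1:i] == x[i]), integer(1))

# Grouped tally via ave()
ave_tally <- ave(x, factor(x), FUN = seq_along)

# all.equal() compares values and ignores the integer/double difference
all.equal(loop_tally, ave_tally)
```

If the two disagree, all.equal returns a description of the mismatch instead of TRUE.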


Cybernetic