Suppose I have a long, possibly infinite, sequence $x := [x_1, x_2, \ldots]$, and I want to use it to compute another sequence $y := [y_1, y_2, \ldots]$ where each element is the sum of the last K elements of the input sequence, i.e.
$y_i = \sum_{j=\max(1,\, i-K+1)}^i x_j$
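For concreteness, the definition can be written out directly (offline, over a finite prefix of $x$); the function name here is just illustrative:

```python
def sum_of_last_k_direct(x, k):
    # Direct definition: y_i is the sum of the up-to-k most recent
    # elements x_{i-k+1} .. x_i (1-based), i.e. a 0-based slice.
    return [sum(x[max(0, i - k + 1): i + 1]) for i in range(len(x))]

# For x = [1, 2, 3, 4, 5] and k = 3:
# sum_of_last_k_direct([1, 2, 3, 4, 5], 3) → [1, 3, 6, 9, 12]
```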
The naive, inefficient way to do this (in Python) would be:
from typing import Iterable, Iterator

def sum_of_last_k(x: Iterable[float], k: int) -> Iterator[float]:
    buffer = [0.] * k  # Initialize a buffer of k zeros
    for i, xi in enumerate(x):
        buffer[i % k] = xi  # Where % is modulo
        yield sum(buffer)
...which would have $\mathcal O(K)$ cost per iteration. However, I want to do it online and efficiently. We could get this down to $\mathcal O(1)$ by keeping a running sum:
def sum_of_last_k(x: Iterable[float], k: int) -> Iterator[float]:
    buffer = [0.] * k  # Buffer of k zeros: the window holds the k most recent values
    running_sum = 0.
    for i, xi in enumerate(x):
        ix_wrapped = i % k
        old_value = buffer[ix_wrapped]  # The value falling out of the window
        buffer[ix_wrapped] = xi
        running_sum = running_sum + xi - old_value
        yield running_sum
This has $\mathcal O(1)$ cost per iteration, but there is still a problem: due to floating-point precision, each update of running_sum can introduce a small rounding error, and these errors accumulate over time.
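To make the drift concrete, here is a small experiment (illustrative only; the mix of very large and very small magnitudes is chosen to exaggerate the effect) comparing the incrementally maintained sum against a freshly recomputed window sum:

```python
import random

def running_sum_drift(xs, k):
    # Maintain the O(1) running sum as above, and at each step compare it
    # against sum(buffer) recomputed from scratch; return the worst
    # absolute discrepancy seen.
    buffer = [0.0] * k
    running_sum = 0.0
    worst = 0.0
    for i, xi in enumerate(xs):
        old_value = buffer[i % k]
        buffer[i % k] = xi
        running_sum += xi - old_value
        worst = max(worst, abs(running_sum - sum(buffer)))
    return worst

random.seed(0)
# Small values added into a ~1e12-scale running sum fall below its ulp,
# so the incremental updates lose low-order bits that a fresh sum keeps.
xs = [random.choice([1e12, 1e-4]) * random.random() for _ in range(10_000)]
print(running_sum_drift(xs, k=16))  # typically small but nonzero
```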
Now, I know I could resolve this by recomputing running_sum from scratch every M iterations, where M is some large number, making the amortized cost per iteration $\mathcal O((M-1+K)/M)$, which may be very close to constant... But I want the worst-case cost per iteration to be $\mathcal O(1)$, not $\mathcal O(K)$.
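For reference, a sketch of what that periodic-recomputation workaround could look like (m here is the hypothetical resynchronisation period; every m-th iteration still costs $\mathcal O(K)$, which is exactly the objection above):

```python
from typing import Iterable, Iterator

def sum_of_last_k_resync(x: Iterable[float], k: int, m: int = 10_000) -> Iterator[float]:
    # Same O(1) running-sum update as before, but every m iterations the
    # sum is recomputed from the buffer in O(k) to discard accumulated
    # rounding error. Worst-case cost per iteration is therefore O(k).
    buffer = [0.0] * k
    running_sum = 0.0
    for i, xi in enumerate(x):
        ix_wrapped = i % k
        old_value = buffer[ix_wrapped]
        buffer[ix_wrapped] = xi
        running_sum += xi - old_value
        if (i + 1) % m == 0:
            running_sum = sum(buffer)  # periodic O(k) resynchronisation
        yield running_sum
```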
So, is there some way to compute this in $\mathcal O(1)$ time per iteration while still being numerically stable?