
I'm trying to implement the following function in double-precision floating point with low relative error:

$$\mathrm{logsum}(x,y) = \log(\exp(x) + \exp(y))$$

This is used extensively in statistical applications to add probabilities or probability densities that are represented in log space. Of course, either $\exp(x)$ or $\exp(y)$ could easily overflow or underflow, which would be bad because log space is used to avoid underflow in the first place. This is the typical solution:

$$\mathrm{logsum}(x,y) = x + \mathrm{log1p}(\exp(y - x))$$
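For concreteness, here is a minimal Python sketch of that formulation (the function name is just for illustration; swapping the arguments keeps the exponent nonpositive so $\exp$ cannot overflow):

```python
import math

def logsum(x, y):
    """log(exp(x) + exp(y)) via x + log1p(exp(y - x)); a sketch only."""
    if y > x:
        x, y = y, x                    # ensure x >= y, so y - x <= 0
    if x == -math.inf:                 # then y is -inf too; avoid nan from (-inf) - (-inf)
        return -math.inf
    return x + math.log1p(math.exp(y - x))
```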

Cancellation from $y-x$ does happen, but is mitigated by $\exp$. Worse by far is when $x$ and $\mathrm{log1p}(\exp(y - x))$ are close in magnitude with opposite signs, so the outer addition cancels. Here's a relative error plot:

[Figure: relative error plot of $x + \mathrm{log1p}(\exp(y - x))$]

The plot is cut off at $10^{-14}$ to emphasize the shape of the curve $\mathrm{logsum}(x,y) = 0$, about which the cancellation occurs. I've seen error up to $10^{-11}$ and suspect that it gets much worse. (FWIW, the "ground truth" function is implemented using MPFR's arbitrary-precision floats with 128-bit precision.)

I've tried other reformulations, all with the same result. With $\log$ as the outer expression, the same error occurs by taking a log of something near 1. With $\mathrm{log1p}$ as the outer expression, cancellation happens in the inner expression.

Now, the absolute error is very small, so $\exp(\mathrm{logsum}(x,y))$ has very small relative error (within an epsilon). One might argue that, because a user of $\mathrm{logsum}$ is really interested in probabilities (not log probabilities), this terrible relative error isn't a problem. It's likely that it usually isn't, but I'm writing a library function, and I'd like its clients to be able to count on relative error not much worse than rounding error.

It seems I need a new approach. What might it be?

Neil Toronto
  • I don't understand your last paragraph. "within an epsilon" doesn't mean anything to me. Do you mean a Unit in the Last Place? As for users being interested in probabilities, a small log probability error will result in a large probability error, so this is not the case. – Aron Ahmadia Oct 22 '12 at 18:37
  • Out of curiosity, have you tried taking the "best of" of your two methods and plotting the error of that? Then all you need is the right logic to detect which case you're in (hopefully being less costly or part of the required cost of the algorithm anyway), then switching to the appropriate method. – Aron Ahmadia Oct 22 '12 at 18:43
  • @AronAhmadia: "Within an epsilon" means relative error less than a double-precision floating-point epsilon, which is about 2.22e-16. For normal (i.e. not subnormal) floats, it corresponds to about an ulp. Also, if $a$ is the absolute error of $x$, then the relative error of $\exp(x)$ is $\exp(a)-1$, which is nearly the identity function near zero. IOW, small absolute error for $x$ implies small relative error for $\exp(x)$. – Neil Toronto Oct 22 '12 at 19:47
  • Addendum: When absolute error $a$ is near zero. When $a > 1$, for example, you're right: the relative error explodes. – Neil Toronto Oct 22 '12 at 20:02
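To spell out the step these last two comments rely on: if $x$ is computed with absolute error $a$, the relative error of $\exp(x)$ is

$$\frac{e^{x+a}-e^{x}}{e^{x}} = e^{a}-1 \approx a \quad \text{for small } a,$$

so a small absolute error in log space yields a small relative error in linear space, but only while $a$ stays well below 1.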

1 Answer


The formula $$\mathrm{logsum}(x,y)=\max(x,y)+\mathrm{log1p}(\exp(-\operatorname{abs}(x-y)))$$ should be numerically stable. It generalizes to a numerically stable computation of $$\log \sum_i e^{x_i} = \xi + \log\sum_i e^{x_i-\xi},\qquad \xi=\max_i x_i$$
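In Python, a sketch of both formulas might look like this (function names are illustrative):

```python
import math

def logsum(x, y):
    # max(x, y) + log1p(exp(-|x - y|)); the exponent is always <= 0
    if x == y:                         # also covers x = y = -inf, where x - y is nan
        return x + math.log(2.0)
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def logsumexp(xs):
    # shift by the maximum so no exp can overflow
    xi = max(xs)
    if xi == -math.inf:                # every term is zero in linear space
        return -math.inf
    return xi + math.log(sum(math.exp(x - xi) for x in xs))
```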

In case the logsum is very close to zero and you want high relative accuracy, you can probably use $$\mathrm{logsum}(x,y)=\max(x,y)+\mathrm{lexp}(x-y)$$ using an accurate (i.e., more than double precision) implementation of $$\mathrm{lexp}(z):=\log(1+e^{-|z|})$$ which is nearly linear for small $z$.
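As a sketch of this idea (not a production implementation), one can stand in for an extended-precision $\mathrm{lexp}$ with Python's standard-library `decimal` module, carrying the final addition in extended precision as well so that only one rounding to double occurs; as the comment thread below shows, rounding $\mathrm{lexp}$ to double *before* the addition would reintroduce the error:

```python
from decimal import Decimal, getcontext

getcontext().prec = 40                 # well beyond double precision

def lexp(z):
    # log(1 + exp(-|z|)) in extended precision; at 40 digits a plain
    # log is accurate enough that log1p is unnecessary
    return (Decimal(1) + Decimal(-abs(z)).exp()).ln()

def logsum(x, y):
    # a single rounding to double at the very end (infinities not handled here)
    return float(Decimal(max(x, y)) + lexp(x - y))
```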

Arnold Neumaier
  • In terms of absolute error, it is. In terms of relative error, it's awful when the output is near zero. – Neil Toronto Oct 22 '12 at 17:20
  • @NeilToronto: Please give an example with two explicit inputs $x$ and $y$, so that I can play with it. – Arnold Neumaier Oct 22 '12 at 18:46
  • For x = -0.775 and y = -0.6175, I get 62271 ulps error and 1.007e-11 relative error. – Neil Toronto Oct 22 '12 at 19:50
  • @NeilToronto: Thanks. I edited my answer to give a new suggestion. In which language will your library implementation be? – Arnold Neumaier Oct 22 '12 at 20:16
  • Thanks for looking into it, but that still doesn't work. In fact, if I do everything but the outer addition using 1024-bit floats, the error doesn't change. The problem here is rounding two opposite-sign values to 53 bits before adding them. – Neil Toronto Oct 22 '12 at 20:41
  • The library is implemented in Racket (www.racket-lang.org). – Neil Toronto Oct 22 '12 at 20:43
  • @NeilToronto: In this case you simply need to evaluate lexp to higher precision (e.g., return two floats $l$, $m$ with $l\gg m$ that add to the result, then first add $l$, then add $m$). – Arnold Neumaier Oct 22 '12 at 20:50
  • @ArnoldNeumaier Can you explain further your point in your last comment? I am working with log-sum-exps and I think I am facing round off errors. – Hari Sep 03 '15 at 19:27
  • @haripkannan: If $[l,m]=\mathrm{lexp}(z)$ returns two floating-point numbers $l,m$ such that $\mathrm{lexp}(z)=l+m+{}$trailing digits, where $l$ contains the leading 53 bits and $m$ the next 53 bits (together with the appropriate powers of two), then the calculation of $x+\mathrm{lexp}(z)$ can be done by first calculating $x'=x+l$ (without rounding error, because of cancellation) and then $x''=x'+m$ with full accuracy, since $x'$ is tiny. (A sketch of this scheme appears after these comments.) – Arnold Neumaier Sep 04 '15 at 11:39
  • Thanks for the clarification, @ArnoldNeumaier. How does one generally go about implementing this logic? Any initial pointers would do. – Hari Sep 04 '15 at 16:56
  • Calculate highly accurate data points in the range of interest; at least two different ranges are needed because of the asymptotic behavior. One can use the defining expression for $z$ not close to zero. For the exceptional range, fit a rational function of sufficiently high degree to it to get the desired accuracy. For numerical stability, use Bernstein or Chebyshev polynomials in the numerator and denominator, adapted to the interval of interest. At the end, expand into a continued fraction and find out how much one can truncate the coefficients without impairing the accuracy. – Arnold Neumaier Sep 04 '15 at 18:57
  • This gives $l=l(z)$. To get $m$, do the same, but applied to the function $\mathrm{lexp}(z)-l(z)$. – Arnold Neumaier Sep 04 '15 at 18:58
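To make the two-float scheme from these comments concrete, here is a hedged Python sketch, again using `decimal` as a placeholder for whatever extended-precision evaluation of $\mathrm{lexp}$ a real library would use (e.g., the rational approximation described above); the final two additions stay in plain double precision:

```python
from decimal import Decimal, getcontext

getcontext().prec = 40                 # placeholder for an accurate lexp

def lexp2(z):
    # Return (l, m): the leading and trailing double-precision parts of
    # log(1 + exp(-|z|)), so that l + m carries about 106 bits of the value.
    v = (Decimal(1) + Decimal(-abs(z)).exp()).ln()
    l = float(v)                       # leading 53 bits, correctly rounded
    m = float(v - Decimal(l))          # next 53 bits
    return l, m

def logsum(x, y):
    l, m = lexp2(x - y)
    s = max(x, y) + l                  # near logsum = 0 this cancellation is exact
    return s + m                       # m restores the low-order bits lost above
```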