Suppose there are $n$ exponentially distributed variables $X_i \sim Exp(\lambda_i)$, $i=1,\dots,n$, and I want to calculate the cumulative distribution of $S=X_1+X_2+\dots+X_n$, the convolution of $n$ exponential distributions. I used a formula that I read in a paper, $$1-\sum_{i=1}^n \left(\prod \limits_{j=1,\,j\neq i}^n \frac{\lambda_j}{\lambda_j-\lambda_i}\right)e^{-\lambda_i t},$$ but when I set $\lambda_i = 0.01i+2$ and $n=20$, the result at $t=0$ is not equal to 0.
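For example, a direct translation of the formula into R (a rough sketch with the rates above; the function name f is arbitrary) shows the problem:

k = 20
lambda = 0.01 * (1:k) + 2
f = function(t) {
  # coefficient for each i: product over j != i of lambda_j / (lambda_j - lambda_i)
  coef = sapply(1:k, function(i) prod(lambda[-i] / (lambda[-i] - lambda[i])))
  1 - sum(coef * exp(-lambda * t))
}
f(0)   # should be 0, but in double precision it comes out far from 0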
1 Answer
Method 1: Order statistics
You can alternatively model this via order statistics. There is a link between sums of exponentials and order statistics of exponentials.
Since you are looking at the special case $\lambda_i = 0.01i+2$ with $n=20$, this link applies, because the rates are twenty consecutive multiples of $0.01$ (namely $201\cdot 0.01, 202\cdot 0.01, \dots, 220\cdot 0.01$). The sum has the same distribution as drawing $m = 220$ samples from an exponential distribution with rate $\lambda = 0.01$ and taking the 20th smallest.
Then we can model the CDF via a beta distribution transformed by the exponential quantile function: the 20th smallest of $m=220$ uniform variables is $Beta(20, 220+1-20)$ distributed, and applying the exponential quantile function turns it into the 20th smallest of $220$ exponential variables.
So if $X \sim Beta(20,220+1-20)$ then $Y = -100 \cdot \ln(1-X)$ is distributed as the sum of $20$ exponentially distributed variables with $\lambda_i = 0.01i+2$.
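This equivalence can be checked by simulation; the following sketch (sample size chosen arbitrarily) compares the two sampling schemes:

k = 20
m = 220
lambda = 0.01 * (1:k) + 2                        # rates 2.01, 2.02, ..., 2.20
s1 = replicate(10^4, sum(rexp(k, lambda)))       # sum of the 20 exponentials
s2 = replicate(10^4, sort(rexp(m, 0.01))[k])     # 20th smallest of 220 Exp(0.01) samples
qqplot(s1, s2); abline(0, 1)                     # points should lie close to the diagonal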
Method 2: Hypoexponential distribution
More generally, the sum of exponentials is hypoexponential distributed. The CDF of the hypoexponential distribution can be expressed in terms of a matrix exponential and this can be used to compute it. The matrix exponential is defined as the Taylor series but with $X$ a matrix: $e^{X} = \sum_{k=0}^\infty \frac{1}{k!}X^k$. Several fast algorithms exist to compute it (besides using the first terms in the Taylor series).
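Concretely (this is the standard phase-type representation that the function g in the code below uses), the rates go on the diagonal and superdiagonal of a sub-generator matrix $\Theta$, and
$$\Theta = \begin{pmatrix} -\lambda_1 & \lambda_1 & & \\ & -\lambda_2 & \lambda_2 & \\ & & \ddots & \ddots \\ & & & -\lambda_n \end{pmatrix}, \qquad P(S \leq t) = 1 - \begin{pmatrix}1 & 0 & \cdots & 0\end{pmatrix} e^{t\Theta}\,\mathbf{1},$$
where $\mathbf{1}$ is a column vector of ones, i.e. the CDF is one minus the sum of the first row of $e^{t\Theta}$.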
Code demonstration
The code below demonstrates these methods:
set.seed(1)
library(Rmpfr)
library(matrixcalc) ### allows us to use matrix.power
library(expm)       ### provides expm(), used below for the matrix exponential
### theoretic computation based on beta distribution
k = 20
xs = seq(0,k*2,0.1) # values x to compute CDF for
pe = pexp(xs,0.01) # cdf of exponential
pb = pbeta(pe, k, (200+k)+1-k) # cdf of beta(k, m+1-k) with m = 200+k = 220, evaluated at the exponential cdf values
### theoretic computation 2: sum of exponential functions,
### using high-precision (Rmpfr) arithmetic to improve the computation
h = function(t, precision = 53) {
  # variables with the requested precision (in bits)
  l = mpfr(seq(2.01, 2+0.01*k, 0.01), precision)
  p = mpfr(rep(1, k), precision)
  # computing the product and sum
  for (j in 1:k) {
    for (i in 1:k) {
      if (j != i) {
        p[j] = p[j] * l[i]/(l[i]-l[j])
      }
    }
  }
  # return the result
  as.double(1 - sum(p * exp(-l*t)))
}
h = Vectorize(h)
### theoretic computation using hypoexponential distribution
g = function(t) {
  # generate matrix for the hypoexponential distribution
  l = seq(2.01, 2+0.01*k, 0.01)
  M = diag(-l)
  diag(M[-k,-1]) = l[-k]
  # compute CDF using the matrix exponential
  1 - sum(expm(t*M)[1,])
}
g = Vectorize(g)
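### note: at t = 0 the matrix exponential is the identity matrix, so g(0)
### evaluates to essentially 0; this method avoids the catastrophic
### cancellation that affects the series formula in h()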
### simulation of summing 20 exponentially distributed variables
sim = function() {
  l = seq(2.01, 2+0.01*k, 0.01)
  sum(rexp(k, l))
}
n = 10^3
y = replicate(n, sim())
y = y[order(y)]
ps = c(1:n)/n
### plot results
layout(matrix(1:4,1))
plot(y,ps, type = "l", ylim = c(0,1),
xlab = "y", ylab = "p(X<=x)", col = 1, lty = 1,
main = "estimate using \n beta distribution")
lines(xs, pb, col = 2, lty = 2)
plot(y,ps, type = "l", ylim = c(0,1),
xlab = "y", ylab = "p(X<=x)", col = 1, lty = 1,
main = "estimate using \n hypo exponential distribution")
lines(xs, g(xs), col = 2, lty = 2)
plot(y,ps, type = "l", ylim = c(0,1),
xlab = "y", ylab = "p(X<=x)", col = 1, lty = 1,
main = "estimate using \n sum of exponential functions")
lines(xs, h(xs), col = 2, lty = 2)
plot(y,ps, type = "l", ylim = c(0,1),
xlab = "y", ylab = "p(X<=x)", col = 1, lty = 1,
main = "estimate using \n sum of exponential functions \n high precision")
lines(xs, h(xs, precision = 106), col = 2, lty = 2)
Method 3: Higher precision
The code above includes your formula as well. When $n = 20$ and we compute $P(S\leq 0)$, we get a sum of numbers on the order of $10^{30}$:
> p*exp(-l*t)
20 'mpfr' numbers of precision 200 bits
[1] 1184377704791549923796136956.607426574485340044851136466504687
[2] -22391774527715875503095258875.03263232345792580141668597954522
[3] 200533231977287736716622162734.3838677372979407569824559548119
[4] -1130784613649661547175898714626.269008683529273781772412307854
[5] 4501074364576350873051048293822.423602640254497752902071819979
[6] -13437673467060135092418439694475.71911050588558126590977544721
[7] 31203100063608817375763652531058.33729082993747843780866417281
[8] -57670015296133360609431486185964.71343501537022605441586568593
[9] 86091123312888841776163054615053.45436911041487039497542827845
[10] -104721424601233110811921244247537.9049963527831300886304307068
[11] 104225114532032942398284597644451.965294335372683402639022954
[12] -84872852699970611190163244498094.3346219654358556203530876458
[13] 56316259068524588993874802018151.08761670825436672186290382369
[14] -30182437911995447023357731135651.16278839986713601836568813731
[15] 12875166205648313573257565589226.63372958778154884796895991358
[16] -4271852984898851750101954219568.873097418541709901194303528335
[17] 1063041756610741552892528679905.931481962365234315486381160982
[18] -186735073813712898818348384333.2996752734412911578062357660226
[19] 20653600249308703228634568900.71910148094711050404923349682836
[20] -1082090539377734264859901133.234415028798941489661772836725247
> sum(p*exp(-l*t))
1 'mpfr' number of precision 200 bits
[1] 1.000000000000000000000000000065346803013058912494189331591907
>
So that answers your question of why you do not get results in the range between 0 and 1: the computation requires adding several very large positive and negative numbers that need to cancel each other almost exactly (whuber already mentioned this in the comments).
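As a generic illustration of the effect (not specific to this problem): in double precision, numbers near $10^{30}$ are only represented to an absolute accuracy of roughly $10^{14}$, so any contribution of order 1 or smaller is lost entirely:

.Machine$double.eps * 1e30   # on the order of the spacing between representable doubles near 1e30
(1e30 + 100) - 1e30          # returns 0: the 100 is absorbed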
By using quadruple-precision floating-point numbers we can compute it more accurately (but it is not a very fast computation).
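For example, one can compare the two precisions at $t = 0$ with the function h from the code above (the exact numbers depend on the platform, so no output is shown here):

h(0)                    # default 53-bit precision: swamped by cancellation, far from 0
h(0, precision = 106)   # roughly a quadruple-precision mantissa: close to the true value 0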

R implementation of the formula: f <- Vectorize(function(t, lambda) { 1 - sum(apply(outer(lambda, lambda, function(x,y) ifelse(x==y, 1, y * exp(-x * t)/(y-x))), 1, prod)) }, "t"). As an example of its use (in a case where it works!) see curve(f(x, seq_len(6) * 1e-2 + 2), 0, 3, lwd=2). Changing 6 to 20 gives the case in the question and clearly reveals the problem. There is a clash of conventions between this formula and your convention for $\lambda$, btw. – whuber Mar 04 '22 at 16:11