1

Be patient, I am not very skilled with cdf.

I seem to have a seemingly simple problem for which I either can't seem to find material about or simply lack the vocabulary for.

Given are N sequential events with their own unique independent cdf that describe the probability of an event taking X minutes or less. If I want to calculate the probability that all N events can be completed sequentially in less than Y minutes, what would I need to use?

I feel like a joint cdf will quickly get out of hand given an increasing N. Is there maybe a way to combine the input parameters of my distributions in such a way that I get a new cdf that describes the sum of the distributions? If it helps, I'm using Gamma/Erlang distributions only at the moment.

Thanks a lot for your input, and sorry if this is a duplicate.

  • Apply the definition of independence. (This question does not concern the sum at all, which is odd. Are you perhaps thinking of there being an order to these events with each event not possibly occurring until all its predecessors have occurred?) – whuber Apr 11 '22 at 19:30
  • Do the events happen simultaneously, or sequentially? If they're simultaneous then you can use independence as whuber suggests. If they're sequential and all events have the same scale/rate parameter then you can use the properties of the Gamma/Erlang distribution to easily find the distribution of the sum. If they have different rate parameters then I'm not sure. – David Thiessen Apr 11 '22 at 19:56
  • @whuber yes, I'm talking about tasks being performed sequentially. Let's say vaccuuming the house, then emptying the dishwasher, than folding the laundry. What's the probability of doing all that sequentially taking less than Y minutes if I have the cdf and/or pdf of these independent processes. (I have fixed the question to make it a bit clearer, sorry for my wrong use of terms.) – Pepijn Ekelmans Apr 12 '22 at 08:05
  • @DavidLukeThiessen unfortunately they all have different parameters, based on my own measured data. – Pepijn Ekelmans Apr 12 '22 at 08:06

1 Answers1

1

The question "what is the probability that all events will take less than Y minutes?" is answered by the cdf at Y of the sum of the random variables. For independent Gamma distributions with different parameters, this is addressed in a couple of papers, most prominently "The distribution of the sum of independent gamma random variables" (P. G. Moschopoulos, 1985) . (Erlang distributions are a special case of the Gamma distribution, so the results should be applicable in this case.) The paper presents steps for computing an infinite series whose sum converges to the density function of the sum of RVs; to obtain the cdf you'll probably need to integrate numerically. The paper also provides error bounds to choose the number of series elements to compute in practice.

Alternatively, I would suggest, at least at first, to use a Monte Carlo method, and simply simulate many trials and use the empiric distribution of those trials. This can be done e.g. using numpy's RNG, and might give accurate enough results.

Edited to add sample code for a simulation:

import numpy as np

rng = np.random.default_rng()

Say we have N=100 distributions, here I'm randomizing them

N = 100 k = rng.random(N) * 100 theta = rng.random(N) * 10

Sample M trials, in this case 100000

M = 100000 X = rng.gamma(k, theta, size=(M, N))

Calculate total duration of sequential events for each trial

y = X.sum(axis=1)

MAX_MINUTES = 23000

Approximate CDF of sum at MAX_MINUTES

print((y <= MAX_MINUTES).mean())

This runs in approximately 2 seconds on my machine. Try different values of M to choose a tradeoff between runtime and accuracy.

  • 1
    +1. Our thread about sums of Gammas provides additional commentary on part of this answer. A Monte Carlo simulation is usually best preceded by a little analysis. After all, the expectation and variance of the sum are easily found, giving a handle on the likely range of the sum and suggesting how large the simulation must be to achieve any desired level of precision. – whuber Apr 12 '22 at 11:22