"Cumulative distribution" of a collection of random variables with different distributions

Question

I am working with an application where I have a grid of cells, and I calculate "concentrations" in each cell by randomly placing a number of "particles" across the grid, and counting the number of particles in each cell. I am then interested in finding the number of cells with concentration less than some threshold concentration.

In the simplest case, I have $N_p$ particles and $N_c$ cells, and the particles are placed randomly with equal probability of being placed in any cell. Then the number of particles in the different cells can be drawn from a multinomial distribution, with $N_p$ trials and $N_c$ different outcomes, all with probability $1/N_c$. Using python I can for example create a realisation like this:

import numpy as np
C = np.random.multinomial(Np, pvals=np.ones(Nc)/Nc)

where C is then an array with Nc elements that sum to Np. In my application, each element of C represents the concentration in a cell.

I am then interested in the number of cells with concentrations less than some threshold concentration $C_{lim}$, as a function of $C_{lim}$. It is of course easy to find the result numerically, and I have also found that the fraction of such cells is approximated quite well by the cumulative distribution of a Gaussian with mean $N_p p$ and variance $N_p p (1-p)$, where $p = 1/N_c$ is the probability of a particle landing in any given cell. This makes sense to me, as the concentration in each cell is a random variable with exactly this mean and variance, and I guess the Central Limit Theorem might be relevant somehow. An example is shown in the figure below.
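As a rough sketch of the comparison described above, the following draws one realisation for the uniform case and compares the fraction of cells below a threshold with the Gaussian CDF. The grid size, particle count, seed, threshold value and the helper name `gauss_cdf` are arbitrary choices made for illustration, not values from my application.

```python
import math
import numpy as np

def gauss_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative sizes (assumptions, chosen so that the mean count per cell is 100)
Np, Nc = 1_000_000, 10_000
p = 1.0 / Nc

rng = np.random.default_rng(0)            # fixed seed only for reproducibility
C = rng.multinomial(Np, np.full(Nc, p))   # particle count in each cell

C_lim = 105                               # example threshold
frac_numeric = np.mean(C < C_lim)         # fraction of cells with C < C_lim
frac_gauss = gauss_cdf(C_lim, Np * p, math.sqrt(Np * p * (1.0 - p)))

print(frac_numeric, frac_gauss)
```

Multiplying either fraction by `Nc` gives the corresponding number of cells.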

Now we finally get to my question: Can I find a similar analytical approximation for the number of cells with concentration $C < C_{lim}$ in the case where particles are not distributed with the same probability for each cell? As an example, I have created the figure below, where the particle positions are still drawn from a multinomial distribution, but with higher probability of being placed in the cells in the center. In the example below I have used a Gaussian PDF to calculate the probabilities of the different cells based on their distance from the center (and then the probabilities were normalised to sum to 1), but I am also interested in the more general case where I don't have a nice analytical expression for the probabilities (but they can for example be evaluated numerically from a simulation).
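For concreteness, a non-uniform case of this kind could be set up roughly as follows. This is only a sketch: it assumes a square two-dimensional grid on the unit square, and the grid size `n`, the Gaussian `width` and the seed are made-up illustration values.

```python
import numpy as np

n = 100            # assumed: square n x n grid, so Nc = n * n cells
width = 0.2        # assumed: Gaussian width, in units of the grid side length
Np = 1_000_000     # illustrative number of particles

# Coordinates of the cell centres on the unit square
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x)
r2 = (X - 0.5) ** 2 + (Y - 0.5) ** 2      # squared distance from the grid centre

# Unnormalised Gaussian weights, then normalised so the probabilities sum to 1
w = np.exp(-r2 / (2.0 * width ** 2))
pvals = (w / w.sum()).ravel()

rng = np.random.default_rng(0)            # fixed seed only for reproducibility
C = rng.multinomial(Np, pvals)            # particle count in each of the Nc cells
```

Any other non-negative weights, for example ones evaluated numerically from a simulation, could be normalised and passed as `pvals` in the same way.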

Any answers, hints, or suggestions of relevant literature are most welcome!

Edit: I found an answer that works, which I posted below, but I would still be happy if anyone has suggestions for relevant literature (books or papers), or other more rigorously presented solutions.

Tor · Accepted Answer · 2022-02-27T11:18:36.160

After a bit of numerical experimentation, I think I found an answer. The concentration in cell $i$ is binomially distributed with $N_p$ trials and probability $p_i$, so its CDF is approximately the Gaussian CDF with mean $\mu = p_i N_p$ and variance $\sigma^2 = N_p p_i (1-p_i)$. If I add these CDFs together for all cells, and normalise by dividing by the number of cells, I get a pretty good match to the numerical result.

So the answer, with a bit of sloppy notation, is something like this:

$$ CDF_{tot} (C) = \frac{1}{N_c} \sum_{i=1}^{N_c} CDF_{Gauss} \big(C, \, \mu=N_p p_i, \, \sigma^2 = N_p p_i (1-p_i) \big). $$
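A minimal sketch of this formula in Python, compared against a single multinomial realisation, could look like the following. The function name `cdf_tot`, the one-dimensional Gaussian-shaped probabilities, the sizes and the seed are assumptions made for illustration; the function also assumes that every $p_i$ lies strictly between 0 and 1, since otherwise the variance is zero.

```python
import math
import numpy as np

def cdf_tot(C_lim, pvals, Np):
    """Average of the per-cell Gaussian CDFs, evaluated at the threshold C_lim."""
    pvals = np.asarray(pvals, dtype=float)
    mu = Np * pvals                                  # per-cell mean
    sigma = np.sqrt(Np * pvals * (1.0 - pvals))      # per-cell standard deviation
    z = (C_lim - mu) / (sigma * math.sqrt(2.0))
    erf = np.vectorize(math.erf)                     # element-wise error function
    return float(np.mean(0.5 * (1.0 + erf(z))))

# Illustrative non-uniform probabilities: Gaussian-shaped weights on a 1-D grid
Nc, Np = 2000, 200_000
x = np.linspace(-2.0, 2.0, Nc)
w = np.exp(-0.5 * x ** 2)
pvals = w / w.sum()

rng = np.random.default_rng(0)                       # fixed seed only for reproducibility
C = rng.multinomial(Np, pvals)

C_lim = 100                                          # example threshold
frac_numeric = float(np.mean(C < C_lim))             # fraction of cells with C < C_lim
frac_approx = cdf_tot(C_lim, pvals, Np)

print(frac_numeric, frac_approx)
```

Evaluating `cdf_tot` over a range of thresholds gives the full curve, and multiplying by `Nc` converts the fraction into a number of cells.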

A couple of examples for different distributions are shown in the figure below.
