
I am working with an application where I have a grid of cells, and I calculate "concentrations" in each cell by randomly placing a number of "particles" across the grid, and counting the number of particles in each cell. I am then interested in finding the number of cells with concentration less than some threshold concentration.

In the simplest case, I have $N_p$ particles and $N_c$ cells, and the particles are placed randomly with equal probability of being placed in any cell. Then the number of particles in the different cells can be drawn from a multinomial distribution, with $N_p$ trials and $N_c$ different outcomes, all with probability $1/N_c$. Using python I can for example create a realisation like this:

import numpy as np
C = np.random.multinomial(Np, pvals=np.ones(Nc)/Nc)

where C is then an array with Nc elements that sum to Np. In my application, each element of C represents the concentration in a cell.

I am then interested in the number of cells with concentration below some threshold $C_{lim}$, as a function of $C_{lim}$. It is of course easy to find the result numerically, and I have also found that the fraction of such cells is approximated quite well by the cumulative distribution of a Gaussian with mean $N_p p$ and variance $N_p p (1-p)$, where $p = 1/N_c$. This makes sense to me, as the count in each cell is binomially distributed with mean $N_p p = N_p/N_c$ and variance $N_p p (1-p)$, and by the Central Limit Theorem a binomial is close to a Gaussian when $N_p p$ is large. An example is shown in the figure below.
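To make the comparison concrete, here is a minimal sketch of the uniform case. The helper names `gaussian_cdf` and `compare_uniform`, the values of `Np` and `Nc`, and the choice of thresholds are illustrative assumptions, not the code behind the figure. The Gaussian CDF is computed with `math.erf` to avoid extra dependencies.

```python
import math
import numpy as np

def gaussian_cdf(x, mean, var):
    """Gaussian CDF at x, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def compare_uniform(Np, Nc, n_thresholds=13, seed=0):
    """Draw one realisation and compare the fraction of cells with C < C_lim
    against the Gaussian approximation (illustrative helper)."""
    rng = np.random.default_rng(seed)
    p = 1.0 / Nc
    C = rng.multinomial(Np, np.full(Nc, p))
    mean, var = Np * p, Np * p * (1.0 - p)
    sd = math.sqrt(var)
    # Thresholds spanning +/- 3 standard deviations around the mean
    thresholds = np.linspace(mean - 3 * sd, mean + 3 * sd, n_thresholds)
    empirical = np.array([np.mean(C < c) for c in thresholds])
    approx = np.array([gaussian_cdf(c, mean, var) for c in thresholds])
    return C, thresholds, empirical, approx

C, thresholds, empirical, approx = compare_uniform(Np=1_000_000, Nc=10_000)
# Largest discrepancy between the realisation and the approximation
print(np.max(np.abs(empirical - approx)))
```

Multiplying either fraction by `Nc` gives the number of cells below the threshold rather than the fraction.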

Plot showing realisation of random concentrations and cumulative distribution

Now we finally get to my question: Can I find a similar analytical approximation for the number of cells with concentration $C < C_{lim}$ in the case where particles are not distributed with the same probability for each cell? As an example, I have created the figure below, where the particle positions are still drawn from a multinomial distribution, but with higher probability of being placed in the cells in the center. In the example below I have used a Gaussian PDF to calculate the probabilities of the different cells based on their distance from the center (and then the probabilities were normalised to sum to 1), but I am also interested in the more general case where I don't have a nice analytical expression for the probabilities (but they can for example be evaluated numerically from a simulation).
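As a rough illustration of this setup, the sketch below builds cell probabilities from a Gaussian PDF and draws one realisation. It assumes a one-dimensional grid on the unit interval and a hypothetical `width` parameter; the grid geometry and parameters used for the figure are not stated, so treat these as placeholders.

```python
import numpy as np

def gaussian_cell_probabilities(Nc, width=0.2):
    """Probabilities for Nc cells on [0, 1], weighted by a Gaussian centred
    at 0.5 (assumed 1D geometry) and normalised to sum to 1."""
    x = (np.arange(Nc) + 0.5) / Nc          # cell centres
    w = np.exp(-0.5 * ((x - 0.5) / width) ** 2)
    return w / w.sum()

Np, Nc = 100_000, 1_000
pvals = gaussian_cell_probabilities(Nc)

rng = np.random.default_rng(1)
C = rng.multinomial(Np, pvals)              # particle count per cell

C_lim = 50
print(np.sum(C < C_lim))                    # number of cells below the threshold
```

When no analytical expression is available, `pvals` could equally be any non-negative array normalised to sum to 1, for example one estimated from a simulation.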

Plot showing realisation of random concentrations and cumulative distribution

Any answers, hints, or suggestions of relevant literature are most welcome!

Edit: I found an answer that works, which I posted below, but I would still be happy if anyone has suggestions for relevant literature (books or papers), or other more rigorously presented solutions.

Tor
  • 467

1 Answer


After a bit of numerical experimentation, I think I found an answer. The count in each cell, $i$, is binomially distributed, so its CDF is approximately the Gaussian CDF with mean $\mu = p_i N_p$ and variance $\sigma^2 = N_p p_i (1-p_i)$. If I add these CDFs together over all cells, and normalise by dividing by the number of cells, I get the expected fraction of cells below the threshold, which is a pretty good match to the numerical result.

So the answer, with a bit of sloppy notation, is something like this:

$$ CDF_{tot} (C) = \frac{1}{N_c} \sum_{i=1}^{N_c} CDF_{Gauss} \big(C, \, \mu=N_p p_i, \, \sigma^2 = N_p p_i (1-p_i) \big). $$
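A minimal sketch of this formula in Python follows. The function name `mixture_cdf` and the example probabilities are illustrative assumptions, and the sketch assumes every $p_i$ lies strictly between 0 and 1 so that no variance is zero.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function

def mixture_cdf(c, pvals, Np):
    """Average over cells of the Gaussian CDF with mean Np*p_i and
    variance Np*p_i*(1 - p_i), evaluated at the threshold c."""
    pvals = np.asarray(pvals, dtype=float)
    mu = Np * pvals
    sigma = np.sqrt(Np * pvals * (1.0 - pvals))
    z = (c - mu) / (sigma * math.sqrt(2.0))
    return float(np.mean(0.5 * (1.0 + _erf(z))))

# Illustrative non-uniform probabilities on an assumed 1D grid
Nc, Np = 1_000, 100_000
x = (np.arange(Nc) + 0.5) / Nc
w = np.exp(-0.5 * ((x - 0.5) / 0.2) ** 2)
pvals = w / w.sum()

rng = np.random.default_rng(2)
C = rng.multinomial(Np, pvals)

# Fraction of cells below each threshold: one realisation vs. the formula
for c in (25, 50, 100, 150, 200):
    print(c, np.mean(C < c), mixture_cdf(c, pvals, Np))
```

With equal probabilities $p_i = 1/N_c$ every term in the sum is identical, so the expression reduces to the single Gaussian CDF from the question.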

A couple of examples for different distributions are shown in the figures below.

Plot showing distribution of particles across cells, and the CDF

Plot showing distribution of particles across cells, and the CDF

Plot showing distribution of particles across cells, and the CDF

Tor
  • 467