I have data on thousands of emission lines such as the one shown in the figure below. A single emission line covers $N$ pixels (11 in this example). Because the data come from counting photons, $X_i\sim\mathrm{Poisson}(\mu_i)$, where $\mu_i$ is the expected number of photons in pixel $i$. Since $\mu_i$ is large, $X_i\approx\mathcal{N}(\mu_i,\mu_i)$. To simplify matters, I assume the $X_i$ are mutually independent. I can elaborate further if necessary.

Data

I want to normalise the data by the factor $\sum_{j=1}^{N} X_j$: $$ Y_i = \frac{X_i}{\sum_{j=1}^{N} X_j} $$ and determine the variance of $Y_i$. Any help is greatly appreciated.

A similar problem was discussed previously. There, $X_i\sim\mathcal{N}(0,\sigma^2)$, i.e. all $X_i$ are drawn from the same distribution, which is not the case here.
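
Under the assumptions stated above (independent Poisson pixels), any candidate formula for the variance of $Y_i$ can be checked by direct simulation. The following is a minimal sketch, not an analysis of the actual data: the line profile `mu` is hypothetical (a Gaussian shape over 11 pixels with about $1.5\times10^{6}$ expected counts in total), and `simulate_normalised_variance` is an illustrative helper name. It compares the empirical variance of $Y_i$ with the standard first-order (delta-method) result $p_i(1-p_i)/\sum_j\mu_j$, where $p_i=\mu_i/\sum_j\mu_j$.

```python
import numpy as np


def simulate_normalised_variance(mu, n_trials=20_000, seed=0):
    """Empirical mean and variance of Y_i = X_i / sum_j X_j
    for independent X_j ~ Poisson(mu_j)."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated emission line of len(mu) pixels.
    x = rng.poisson(mu, size=(n_trials, mu.size))
    y = x / x.sum(axis=1, keepdims=True)
    return y.mean(axis=0), y.var(axis=0, ddof=1)


# Hypothetical line profile (an assumption, not the data in the figure):
# Gaussian shape over N = 11 pixels, ~1.5e6 expected counts in total.
pixels = np.arange(11)
shape = np.exp(-0.5 * ((pixels - 5) / 1.5) ** 2)
mu = 1.5e6 * shape / shape.sum()

p = mu / mu.sum()                          # expected fraction per pixel
y_mean, y_var = simulate_normalised_variance(mu)
var_first_order = p * (1 - p) / mu.sum()   # delta-method approximation

for i in pixels:
    print(f"pixel {i:2d}: p = {p[i]:.4f}, "
          f"simulated var = {y_var[i]:.3e}, "
          f"first-order var = {var_first_order[i]:.3e}")
```

Swapping in a measured profile for `mu` would show whether real data follow the same behaviour or carry extra noise beyond the Poisson assumption.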

  • You will have the slight issue of what to do with $\sum X_i =0$ (i.e. unluckily not seeing any photons), which theoretically would have small positive probability, but since $Y_i$ is designed to be in $[0,1]$, you can sensibly allocate it an arbitrary value in that range such as $0$ or $1$ or $\frac 1N$ and it will not make much difference – Henry Aug 24 '23 at 10:48
  • Hi Henry. I am focusing only on spectral regions with strong emission lines. Regions for which $\sum X_i=0$ are not considered at all. – dmilakov Aug 24 '23 at 11:15
  • Re-interpreting your question, consider $X_1$, which from your chart seems to be about $12000$ and the total $\sum X_i \approx 1500000$ giving the proportion $Y_1$ about $\frac{12000}{1500000}=0.008$. That has a (binomial) variance of about $0.008\times (1-0.008) / 1500000 \approx 5\times 10^{-9}$ i.e. a standard error about $0.00007$. But I am not sure this is the calculation you are aiming for. – Henry Aug 24 '23 at 12:11
  • Nice idea (@Henry) to condition on the huge denominator. – Ute Aug 24 '23 at 12:42
  • The numbers you used are about an order of magnitude smaller than the ones in the data, but taking the numbers from your calculation results in a signal-to-noise ratio (S/N) of 115, which is around the S/N in the data.

    Plugging the numbers from the data into the same formula gives S/N = 347, compared with 345 in the data, so this seems correct.

    Would you mind explaining to me why $Y_i$ has a binomial variance?

    – dmilakov Aug 24 '23 at 13:30
  • Binomial: consider two independent Poisson random variables $X$ and $Z$ with means $\mu_X$ and $\mu_Z$. Then the conditional distribution of $X$ given the sum $X+Z=n$ is binomial, $\mathrm{Bin}(n,p)$, with $p=\mu_X / (\mu_X+\mu_Z)$. In the present case, $X=X_1$ and $Z=X_2+\dots+X_N$. – Ute Aug 24 '23 at 15:01
  • Thanks, Ute, that makes sense. The variance of a binomial distribution $\mathrm{Bin}(n,p)$ is $np(1-p)$. $p$ is given by your formula and corresponds to 0.008 in Henry's comment. I presume $n=\sum(data)$, his 1500000. Taking all of this together, I don't understand why Henry divided by $n$, though. I seem to be missing something. – dmilakov Aug 24 '23 at 18:47
  • Sorry to be pinging like this, but I would greatly appreciate closing this discussion. While I now understand why the conditional distribution is binomial, I still do not understand why Henry's formula for the variance of that distribution is different from what I find in textbooks. Thanks. – dmilakov Aug 26 '23 at 10:37
  • Also, implementing Henry's formula as it is, I more often than not get a higher S/N on the rescaled quantity than the S/N in the data. This seems unlikely. I also did an MCMC calculation which exhibits larger variances than the formula predicts but in agreement with the variance calculated from $\sigma(Y_i) = \sigma(X_i)/\sum(X_i)$ (within a few percent). – dmilakov Aug 26 '23 at 11:09
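
On the question of why the binomial variance is divided by $n$, here is a sketch of the missing step, assuming the independent Poisson model from the question and conditioning on the total $\sum_j X_j = n$ as in the comments. The textbook variance $np(1-p)$ applies to the count $X_i$. The proportion $Y_i = X_i/n$ is that count rescaled by the constant $1/n$, so its variance picks up a factor $1/n^2$:

$$ \operatorname{Var}\Big(Y_i \,\Big|\, \sum_j X_j = n\Big) = \frac{\operatorname{Var}\big(X_i \mid \sum_j X_j = n\big)}{n^2} = \frac{n\,p_i(1-p_i)}{n^2} = \frac{p_i(1-p_i)}{n}, \qquad p_i = \frac{\mu_i}{\sum_j \mu_j}. $$

With the numbers from Henry's comment ($p_1 = 0.008$, $n = 1500000$) this gives about $5\times10^{-9}$. For comparison, $\sigma(X_i)/\sum_j X_j$ with $\sigma(X_i)=\sqrt{\mu_i}$ and $\mu_i \approx n p_i$ corresponds to a variance of about $p_i/n$, so under this model the two expressions differ by the factor $(1-p_i)$, which is close to 1 in the wings of a line and smaller near its peak.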
