
Let $X_0,X_1,\ldots$ be an infinite sequence of independent standard normal variables $X_i\sim\mathcal N(0,1)$, for $i=0,1,\dots$. Consider the sum $$Y=\sum_ir^iX_i^2,$$ where $r\in(0,1)$ is a parameter that weights the terms geometrically. By definition, $E[X_i^2]=1$ for each $i$, so: $$E[Y]=\sum_ir^iE[X_i^2]=\sum_ir^i=(1-r)^{-1}.$$
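A quick Monte Carlo sanity check of this mean is easy to write. The sketch below truncates the series at $N$ terms; the choices of $r$, $N$ and the sample size are illustrative, not from the problem statement:

```python
import numpy as np

# Monte Carlo sanity check of E[Y] = (1 - r)^{-1}, truncating the
# infinite series at N terms (r^N is negligible for these choices).
rng = np.random.default_rng(0)
r, N, n = 0.9, 200, 100_000   # illustrative parameter choices

Y = np.zeros(n)
for i in range(N):
    Y += r**i * rng.standard_normal(n)**2   # accumulate r^i * X_i^2

print(Y.mean(), 1 / (1 - r))   # sample mean vs (1 - r)^{-1} = 10
```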

I can also compute the variance of $Y$. By independence, $E[X_i^2X_j^2]=1$ for $i\neq j$, and $E[X_i^4]=3$: \begin{align} E[Y^2]&=\sum_{i\neq j}r^ir^j+3\sum_ir^{2i}\\ &=\sum_{i,j}r^ir^j+2\sum_ir^{2i}\\ &=(1-r)^{-2}+2(1-r^2)^{-1}. \end{align}

Using $\mathrm{Var}[Y]=E[Y^2]-E[Y]^2$, we find: $$\mathrm{Var}[Y]=2(1-r^2)^{-1}.$$
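The variance formula can be checked the same way by simulation (again a sketch, with illustrative choices of $r$, truncation length and sample size):

```python
import numpy as np

# Monte Carlo check of Var[Y] = 2 / (1 - r^2), truncating the series at N terms.
rng = np.random.default_rng(0)
r, N, n = 0.9, 200, 200_000   # illustrative parameter choices

Y = np.zeros(n)
for i in range(N):
    Y += r**i * rng.standard_normal(n)**2   # accumulate r^i * X_i^2

print(Y.var(), 2 / (1 - r**2))   # sample variance vs 2(1 - r^2)^{-1}
```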

I am interested in computing the explicit distribution of $Y$. According to the Wikipedia article on the chi-squared distribution, there is no closed form for a (finite) linear combination of $\chi^2$ variables.

However, I can assume that $Y\sim a\chi^2(k)$ (or at least approximately so) for some $a$ and $k$ that I need to determine. In practice $r$ is close to $1$, so this assumption might be good enough for my purpose. Using basic properties of the $\chi^2$ distribution, the expectation is $E[Y]=ak$ and the variance is $\mathrm{Var}[Y]=2a^2k$.

Matching the first two moments of the true distribution and the $\chi^2$ approximation, I get: $$a=\frac1{1+r}\simeq\frac12$$ and: $$k=\frac{1+r}{1-r}\simeq\frac2{1-r},$$ which is large when $r$ is close to $1$.
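One can probe how good this moment-matched $a\chi^2(k)$ fit is by pushing simulated quantiles of $Y$ through the fitted CDF; the values should come back close to the nominal probabilities. A sketch, with illustrative choices of $r$, truncation length and sample size:

```python
import numpy as np
from scipy import stats

r, N, n = 0.9, 200, 100_000   # illustrative parameter choices
a = 1 / (1 + r)               # matched scale
k = (1 + r) / (1 - r)         # matched degrees of freedom (need not be integer)

rng = np.random.default_rng(0)
Y = np.zeros(n)
for i in range(N):
    Y += r**i * rng.standard_normal(n)**2   # truncated Y

# Empirical quantiles of Y evaluated through the fitted a*chi2(k) CDF:
for q in (0.5, 0.9, 0.99):
    y_q = np.quantile(Y, q)
    print(q, stats.chi2.cdf(y_q / a, df=k))   # should be near q
```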

Does this type of approximation work in practice? If so, for which values of $r$ is it safe? I am a bit concerned about the factor of $2$ in the answers; I will check numerically whether there is a mistake in my calculation.

utobi

1 Answer


This approximation (the Satterthwaite approximation) works very well for moderate tail probabilities. It works badly in the extreme right tail, because the extreme right tail of the true distribution is exponential (well, it is for finite sums; I'd be surprised if infinite sums were any easier).

You can do better without unreasonable effort in a couple of ways, if you don't absolutely need a closed-form solution.

  • Take the explicit convolution of the first few $\chi^2_1$ variables and an $a\chi^2_k$ approximation for the remainder. This gets you bounded relative error in the right tail, and helps a lot (we used it in genomics). You can use the R CompQuadForm package to do the convolutions numerically, if numerical results are good enough.
  • There's a saddlepoint approximation that has very good accuracy in the extreme right tail (Kuonen, and the R survey package). This is the only straightforward way I know of to get good accuracy for right tail probabilities smaller than machine epsilon, and it's nearly closed form.
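The saddlepoint idea can be sketched directly for this problem: the cumulant generating function of $Y$ is $K(s)=-\tfrac12\sum_i\log(1-2sr^i)$, and the Lugannani–Rice formula gives the right-tail probability. The sketch below truncates the series at $N$ terms; the choices of $r$, $N$ and the evaluation point are illustrative, and this is not the Kuonen/survey-package implementation itself:

```python
import numpy as np
from scipy import optimize, stats

r, N = 0.9, 400                      # illustrative parameter choices
lam = r ** np.arange(N)              # weights r^i of the chi^2_1 terms

def K(s):  return -0.5 * np.sum(np.log1p(-2 * s * lam))        # CGF of Y
def K1(s): return np.sum(lam / (1 - 2 * s * lam))              # K'(s)
def K2(s): return np.sum(2 * lam**2 / (1 - 2 * s * lam)**2)    # K''(s)

def tail(t):
    """P(Y > t) via the Lugannani-Rice saddlepoint formula (t != E[Y])."""
    # Solve K'(s) = t; the CGF exists for s < 1/(2*max(lam)) = 1/2.
    s_hat = optimize.brentq(lambda s: K1(s) - t, -50.0, 0.5 - 1e-9)
    w = np.sign(s_hat) * np.sqrt(2 * (s_hat * t - K(s_hat)))
    u = s_hat * np.sqrt(K2(s_hat))
    return stats.norm.sf(w) + stats.norm.pdf(w) * (1 / u - 1 / w)

print(tail(16.0))   # right-tail probability at t = 16 (E[Y] = 10 here)
```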
Thomas Lumley