
Let $y_1, y_2, \ldots, y_n$ and $z_1, z_2, \ldots, z_n$ be two independent samples of size $n$ from a normal distribution $\mathcal{N}(0,1)$. My goal is to find the distribution of $$\frac{\sum_{i=1}^n (y_i - z_i)^2}{\sum_{i=1}^n(y_i - \bar{y})^2},$$ where $\bar{y} = (y_1 + y_2 + \cdots + y_n) / n$.
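A minimal simulation of this ratio, in the spirit of the Python simulations mentioned below. It assumes the two samples are independent and uses NumPy's `default_rng`; the helper name `simulate_ratio` and the seed are illustrative choices, not taken from the original code.

```python
import numpy as np


def simulate_ratio(n: int, trials: int, rng: np.random.Generator) -> np.ndarray:
    """Return `trials` draws of sum((y - z)**2) / sum((y - ybar)**2).

    Each draw uses fresh, independent samples y and z of size n from N(0, 1).
    """
    y = rng.standard_normal((trials, n))  # one row per trial
    z = rng.standard_normal((trials, n))  # independent of y
    numerator = np.sum((y - z) ** 2, axis=1)
    ybar = y.mean(axis=1, keepdims=True)  # per-row sample mean
    denominator = np.sum((y - ybar) ** 2, axis=1)
    return numerator / denominator


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # arbitrary seed
    ratios = simulate_ratio(n=30, trials=10_000, rng=rng)
    print("mean:", ratios.mean(), "std:", ratios.std())
```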

My first idea was to approximate the denominator of the expression by $n$, since $$\frac{1}{n}\sum_{i=1}^n(y_i - \bar{y})^2\approx 1.$$ In other words, we approximate the expression by the Mean Squared Error between the samples.
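A quick sketch of how good the approximation of the denominator by $n$ is, assuming i.i.d. $\mathcal{N}(0,1)$ draws (sample size, number of trials, and seed are arbitrary). Since $\sum_i (y_i - \bar{y})^2 \sim \chi^2_{n-1}$, the scaled denominator has mean $(n-1)/n$ and standard deviation $\sqrt{2(n-1)}/n$, so for moderate $n$ it still fluctuates noticeably around $1$.

```python
import numpy as np

# Assumes y_1, ..., y_n are i.i.d. N(0, 1); n, trials and the seed are arbitrary.
rng = np.random.default_rng(0)
n, trials = 30, 10_000
y = rng.standard_normal((trials, n))

# (1/n) * sum((y_i - ybar)^2) for each simulated sample (one per row)
scaled_denominator = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1) / n

# Theory: sum((y_i - ybar)^2) ~ chi-squared with n - 1 degrees of freedom, so the
# scaled denominator has mean (n - 1) / n and standard deviation sqrt(2(n - 1)) / n.
print("mean:", scaled_denominator.mean(), "theory:", (n - 1) / n)
print("std :", scaled_denominator.std(), "theory:", np.sqrt(2 * (n - 1)) / n)
```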

Under this approximation: $y_i - z_i \sim \mathcal{N}(0, 2)$ (variance $2$), so $(y_i - z_i)^2$ follows a $\Gamma(1/2, 4)$ distribution (shape $1/2$, scale $4$). Summing the $n$ independent terms and dividing by $n$, the approximated expression follows a $\Gamma(n/2, 4/n)$ distribution. This seems to work well enough as an approximation, but I believe the first assumption is not correct. Using Python, I have produced some simulations which seem to confirm my suspicion:

[Figure: plot of the simulation]
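The Gamma approximation can be checked numerically by comparing moments. This sketch assumes independent i.i.d. $\mathcal{N}(0,1)$ samples and the shape and scale parameterization used above, and compares both the simulated mean squared error and the exact ratio against $\Gamma(n/2, 4/n)$, whose mean is $2$ and variance $8/n$. The MSE follows this Gamma distribution exactly, so any mismatch in the ratio comes from replacing the denominator by $n$. Variable names, $n$, the number of trials, and the seed are illustrative.

```python
import numpy as np

# Assumes independent i.i.d. N(0, 1) samples y and z; n, trials and seed are arbitrary.
rng = np.random.default_rng(0)
n, trials = 30, 10_000
y = rng.standard_normal((trials, n))
z = rng.standard_normal((trials, n))

mse = np.mean((y - z) ** 2, axis=1)  # denominator replaced by n
ratio = np.sum((y - z) ** 2, axis=1) / np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)

# Gamma(shape=n/2, scale=4/n): mean = shape * scale = 2, variance = shape * scale**2 = 8/n
shape, scale = n / 2, 4 / n
gamma_mean, gamma_var = shape * scale, shape * scale ** 2

print(f"Gamma approx: mean {gamma_mean:.4f}, var {gamma_var:.4f}")
print(f"MSE         : mean {mse.mean():.4f}, var {mse.var():.4f}")
print(f"exact ratio : mean {ratio.mean():.4f}, var {ratio.var():.4f}")

# Reference draws from the approximating Gamma, e.g. for overlaying histograms;
# numpy's Generator.gamma uses the same shape/scale parameterization.
gamma_draws = rng.gamma(shape, scale, size=trials)
```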

Then, I tried expressing the whole expression as a quotient of two random variables. The numerator has distribution $\Gamma(n/2, 4)$, but I am not sure about the denominator. I know that the distribution of the quotient of two independent $\Gamma$ random variables is known, so if the denominator follows a $\Gamma$ distribution, we would be done. However, I do not know how to show this. Any hints? Thank you in advance.
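A sketch for the denominator: $\sum_i (y_i - \bar{y})^2$ is the usual sample-variance sum, which follows $\chi^2_{n-1} = \Gamma((n-1)/2, 2)$ (shape, scale); see the sampling-distribution link in the comments. The known result for a quotient of $\Gamma$ variables also requires numerator and denominator to be independent, so the sketch estimates their correlation too. It assumes independent i.i.d. $\mathcal{N}(0,1)$ samples; $n$, the number of trials, and the seed are arbitrary.

```python
import numpy as np

# Assumes independent i.i.d. N(0, 1) samples y and z; n, trials and seed are arbitrary.
rng = np.random.default_rng(0)
n, trials = 30, 10_000
y = rng.standard_normal((trials, n))
z = rng.standard_normal((trials, n))

numerator = np.sum((y - z) ** 2, axis=1)
denominator = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)

# chi-squared with n - 1 degrees of freedom = Gamma(shape=(n-1)/2, scale=2)
shape, scale = (n - 1) / 2, 2.0
print(f"denominator: mean {denominator.mean():.3f} vs {shape * scale:.3f}, "
      f"var {denominator.var():.3f} vs {shape * scale ** 2:.3f}")

# The quotient-of-Gammas result needs independence; a correlation clearly
# away from zero means numerator and denominator are dependent.
corr = np.corrcoef(numerator, denominator)[0, 1]
print(f"corr(numerator, denominator) = {corr:.3f}")
```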

Edit: by a bit of educated trial and error, and using simulation, it looks like the solution is a $\beta'(n, n-1, 1, 2)$ distribution (the generalized beta prime $\beta'(\alpha, \beta, p, q)$ with shape parameters $\alpha, \beta, p$ and scale $q$). My starting point was the quotient of $\Gamma(n/2,4)$ (the distribution of the numerator) and $\Gamma((n-1)/2, 2)$ (what I think is the distribution of the denominator). However, the quotient of these two is a $\beta'(n/2, (n-1)/2, 1, 2)$ distribution, which is not the correct solution (again, by simulation). Here is what my simulation results look like ($n=30$, $10000$ trials):

[Figure: simulation results with beta prime]
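A sketch for comparing the simulated ratio with the two beta prime candidates. It assumes $\beta'(\alpha, \beta, 1, q)$ is $q$ times a standard $\text{BetaPrime}(\alpha, \beta)$ variable, with mean $q\alpha/(\beta-1)$ and variance $q^2\alpha(\alpha+\beta-1)/((\beta-2)(\beta-1)^2)$ for $\beta > 2$, and draws beta prime samples as $qB/(1-B)$ with $B \sim \text{Beta}(\alpha, \beta)$. The helper names are hypothetical; matching a few moments and quantiles is a rough check, not a proof of the distribution.

```python
import numpy as np


def beta_prime_moments(alpha: float, beta: float, q: float) -> tuple[float, float]:
    """Mean and variance of beta'(alpha, beta, 1, q) = q * BetaPrime(alpha, beta); needs beta > 2."""
    mean = q * alpha / (beta - 1)
    var = q ** 2 * alpha * (alpha + beta - 1) / ((beta - 2) * (beta - 1) ** 2)
    return mean, var


def sample_beta_prime(alpha, beta, q, size, rng):
    """Draw q * B / (1 - B) with B ~ Beta(alpha, beta), i.e. beta'(alpha, beta, 1, q)."""
    b = rng.beta(alpha, beta, size=size)
    return q * b / (1 - b)


# Assumes independent i.i.d. N(0, 1) samples y and z; seed is arbitrary.
rng = np.random.default_rng(0)
n, trials = 30, 10_000
y = rng.standard_normal((trials, n))
z = rng.standard_normal((trials, n))
ratio = np.sum((y - z) ** 2, axis=1) / np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)

candidates = {
    "beta'(n, n-1, 1, 2)": (n, n - 1, 2.0),
    "beta'(n/2, (n-1)/2, 1, 2)": (n / 2, (n - 1) / 2, 2.0),
}
probs = [0.1, 0.5, 0.9]
print(f"simulated: mean {ratio.mean():.4f}, var {ratio.var():.4f}, "
      f"quantiles {np.round(np.quantile(ratio, probs), 3)}")
for name, (a, b, q) in candidates.items():
    mean, var = beta_prime_moments(a, b, q)
    draws = sample_beta_prime(a, b, q, trials, rng)
    print(f"{name}: mean {mean:.4f}, var {var:.4f}, "
          f"quantiles {np.round(np.quantile(draws, probs), 3)}")
```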

Am I computing the quotient incorrectly?

Ray Bern
  • If you could explain the statistical motivation for this question it might give us some insight into solutions -- and help us verify that this really is the question you need answered. – whuber Jun 01 '22 at 14:50
  • The thing I care about is the distribution of the right side of the expression in the statement. Whether it really is $1-R^2$ or not does not really matter, to me. – Ray Bern Jun 01 '22 at 14:51
  • Then this becomes purely a math problem: you ask about the quotient of two homogeneous quadratic forms. The solution will be a matter of linear algebra to find a suitable decomposition of those forms. – whuber Jun 01 '22 at 15:00
  • If $R^2$ is R squared from a linear regression then some results on its distribution are available at https://stats.stackexchange.com/questions/130069/what-is-the-distribution-of-r2-in-linear-regression-under-the-null-hypothesis, viz a beta distribution. From the properties listed at https://en.wikipedia.org/wiki/Beta_distribution, $1-R^2$ then is also beta, but with the parameters flipped. – Christoph Hanck Jun 01 '22 at 15:09
  • I have removed the $R^2$ part of the question, as it was confusing. – Ray Bern Jun 01 '22 at 15:11
  • As to the denominator, https://stats.stackexchange.com/questions/121662/why-is-the-sampling-distribution-of-variance-a-chi-squared-distribution#:~:text=The%20sampling%20distribution%20of%20the,of%20interest%20is%20normally%20distributed). – Christoph Hanck Jun 01 '22 at 15:15
  • @RayBern The $\beta'$ distribution you use is true for the ratio of independent gamma variables, in your case the numerator and denominator are clearly not independent. It's interesting that the actual distribution seems to have the similar form that you found – J. Delaney Jun 01 '22 at 17:09
  • @J.Delaney absolutely right. It is interesting indeed, it seems to be working with any value of $n$ I pick – Ray Bern Jun 01 '22 at 20:01
  • The $\beta^\prime$ distribution fails when you try this with small $n$. It is especially clear when $n = 2$. – Sextus Empiricus Jun 01 '22 at 20:24
