sampling distribution of sample variance (normal distribution)

Question

It is mentioned in a statistics textbook that, for a random sample of size n from a normal distribution with known variance, the following statistic has a chi-square distribution with n-1 degrees of freedom:

n * (sample variance) / (population variance)

I plotted histograms of both the sample variance and the statistic above, and the distributions look identical. Does that mean the sample variance also has a chi-square distribution with n-1 degrees of freedom? Why can't we simply use the distribution of the sample variance?

Below is the Python code I used.

```python
# %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(40, 30))
sample_var = []
for i in range(0, 10000):
    x = np.random.normal(loc=10, scale=3.0, size=5)  # normal distribution with mean 10 & var = 9 (std dev = 3)
    avg = np.mean(x)
    sample_var.append(np.sum((x - avg) ** 2) / 5)  # sample variance (divide by n = 5, as in the formula above)
sample_var = np.array(sample_var)
chi_sq = 5 / 9 * sample_var  # chi-square statistic = n * sample var / population var
ax1.hist(sample_var, 50, color='b', edgecolor='black')
ax2.hist(chi_sq, 50, color='r', edgecolor='black')
plt.show()
```

What happens when the population variance is not known?

Thanks, Kedar

Sergio · Accepted Answer · 2020-08-27T14:07:35.363

The distributions look identical only because they have a similar shape; they are not identical at all. You can see the difference if you replace the last three lines of your code with:

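```python
ax1.hist(sample_var, 50, density=True, color='b', edgecolor='black')  # normalize so areas are comparable
ax2.hist(chi_sq, 50, density=True, color='r', edgecolor='black')
ax2.set_xlim(ax1.get_xlim())  # put both histograms on the same x-axis range
ax1.set_ylim(ax2.get_ylim())  # and on the same y-axis range
plt.show()
```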
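
To put numbers on the difference, here is a minimal sketch assuming NumPy (the seed and the 100,000 replications are arbitrary choices, not from the question). It compares the simulated mean and variance of the unbiased sample variance $S^2_n$ and of $(n-1)S^2_n/\sigma^2$ with their theoretical values: $S^2_n$ has mean $\sigma^2$ and variance $2\sigma^4/(n-1)$, while a $\chi^2_{n-1}$ variable has mean $n-1$ and variance $2(n-1)$.

```python
import numpy as np

n, mu, sigma2 = 5, 10.0, 9.0  # same setup as the question: N(10, 9), samples of size 5
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# 100,000 independent samples of size n, one per row
x = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(100_000, n))
s2 = x.var(axis=1, ddof=1)          # unbiased sample variance S^2 of each row
chi_sq = (n - 1) * s2 / sigma2      # (n-1) S^2 / sigma^2, should follow chi^2_{n-1}

print("S^2:    mean %.2f (theory %.2f), var %.2f (theory %.2f)"
      % (s2.mean(), sigma2, s2.var(), 2 * sigma2**2 / (n - 1)))
print("chi_sq: mean %.2f (theory %d), var %.2f (theory %d)"
      % (chi_sq.mean(), n - 1, chi_sq.var(), 2 * (n - 1)))
```

The histograms share a shape because $S^2_n$ is just $\chi^2_{n-1}$ rescaled by $\sigma^2/(n-1)$; only the location and spread differ, which the unnormalized plots with automatic axes hide.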

The result you are referring to is: $$(n-1)S^2_n/\sigma^2=n\hat\sigma^2_n/\sigma^2=\frac{1}{\sigma^2}\sum_i(x_i-\overline{x})^2\sim\chi^2_{n-1}$$ where $\sigma^2$ is the population variance, $\hat\sigma^2_n=\frac1n\sum_i(x_i-\overline{x})^2$ is the sample variance and $S^2_n=\frac{1}{n-1}\sum_i(x_i-\overline{x})^2$ is the unbiased sample variance. (See https://mathworld.wolfram.com/SampleVariance.html)
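
The first two expressions are the same number because $(n-1)S^2_n$ and $n\hat\sigma^2_n$ both equal the sum of squared deviations. A quick numerical check, assuming NumPy, where `ddof=1` gives $S^2_n$ and the default `ddof=0` gives $\hat\sigma^2_n$; the sample values are arbitrary and only for illustration:

```python
import numpy as np

x = np.array([7.0, 12.0, 9.5, 10.0, 13.5])  # arbitrary illustrative sample, n = 5
n = x.size
ss = np.sum((x - x.mean()) ** 2)   # sum of squared deviations from the sample mean
s2_unbiased = np.var(x, ddof=1)    # S^2_n = ss / (n - 1)
s2_biased = np.var(x)              # sigma-hat^2_n = ss / n (ddof=0 is the default)

# All three values agree (up to floating-point rounding)
print((n - 1) * s2_unbiased, n * s2_biased, ss)
```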

The distribution of $\sum_i(x_i-\overline{x})^2/\sigma^2$ is $\chi^2_{n-1}=\text{Gamma}\left(\frac{n-1}{2},\frac12\right)$. (I'm using the rate parametrization.)
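
A sketch checking this identity numerically, assuming SciPy: `scipy.stats.gamma` takes a shape `a` and a `scale`, where scale is the reciprocal of the rate, so rate $\frac12$ corresponds to `scale=2`.

```python
import numpy as np
from scipy import stats

n = 5
x = np.linspace(0.01, 20, 500)  # grid of positive values

chi2_pdf = stats.chi2.pdf(x, df=n - 1)
# Gamma with shape (n-1)/2 and rate 1/2; SciPy's scale is 1/rate = 2
gamma_pdf = stats.gamma.pdf(x, a=(n - 1) / 2, scale=2)

print(np.allclose(chi2_pdf, gamma_pdf))
```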

In general, if $X\sim\text{Gamma}(\alpha,\beta)$, then $aX\sim\text{Gamma}(\alpha,\beta/a)$ for any constant $a>0$, so the distributions of the biased and unbiased sample variances are

$$\hat\sigma^2_n=\frac{\sigma^2}{n}\left(\frac{n\hat\sigma^2_n}{\sigma^2}\right)\sim\text{Gamma}\left(\frac{n-1}{2},\frac{n}{2\sigma^2}\right)$$

$$S^2_n=\frac{\sigma^2}{n-1}\left(\frac{(n-1)S^2_n}{\sigma^2}\right)\sim\text{Gamma}\left(\frac{n-1}{2},\frac{n-1}{2\sigma^2}\right)$$

Here are the density plots for $\chi^2_{n-1}$, $S^2_n$, and $\hat\sigma^2_n$ ($n=5$, $\mu=10$, $\sigma^2=9$ as in your code).
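
A minimal sketch that reproduces the three densities from the Gamma forms above, assuming SciPy (with `scale = 1/rate`) and Matplotlib for the optional plot. The helper names `gamma_rate_pdf` and `plot_densities` are hypothetical, introduced only for this example. The last lines check the means implied by the Gamma forms, $\alpha/\beta$: $\sigma^2$ for $S^2_n$ and $(n-1)\sigma^2/n$ for $\hat\sigma^2_n$.

```python
import numpy as np
from scipy import stats

n, sigma2 = 5, 9.0  # as in the question's code (mu = 10 does not affect these densities)
shape = (n - 1) / 2

def gamma_rate_pdf(x, alpha, beta):
    """Gamma(alpha, beta) density in the rate parametrization (hypothetical helper)."""
    return stats.gamma.pdf(x, a=alpha, scale=1 / beta)

def plot_densities():
    """Plot the chi^2_{n-1}, S^2_n and sigma-hat^2_n densities (hypothetical helper)."""
    import matplotlib.pyplot as plt
    x = np.linspace(0.01, 40, 1000)
    plt.plot(x, gamma_rate_pdf(x, shape, 0.5), label=r"$\chi^2_{n-1}$")
    plt.plot(x, gamma_rate_pdf(x, shape, (n - 1) / (2 * sigma2)), label=r"$S^2_n$")
    plt.plot(x, gamma_rate_pdf(x, shape, n / (2 * sigma2)), label=r"$\hat\sigma^2_n$")
    plt.legend()
    plt.show()

# Means implied by the Gamma forms (alpha / beta, i.e. shape * scale)
mean_unbiased = stats.gamma.mean(a=shape, scale=2 * sigma2 / (n - 1))  # sigma^2
mean_biased = stats.gamma.mean(a=shape, scale=2 * sigma2 / n)          # (n-1) sigma^2 / n
print(mean_unbiased, mean_biased)
```

Calling `plot_densities()` draws the three curves; the plotting import sits inside the function so the density and mean calculations run without Matplotlib installed.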

I appreciate your effort, Sergio. I will digest your response thoroughly before marking this as answered. — Kedar_dg, Aug 27 '20 at 17:49
What I can conclude is: the distributions of the sample variance (biased and unbiased) and of the ratio of the sample variance to the population variance are all Gamma distributions. Of these, the ratio is of particular interest and is called the chi-square distribution. For this statistic, the parameters of the Gamma distribution depend only on the degrees of freedom, with no population-variance term, so they can be computed generically. Let me know if that understanding is correct. — Kedar_dg, Aug 28 '20 at 05:34
Right! For example, the Student's $t$ variable can be defined as $U/\sqrt{V/n}$, where $U\sim N(0,1)$ and $V\sim\chi^2_n$. You can then define a $T$ statistic which depends on $n$, but not $\mu$ or $\sigma$. See https://en.wikipedia.org/wiki/Student%27s_t-distribution — Sergio, Aug 28 '20 at 06:56
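
When $\sigma$ is unknown, the same idea gives the usual one-sample $t$ statistic. With $U=(\overline{x}-\mu)/(\sigma/\sqrt n)\sim N(0,1)$ and $V=(n-1)S^2_n/\sigma^2\sim\chi^2_{n-1}$ (independent for normal samples), $U/\sqrt{V/(n-1)}=(\overline{x}-\mu)/(S_n/\sqrt n)$ follows $t_{n-1}$, and $\sigma$ cancels. A minimal NumPy sketch confirming the cancellation numerically; the sample, the hypothesized mean `mu` and the value of `sigma` are illustrative assumptions:

```python
import numpy as np

x = np.array([7.0, 12.0, 9.5, 10.0, 13.5])  # arbitrary illustrative sample
mu = 10.0      # hypothesized population mean
sigma = 3.0    # any positive value works here; it cancels below
n = x.size
s = np.std(x, ddof=1)  # S_n, square root of the unbiased sample variance

u = (x.mean() - mu) / (sigma / np.sqrt(n))  # N(0, 1) if the mean really is mu
v = (n - 1) * s**2 / sigma**2               # chi-square with n-1 degrees of freedom
t_from_uv = u / np.sqrt(v / (n - 1))        # Student's t with n-1 degrees of freedom

t_direct = (x.mean() - mu) / (s / np.sqrt(n))  # computable without knowing sigma
print(t_from_uv, t_direct)
```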
