
Assume we have a normal distribution with known mean $\mu$. How can we estimate the variance by sampling? The typical answer is to use the unbiased sample variance estimator, i.e., for data points $x_1, \ldots, x_n$: $$\frac{(x_1-\bar{x})^2+\ldots +(x_n-\bar{x})^2}{n-1},$$ where $\bar{x}$ is the sample mean. Now, can we use the actual mean in any meaningful way to get a better estimator of the variance? The first thing that comes to mind is to replace $\bar{x}$ by $\mu$ and divide by $n$ instead of $n-1$ (to keep the estimator unbiased). Is this a better estimator? Why?

user127776

2 Answers


Let's check if $\hat\sigma^2_{you}=\dfrac{\sum_{i = 1}^n (x_i - \mu)^2}{n-1}$ is unbiased for $\sigma^2$.

$$ \mathbb E\Bigg[ \dfrac{\sum_{i = 1}^n (x_i - \mu)^2}{n-1} \Bigg] \\ = \dfrac{1}{n-1} \mathbb E\Bigg[ \sum_{i = 1}^n (x_i - \mu)^2 \Bigg] \\= \dfrac{1}{n-1} \sum_{i = 1}^n\mathbb E\Bigg[ (x_i - \mu)^2 \Bigg] \\ = \dfrac{1}{n-1} \sum_{i = 1}^n\mathbb E\Bigg[ x_i^2 -2\mu x_i + \mu^2 \Bigg] \\= \dfrac{1}{n-1} \sum_{i = 1}^n\Bigg[\mathbb E\big[x_i^2\big] -2\mu\mathbb E\big[x_i\big]+\mu^2\Bigg]\\= \dfrac{1}{n-1} \sum_{i = 1}^n\Bigg[\mathbb E\big[x_i^2\big] -2\mu^2+\mu^2\Bigg]\\= \dfrac{1}{n-1} \sum_{i = 1}^n\Bigg[\mathbb E\big[x_i^2\big] -\mu^2\Bigg] $$

Now, $\mathbb E\big[x_i^2\big]=\mathbb E\big[x_i\big]^2 + \operatorname{Var}(x_i)$, so:

$$ \dfrac{1}{n-1} \sum_{i = 1}^n\Bigg[\mathbb E\big[x_i^2\big] -\mu^2\Bigg]\\= \dfrac{1}{n-1} \sum_{i = 1}^n\Bigg[\mathbb E\big[x_i\big]^2 + \operatorname{Var}(x_i) -\mu^2\Bigg] \\= \dfrac{1}{n-1} \sum_{i = 1}^n\Bigg[\mu^2 + \operatorname{Var}(x_i) -\mu^2\Bigg] \\= \dfrac{1}{n-1} \sum_{i = 1}^n\operatorname{Var}(x_i) \\= \dfrac{n}{n-1} \operatorname{Var}(x_i) \\= \dfrac{n}{n-1}\,\sigma^2 $$

Consequently, $\hat\sigma^2_{you}$ is biased for $\sigma^2$!

However, if you redo that calculation with an $n$ denominator, you get an unbiased estimator for $\sigma^2$:

$$\hat\sigma^2_{unbiased}=\dfrac{\sum_{i = 1}^n (x_i - \mu)^2}{n}$$

Whether or not this $n$-denominator estimator is the best one is a matter of opinion, but it is unbiased. For better or for worse, most choices about which estimator to use come down to a matter of opinion.
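A quick Monte Carlo check makes the bias visible. This is just a sketch in R (matching the other answer's language); the seed and the choices $\mu = 0$, $\sigma^2 = 4$, $n = 10$ are arbitrary:

# Monte Carlo check of the bias with a known mean
set.seed(2024)
mu <- 0; sig2 <- 4; n <- 10; reps <- 1e5
ss <- replicate(reps, sum((rnorm(n, mu, sqrt(sig2)) - mu)^2))
mean(ss/(n - 1))   # near (n/(n-1))*sig2 = 40/9, about 4.44: biased upward
mean(ss/n)         # near sig2 = 4: unbiased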

Dave

Suppose you have a random sample of size $n$ from the population $\mathsf{Norm}(\mu, \sigma),$ where $\sigma$ is not known and $\mu$ is known.

Let $V = \frac 1n\sum_{i=1}^n (X_i - \mu)^2.$

Then $V$ is a better estimator of the population variance $\sigma^2$ than is $S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ where $\bar X =\frac 1 n \sum_{i=1}^n X_i$: both are unbiased, but $\operatorname{Var}(V) = 2\sigma^4/n$ is smaller than $\operatorname{Var}(S^2) = 2\sigma^4/(n-1).$
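As a quick sanity check, here is a simulation sketch (using the same parameters as the example below, $\mu = 20$, $\sigma = 3$, $n = 50$; the seed is arbitrary) showing that both estimators center on $\sigma^2$ but $V$ varies less:

# Compare the two unbiased estimators over many samples
set.seed(101)
reps <- 1e5; n <- 50; mu <- 20; sigma <- 3
V  <- replicate(reps, {x <- rnorm(n, mu, sigma); sum((x - mu)^2)/n})
S2 <- replicate(reps, var(rnorm(n, mu, sigma)))
mean(V); mean(S2)   # both near sigma^2 = 9: both unbiased
var(V); var(S2)     # near 2*sigma^4/n = 3.24 and 2*sigma^4/(n-1), about 3.31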

Also, a 95% CI for $\sigma^2$ tends to be narrower if we use $V$ than if we use $S^2.$ [Samples can vary, so this CI is not always narrower.]

In particular, a 95% CI for $\sigma^2$ is based on the relationship $\frac{nV}{\sigma^2} \sim \mathsf{Chisq}(\nu = n).$

Example: Suppose I have the sample x of size $n = 50$ from $\mathsf{Norm}(\mu = 20, \sigma = 3),$ where I assume $\mu$ is known and $\sigma$ is not.

set.seed(215)
x = rnorm(50, 20, 3)
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  14.21   17.97   19.94   20.30   22.62   29.31

v = (sum((x-20)^2))/50;  v
[1] 10.69335

CI.1 = 50*v/qchisq(c(.975,.025), 50);  CI.1
[1]  7.486223 16.523827
diff(CI.1)
[1] 9.037604       # width of CI

The formula for this confidence interval is $\left(\frac{50V}{U}, \frac{50V}{L}\right),$ where $L$ and $U$ cut probabilities $0.025$ from the lower and upper tails, respectively, of $\mathsf{Chisq}(\nu=50).$ For the data of my example, the CI is $(7.49,\, 16.52)$ of width $9.04.$

By contrast, the 95% CI for $\sigma^2$ based on $S^2,$ where $\mu$ is estimated by $\bar X,$ uses the relationship $\frac{(n-1)S^2}{\sigma^2}\sim\mathsf{Chisq}(\nu=49).$

CI.2 = 49*var(x)/qchisq(c(.975,.025), 49);  CI.2
[1]  7.548087 16.797538
diff(CI.2)
[1] 9.249451   # wider CI

For the data of my example, the CI is $(7.55,\, 16.80)$ of width $9.25 > 9.04.$
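To see that the narrower interval is a tendency rather than an artifact of this one sample, here is a simulation sketch (same assumed parameters as above; the seed is arbitrary) comparing average CI widths over many samples:

# Average widths of the two 95% CIs for sigma^2
set.seed(316)
reps <- 1e4; n <- 50; mu <- 20; sigma <- 3
w.V <- w.S <- numeric(reps)
for (i in 1:reps) {
  x      <- rnorm(n, mu, sigma)
  v      <- sum((x - mu)^2)/n
  w.V[i] <- diff(n*v/qchisq(c(.975, .025), n))            # known-mean CI
  w.S[i] <- diff((n - 1)*var(x)/qchisq(c(.975, .025), n - 1))  # usual CI
}
mean(w.V); mean(w.S)   # the V-based interval is narrower on average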

BruceET