12

I have to prove that the sample variance is an unbiased estimator. Specifically, I am asked to show that the following estimator of the population variance is unbiased:

$s^2=\frac{1}{n-1}\sum\limits_{i=1}^n(x_i-\bar x)^2$

I have already tried to find the answer myself, but I did not manage to find a complete proof.

Andreas Dibiasi
  • 3
    Please post what you have accomplished so far -and add the self-study /homework tag. – Alecos Papadopoulos Mar 15 '15 at 15:16
  • 1
    @AlecosPapadopoulos Is the homework tag really a thing? I've been removing those where I found them, as I didn't see a value in it. – FooBar Mar 15 '15 at 18:22
  • @FooBar I am not sure this is a good idea. Our meta-threads indicate a rather strong opinion in favor of explicitly acknowledging homework questions as such, in the tags. – Alecos Papadopoulos Mar 15 '15 at 19:35
  • @AlecosPapadopoulos could you link to me that discussion? I only found a question without answers: http://meta.economics.stackexchange.com/questions/1252/how-to-ask-a-homework-question – FooBar Mar 15 '15 at 19:46
  • @Foobar http://meta.economics.stackexchange.com/questions/24/how-should-we-deal-with-homework-questions – Alecos Papadopoulos Mar 15 '15 at 20:21
  • @AlecosPapadopoulos I remembered that one, actually. But none of these answers referred to "explicitly acknowledging homework questions as such", especially use of tags. – FooBar Mar 15 '15 at 21:01
  • @FooBar "Acknowledgement" is mentioned in the question, which had 19 upvotes and zero downvotes. The use of tags speeds up things -you don't have to read the whole question in order to find somewhere in there that the OP reveals that this is homework. – Alecos Papadopoulos Mar 15 '15 at 21:39
  • What I am trying to do is to prepare a handout for some students I tutor, as this preparation is a good opportunity to repeat my econometrics skills I guess a ''self-study'' tag should be correct. – Andreas Dibiasi Mar 16 '15 at 12:03
  • @AlecosPapadopoulos, since you are the top user, this is a question for you (it could be on Meta instead, but what the hell). Obviously, the OP is purely statistical and would be on topic on Cross Validated but mostly off topic here. I wonder why no one indicated this before and did not try moving it to there. – Richard Hardy Jan 17 '17 at 18:15
  • @RichardHardy If I am not mistaken, I have already posted an answer in this thread, isn't it so?:) Of course there exists the more sophisticated one using the annihilator matrix... As per on-topicness, since in undergraduate econometrics the basic Linear Regression model still looms large (with its unbiased OLS estimator), I believe it can be tolerated here too. – Alecos Papadopoulos Jan 17 '17 at 18:47
  • @AlecosPapadopoulos, yes, my question was about on-topicness. I understand statistics is used in economics extensively, but since we have a dedicated statistical site, why not focus the statistical expertise there rather than scatter it around on many sites. So while the OP might be on topic here, it is "way more" on topic at Cross Validated. And when areas overlap, my hunch is that we should go for the better fit. (This must have been discussed on Meta, but I don't remember the relevant thread, so I might also be wrong.) – Richard Hardy Jan 17 '17 at 19:29
  • @RichardHardy Don't trust R-squared that much! – Alecos Papadopoulos Jan 17 '17 at 19:35

3 Answers

20

For a shorter proof, here are a few things we need to know before we start:

$X_1, X_2 , ..., X_n$ are independent observations from a population with mean $\mu$ and variance $\sigma^{2}$

$\mathbb E(X_i) = \mu$ , $\mathbb{Var}(X_i)= \sigma^{2}$

$\mathbb E(X^2) = \sigma^{2} + \mu^{2}$

$\mathbb{Var}(X)=\mathbb E(X^2)-[\mathbb E(X)]^2$

$\mathbb E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$
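The last identity in this list is stated without justification. One way to obtain it, as a sketch that uses only the independence of the $X_i$ and the variance identity listed just before it (applied to $\bar X$ instead of $X$):

$$\mathbb E(\bar X) = \frac 1n \sum_{i=1}^n \mathbb E(X_i) = \mu, \qquad \mathbb{Var}(\bar X) = \frac{1}{n^2}\sum_{i=1}^n \mathbb{Var}(X_i) = \frac{\sigma^2}{n}$$

$$\mathbb E(\bar X^2) = \mathbb{Var}(\bar X) + [\mathbb E(\bar X)]^2 = \frac{\sigma^2}{n} + \mu^2$$

Independence is what allows the variance of the sum to be written as the sum of the variances, with no covariance terms.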


Let's try to show that $\mathbb E(s^2)= \mathbb E\left(\frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}\right) = \sigma^{2}$

To make my life easier, I will omit the limits of summation from now onwards, but let it be known that we are always summing from $1$ to $n$.

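$\mathbb E\left(\sum (X_i - \bar X)^2 \right) = \mathbb E\left(\sum X_{i}^2 - 2 \bar X \sum X_i + n \bar X^2 \right) = \mathbb E\left(\sum X_{i}^2 - 2 n \bar X^2 + n \bar X^2 \right) = \sum \mathbb E(X_{i}^2) - \mathbb E\left(n \bar X^2 \right)$

The middle step uses $\sum X_i = n \bar X$.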

$\sum \mathbb E(X_{i}^2) - \mathbb E\left(n \bar X^2 \right) = \sum \mathbb E(X_{i}^2) - n \mathbb E\left(\bar X^2\right) = n \sigma^2 + n \mu^2 - \sigma^2 -n \mu^2$

This simplifies to $(n-1) \sigma^2$

So far, we have shown that $\mathbb E\left(\sum (X_i - \bar X)^2 \right) = (n-1)\sigma^2$

$\mathbb E(s^2)= \mathbb E\left(\frac{\sum (X_i - \bar X)^2}{n-1}\right) = \frac{1}{n-1} \mathbb E\left(\sum (X_i - \bar X)^2 \right)$

$\mathbb E(s^2) = \frac {(n-1)\sigma^2}{n-1} = \sigma^2$

We have now shown that the sample variance is an unbiased estimator of the population variance.
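The result can also be checked numerically. The following is a minimal Monte Carlo sketch, not part of the proof: it assumes normally distributed data, and the sample size, number of trials, mean, standard deviation, and seed are arbitrary choices made for illustration. The function names are hypothetical. It averages the estimate over many independent samples, once dividing by $n-1$ and once dividing by $n$.

```python
import random


def sample_variance(xs, ddof=1):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def average_estimate(n, trials, mu, sigma, ddof, seed=0):
    """Average the variance estimate over many independent samples of size n."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        total += sample_variance(xs, ddof)
    return total / trials


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the question).
    n, trials, mu, sigma = 5, 20000, 1.0, 2.0
    print("true variance:", sigma ** 2)
    print("divide by n-1:", average_estimate(n, trials, mu, sigma, ddof=1))
    print("divide by n:  ", average_estimate(n, trials, mu, sigma, ddof=0))
```

With these settings the $n-1$ version should average close to $\sigma^2 = 4$, while the version dividing by $n$ should average close to $\frac{n-1}{n}\sigma^2 = 3.2$, up to simulation noise.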

Five σ
9

I remember that during my time at university I had similar difficulty finding a complete proof that shows, step by step, why the sample variance is an unbiased estimator.

The proof I used can be found under http://economictheoryblog.wordpress.com/2012/06/28/latexlatexs2/

The proof itself is not very complicated, but it is rather long. That is also why I am not writing it down here; besides, reproducing it would probably not be fair to the person who provided it in the first place.

Thomas Drew
4

Let's improve the "answers per question" metric of the site by providing a variant of @FiveSigma's answer that visibly uses the i.i.d. assumption (also showing its necessity).

We want to prove the unbiasedness of the sample-variance estimator, $$s^2 \equiv \frac{1}{n-1}\sum\limits_{i=1}^n(x_i-\bar x)^2$$

using an i.i.d. sample of size $n$, from a distribution having variance $\sigma^2$,

$$E(s^2) =?\; \sigma^2$$

First, write

$$s^2 \equiv \frac{n}{n-1} \frac{1}{n}\sum_{i=1}^n(x_i-\bar x)^2$$

Then

$$\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)^2 = \frac 1n \left(\sum_{i=1}^n(x_i^2- 2\bar x x_i + \bar x^2)\right) = \frac 1n \sum_{i=1}^nx_i^2- 2\bar x \frac 1n \sum_{i=1}^nx_i + \bar x^2$$

Since $\bar x = \frac 1n \sum_{i=1}^nx_i$ we get

$$\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)^2 =\frac 1n \sum_{i=1}^nx_i^2- \bar x^2$$
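This identity is purely algebraic, so it holds for any list of numbers, not only in expectation. A small sketch that checks it with exact rational arithmetic (the data values and function names are arbitrary, chosen only for illustration):

```python
from fractions import Fraction


def mean_squared_deviation(xs):
    """Left-hand side: (1/n) * sum of (x_i - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n


def mean_square_minus_squared_mean(xs):
    """Right-hand side: (1/n) * sum of x_i^2, minus xbar^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(x * x for x in xs) / n - xbar ** 2


# Fractions avoid floating-point rounding, so the comparison is exact.
data = [Fraction(v) for v in (2, 3, 5, 7, 11)]
print(mean_squared_deviation(data) == mean_square_minus_squared_mean(data))
```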

We consider the expected value of the two components

$$E\left(\frac 1n \sum_{i=1}^nx_i^2\right) = \frac 1n \sum_{i=1}^nE(x_i^2)=E(X^2)$$

since the variables are identically distributed.

Also

$$\bar x ^2 = \left(\frac 1n \sum_{i=1}^nx_i\right)^2 = \frac 1{n^2}\left(\sum_{i=1}^nx_i^2 + \sum_{i\neq j}x_ix_j\right)$$

the second sum having $n^2-n$ terms. So $$E(\bar x^2) = \frac 1{n^2}(nE(X^2)) + \frac 1{n^2}\left[(n^2-n)E(x_i)E(x_j)\right]$$

We were able to write $E(x_ix_j) = E(x_i)E(x_j)$ for $i \neq j$ because the sample consists of independent RVs. Moreover, they are identically distributed, so $E(x_i)E(x_j) = [E(X)]^2$. Therefore

$$E(\bar x^2) = \frac 1nE(X^2) + \frac {n-1}{n}[E(X)]^2$$
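The split of the squared sum into $n$ squared terms and $n^2-n$ cross terms can be checked by brute force. A short sketch with arbitrary integer data (the values and the helper name are illustrative assumptions):

```python
from itertools import permutations


def square_of_sum_parts(xs):
    """Split (sum of xs)^2 into squared terms and cross terms x_i * x_j, i != j."""
    squares = sum(x * x for x in xs)
    # permutations(..., 2) yields every ordered index pair (i, j) with i != j
    cross_terms = [xs[i] * xs[j] for i, j in permutations(range(len(xs)), 2)]
    return squares, cross_terms


xs = [2, 3, 5, 7]
n = len(xs)
squares, cross_terms = square_of_sum_parts(xs)

print(len(cross_terms) == n * n - n)                # number of cross terms
print(sum(xs) ** 2 == squares + sum(cross_terms))   # the expansion itself
```

Both checks use ordered pairs, which matches the sum over $i \neq j$ in the text: each product $x_ix_j$ appears twice, once as $(i,j)$ and once as $(j,i)$.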

Bringing it all together,

$$E(s^2) = \frac {n}{n-1}\cdot \left[E(X^2) - \frac 1nE(X^2) - \frac {n-1}{n}[E(X)]^2\right]$$

$$= \frac {n}{n-1}\cdot \left[\frac {n-1}{n}E(X^2) - \frac {n-1}{n}[E(X)]^2\right]$$

$$\implies E(s^2) = E(X^2) - [E(X)]^2 \equiv {\rm Var}(X)$$
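To see where the assumption enters, here is a sketch of what happens if it is relaxed. Suppose, as an illustrative assumption not made in the question, that the $x_i$ still share the same mean and variance $\sigma^2$ but every pair has covariance ${\rm Cov}(x_i,x_j) = \rho\sigma^2$ for $i \neq j$. Then $E(x_ix_j) = \rho\sigma^2 + [E(X)]^2$, and repeating the steps above gives

$$E(\bar x^2) = \frac 1nE(X^2) + \frac {n-1}{n}\left(\rho\sigma^2 + [E(X)]^2\right)$$

$$E(s^2) = \frac {n}{n-1}\cdot \left[\frac {n-1}{n}E(X^2) - \frac {n-1}{n}[E(X)]^2 - \frac {n-1}{n}\rho\sigma^2\right] = (1-\rho)\,\sigma^2$$

So $s^2$ is unbiased exactly when $\rho = 0$. Strictly speaking, the step $E(x_ix_j) = E(x_i)E(x_j)$ only requires the observations to be uncorrelated, which independence guarantees.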

Alecos Papadopoulos