
There are many questions and answers on why $n-1$ is used instead of $n$ in sample vs population variance.

However, I wonder why the population variance definition even has $n$ in the first place. The definition is given as

$$\sigma^2 = \frac{\sum_{i=1}^n(x_i - \mu)^2}{n}.$$

What is the intuitive purpose of the $n$? Is it just a normalizing term? If so, why was a different normalizing term not chosen?

z611
  • Variance is $\mathbb E\left[(X-\mathbb EX)^2\right].$ If $X$ is discrete and takes the value $x_i$ with probability $p_i,$ then the variance becomes $\sum_i p_i(x_i-\mu)^2.$ When $p_i=\frac1n,$ it becomes the formula of the OP. – User1865345 Sep 17 '22 at 17:35
  • @a6623, put simply, it is averaging. We want each realization of $X$ to contribute to the result. Besides, we want the value of $\sigma$ to have the same magnitude regardless of $n$, which can be achieved by this division. $\sigma$ is some number after all. Drop $n$ and the sum of nonnegative numbers will keep increasing with each new $i$. – garej Sep 17 '22 at 17:41
  • Notionally, variance is average squared distance from the mean. – Glen_b Sep 19 '22 at 00:08

1 Answer


As noted in the comments, variance is defined as

$$ \mathbb E\left[(X-\mathbb EX)^2\right] $$
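To connect this definition to the formula in the question, here is a short sketch under the assumption that the population consists of $n$ values $x_1,\dots,x_n$, each equally likely, so that $p_i = \frac1n$ as the first comment suggests:

$$ \mathbb E\left[(X-\mu)^2\right] = \sum_{i=1}^n p_i\,(x_i-\mu)^2 = \sum_{i=1}^n \frac1n\,(x_i-\mu)^2 = \frac{\sum_{i=1}^n (x_i-\mu)^2}{n}. $$

Under that assumption, the $n$ in the denominator is the uniform probability weight $\frac1n$ pulled out of the sum.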

The expected value of a random variable can be estimated from a sample by the arithmetic mean, and variance is simply the average squared deviation of a random variable from its mean. We use an average rather than a sum because a sum would grow with the sample size and be harder to interpret.
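The point about the sum growing with sample size can be illustrated with a small sketch. The helper names `sum_sq_dev` and `population_variance` and the toy data are made up for this illustration. The data set is repeated several times so that its spread stays the same while $n$ grows:

```python
def sum_sq_dev(xs):
    """Sum of squared deviations from the mean (no division by n)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs)


def population_variance(xs):
    """Average squared deviation from the mean (division by n)."""
    return sum_sq_dev(xs) / len(xs)


base = [1, 2, 3, 4]  # toy data, chosen only for illustration

# Repeating the data leaves its spread unchanged but increases n.
results = []
for copies in (1, 10, 100):
    xs = base * copies
    results.append((len(xs), sum_sq_dev(xs), population_variance(xs)))

for n, total, var in results:
    print(f"n={n:4d}  sum of squares={total:7.2f}  variance={var:.2f}")
```

The unnormalized sum scales with the number of copies (5, 50, 500), while the variance stays at 1.25 each time, which is the sense in which dividing by $n$ makes the quantity comparable across sample sizes.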

But why do we care about average squared deviation? This is nicely answered in "Why square the difference instead of taking the absolute value in standard deviation?", and also in "Why is the squared difference so commonly used?" and "What makes mean square error so good?".

See also Understanding "variance" intuitively.

Tim