
Variance can be combined as

$$v=\frac{1}{n-1}\left(\sum_{i = 1}^{numGroups}n_{i}(m_{i}-m)^2+ \sum_{i = 1}^{numGroups}(n_{i}-1)v_{i}\right)$$

where $v$ is the combined variance, $n$ is the total sample size, $n_i$ is the number of points in group $i$, $numGroups$ is the total number of groups, $m_i$ is the mean of group $i$, $m$ is the combined mean, and $v_i$ is the sample variance (denominator $n_i-1$) of the $i$-th group.

Is there a name for this formula or any reference to it?
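A minimal Python sketch of the formula as stated, assuming each group is summarized by a tuple $(n_i, m_i, v_i)$ with $v_i$ the sample variance. The function name `combined_variance` and the example data are hypothetical; the raw data are kept only so the result can be compared with `statistics.variance` on the pooled points.

```python
from statistics import mean, variance


def combined_variance(summaries):
    """Variance of all points from per-group summaries (n_i, m_i, v_i).

    v_i must be the sample variance (denominator n_i - 1), as in the formula.
    """
    n = sum(n_i for n_i, _, _ in summaries)
    m = sum(n_i * m_i for n_i, m_i, _ in summaries) / n  # combined mean
    # sum of n_i (m_i - m)^2 over the groups
    between = sum(n_i * (m_i - m) ** 2 for n_i, m_i, _ in summaries)
    # sum of (n_i - 1) v_i over the groups
    within = sum((n_i - 1) * v_i for n_i, _, v_i in summaries)
    return (between + within) / (n - 1)


groups = [[2.0, 4.0, 4.0, 5.0], [5.0, 7.0, 9.0], [1.0, 3.0]]
summaries = [(len(g), mean(g), variance(g)) for g in groups]
pooled = [x for g in groups for x in g]
print(combined_variance(summaries), variance(pooled))
```

Both printed values should agree up to floating-point rounding.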

Dilip Sarwate
Budhapest



Let $x_{i,j}$ denote the $j$-th data point in the $i$-th group which has $n_i$ data points. There are $N$ such groups and thus a total of $\sum_{i=1}^N n_i = n$ data points.

If the sample mean and sample variance of the $i$-th group are $m_i$ and $v_i$ respectively, then we have $$n_i\cdot m_i = \sum_{j=1}^{n_i} x_{i,j}\quad \text{and} \quad (n_i-1)v_i = \sum_{j=1}^{n_i} \left(x_{i,j} - m_i\right)^2.$$

It follows that $\displaystyle \sum_{i=1}^N \sum_{j=1}^{n_i} x_{i,j} = \sum_{i=1}^N n_i\cdot m_i = n\cdot m$, where $m$ is the overall mean of the $n$ data points. Similarly, the sum $\displaystyle \sum_{i=1}^N (n_i-1)v_i = \sum_{i=1}^N \sum_{j=1}^{n_i}\left(x_{i,j} - m_i\right)^2$ is the sum of the squared deviations of the data points from the means of their respective groups.

This is not quite what we need for the variance of all $n$ data points, which requires the sum of the squared deviations from $m$. Fortunately, a little algebra bridges the gap. We have $$\begin{align} \sum_{i=1}^N\sum_{j=1}^{n_i} \left(x_{i,j} - m\right)^2 &= \sum_{i=1}^N \left[\sum_{j=1}^{n_i}\left(x_{i,j}^2 -2x_{i,j}m + m^2\right)\right]\\ &= \sum_{i=1}^N \left[\left(\sum_{j=1}^{n_i}x_{i,j}^2\right) -2n_im_im + n_im^2\right]\\ &= \sum_{i=1}^N \left[\left(\sum_{j=1}^{n_i}x_{i,j}^2\right) + n_i(m^2 -2m_im + m_i^2) - n_im_i^2\right]\\ &=\sum_{i=1}^N \left[n_i(m_i-m)^2 + \sum_{j=1}^{n_i}\left(x_{i,j}^2-m_i^2\right) \right]\\ &= \sum_{i=1}^N \left[n_i(m_i-m)^2 + \sum_{j=1}^{n_i}\left(x_{i,j}^2-2x_{i,j}m_i + m_i^2\right) \right]\\ &= \sum_{i=1}^N \left[n_i(m_i-m)^2 + \sum_{j=1}^{n_i}\left(x_{i,j}-m_i\right)^2 \right]\\ &= \sum_{i=1}^N \left[n_i(m_i-m)^2 + (n_i-1)v_i \right]. \end{align}$$

The third line adds and subtracts $n_im_i^2$. Replacing $x_{i,j}^2-m_i^2$ by $x_{i,j}^2-2x_{i,j}m_i+m_i^2$ is valid because $\sum_{j=1}^{n_i}\left(-2x_{i,j}m_i + 2m_i^2\right) = -2n_im_i^2 + 2n_im_i^2 = 0$, using $\sum_{j=1}^{n_i} x_{i,j} = n_im_i$. All that remains is to divide both sides by $n-1$: the left-hand side then becomes the sample variance of all $n$ points, and the right-hand side is the formula in the question.
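A quick numerical check of the identity just derived: the total sum of squared deviations from $m$ equals the between-group term $\sum_i n_i(m_i-m)^2$ plus the within-group term $\sum_i (n_i-1)v_i$. This is the same sum-of-squares split used in one-way ANOVA. The helper name `sum_of_squares_split` and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def sum_of_squares_split(groups):
    """Return (total, between, within) sums of squares for a list of groups."""
    all_points = [x for g in groups for x in g]
    m = mean(all_points)  # overall mean of all n points
    total = sum((x - m) ** 2 for x in all_points)
    between = 0.0
    within = 0.0
    for g in groups:
        m_i = mean(g)
        between += len(g) * (m_i - m) ** 2
        # (n_i - 1) v_i: squared deviations from the group's own mean
        within += sum((x - m_i) ** 2 for x in g)
    return total, between, within


total, between, within = sum_of_squares_split(
    [[2.0, 4.0, 4.0, 5.0], [5.0, 7.0, 9.0], [1.0, 3.0]]
)
print(total, between + within)
```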

Dilip Sarwate

In the particular case $N=2$, the formula can be simplified. Since $$\require{cancel} m=\frac{n_1m_1+n_2m_2}{n_1+n_2},$$ we have $$ (m_1-m)^2 = \left(\frac{(\cancel{n_1m_1}+n_2m_1) - (\cancel{n_1m_1}+n_2m_2)}{n_1+n_2}\right)^2 = \left(\frac{n_2}{n_1+n_2}\right)^2 (m_1-m_2)^2. $$ The same steps give $(m_2-m)^2 = \left(\frac{n_1}{n_1+n_2}\right)^2 (m_1-m_2)^2$. Adding the two weighted terms, the first sum in the expression for $v$ becomes $$ \begin{align} \sum_{i=1}^2 n_i(m_i - m)^2 &= \frac{n_1n_2^2}{(n_1+n_2)^2} (m_1-m_2)^2 + \frac{n_1^2n_2}{(n_1+n_2)^2} (m_1-m_2)^2 \\ &= \frac{n_1n_2(n_1+n_2)}{(n_1+n_2)^2}(m_1-m_2)^2 \\ &= \frac{n_1n_2}{n_1+n_2}(m_1-m_2)^2. \end{align} $$ Therefore, with $n = n_1+n_2$, $$ v = \frac{1}{n-1} \left( \frac{n_1n_2}{n}(m_1-m_2)^2 + (n_1-1)v_1 + (n_2-1)v_2 \right) $$
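The two-group form is convenient for merging summaries pairwise, for example when statistics are computed on separate chunks of data and combined afterwards. Below is a minimal Python sketch, assuming each summary is a tuple (count, mean, sample variance); the name `merge` and the data are hypothetical. Because each merge yields the exact summary of the union, folding it over several groups with `functools.reduce` should reproduce the general formula up to rounding. A one-point group can be passed with variance `0.0`, since its weight $n_i-1$ is zero.

```python
from functools import reduce
from statistics import mean, variance


def merge(a, b):
    """Merge two (n, mean, sample variance) summaries into one."""
    n1, m1, v1 = a
    n2, m2, v2 = b
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n
    # v = (n1 n2 / n (m1 - m2)^2 + (n1 - 1) v1 + (n2 - 1) v2) / (n - 1)
    v = (n1 * n2 / n * (m1 - m2) ** 2 + (n1 - 1) * v1 + (n2 - 1) * v2) / (n - 1)
    return n, m, v


groups = [[2.0, 4.0, 4.0, 5.0], [5.0, 7.0, 9.0], [1.0, 3.0]]
summaries = [(len(g), mean(g), variance(g)) for g in groups]
n, m, v = reduce(merge, summaries)  # merge(merge(s1, s2), s3)
pooled = [x for g in groups for x in g]
print(n, m, v)
print(len(pooled), mean(pooled), variance(pooled))
```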

Rackbox