While reading this blog post on the F-statistic in ANOVA, I came across the following formula:
The F-statistic is calculated like this: $$\frac{\text {variance between groups}}{\text {variance within groups}}$$
So, I needed to understand what exactly "variance between groups" and "variance within groups" mean.
Enter the F-test. We are going to state that if there is no difference in the means then the estimate of variance you get from the difference in group means should be the same as the estimate of the population variance you get within groups.$^1$
(emphasis added)
This passage gives some idea of what these terms mean. The blog also gives formulas for them, though without citing a source:$$\text{within group variance}=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2+(n_3-1)s_3^2}{N-g}$$
where $n_i$ is the number of elements in the $i^{th}$ group,
$s_i^2$ is the variance of the $i^{th}$ group,
$N$ is the total number of subjects,$^2$
and $g$ is the number of groups.
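As a minimal sketch of how the within-group formula can be computed (generalized from three groups to any number $g$), here is a small Python function. It assumes that "subjects" simply means individual observations, so that $N = n_1 + \dots + n_g$; the function name `within_group_variance` and the toy data are hypothetical, chosen for illustration only.

```python
from statistics import variance

def within_group_variance(groups):
    """Pooled within-group variance: sum of (n_i - 1) * s_i^2, divided by N - g."""
    g = len(groups)
    # Assumption: N counts every individual observation across all groups.
    N = sum(len(grp) for grp in groups)
    # statistics.variance uses the n - 1 denominator, so (n_i - 1) * s_i^2
    # is the sum of squared deviations of group i around its own mean.
    numerator = sum((len(grp) - 1) * variance(grp) for grp in groups)
    return numerator / (N - g)

# Toy data: three groups whose values each spread by 1 around their own mean.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
print(within_group_variance(groups))  # 1.0
```

Because only deviations from each group's own mean enter the sum, shifting a whole group up or down leaves this quantity unchanged.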
and $$\text{between group variance}=\frac{\sum_{i=1}^g n_i(y_i-\bar y)^2}{g-1}$$
where $g$ is the total number of groups,
$y_i$ is the mean of the $i^{th}$ group,
and $\bar y$ is the overall mean.
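The between-group formula can be sketched the same way. This assumes that the overall mean $\bar y$ is the grand mean of all $N$ observations (the usual convention), not the unweighted average of the group means. The function name `between_group_variance` is hypothetical.

```python
from statistics import mean

def between_group_variance(groups):
    """Between-group variance: sum of n_i * (y_i - ybar)^2, divided by g - 1."""
    g = len(groups)
    # Assumption: ybar is the grand mean over every observation in every group.
    y_bar = mean([x for grp in groups for x in grp])
    # Each group's mean is compared with the grand mean, weighted by its size n_i.
    numerator = sum(len(grp) * (mean(grp) - y_bar) ** 2 for grp in groups)
    return numerator / (g - 1)

groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
print(between_group_variance(groups))  # about 19, up to floating-point rounding
```

Only the group means enter this formula, so the spread of values inside each group does not affect it. Dividing this quantity by the within-group variance gives the F-statistic from the first formula.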
Even so, I still do not clearly understand these terms, possibly because the formulas are given without a source.
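One way to read the two formulas is as two separate estimates of the same population variance $\sigma^2$. This is a sketch that assumes independent observations with a common variance $\sigma^2$ in every group, and, for the between-group part, equal group sizes $n_i = n$ (so that $\bar y$ is also the average of the $y_i$). Each $s_i^2$ is an unbiased estimate of $\sigma^2$, and $\sum_{i=1}^g (n_i-1) = N-g$, so the within-group variance estimates $\sigma^2$ whether or not the group means differ:

$$E\left[\frac{\sum_{i=1}^g (n_i-1)s_i^2}{N-g}\right]=\frac{\sigma^2\sum_{i=1}^g (n_i-1)}{N-g}=\sigma^2.$$

If the null hypothesis is true, the group means $y_i$ are $g$ independent draws around the same population mean, each with variance $\sigma^2/n$. Their sample variance therefore estimates $\sigma^2/n$, and multiplying by $n$ gives a second estimate of $\sigma^2$:

$$E\left[\frac{\sum_{i=1}^g (y_i-\bar y)^2}{g-1}\right]=\frac{\sigma^2}{n}\quad\Longrightarrow\quad E\left[\frac{\sum_{i=1}^g n\,(y_i-\bar y)^2}{g-1}\right]=\sigma^2.$$

If the population means differ, the $y_i$ spread further apart than sampling noise alone would cause, so the between-group variance tends to exceed $\sigma^2$ while the within-group variance does not.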
NOTE:
- My main aim was to find out how the F-statistic compares two models in ANOVA.
- Secondly, I also wanted to know why, if the null hypothesis is true, the "variance between groups" is equal to the "variance within groups".
That is why I am trying to understand the meaning of these terms.
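A quick simulation can illustrate the second point. The sketch below (assuming Python 3.8+ for `statistics.fmean`) repeatedly draws $g = 3$ groups of 10 values from one normal distribution with $\sigma^2 = 4$, so the null hypothesis holds by construction, and averages both variance estimates and their ratio. The group count, sample sizes, mean, $\sigma$ and seed are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import random
from statistics import fmean, variance

def mean_squares(groups):
    """Return (between-group variance, within-group variance, F) for a list of groups."""
    g = len(groups)
    N = sum(len(grp) for grp in groups)
    grand_mean = fmean([x for grp in groups for x in grp])
    between = sum(len(grp) * (fmean(grp) - grand_mean) ** 2 for grp in groups) / (g - 1)
    within = sum((len(grp) - 1) * variance(grp) for grp in groups) / (N - g)
    return between, within, between / within

def simulate_null(n_per_group=10, g=3, mu=50.0, sigma=2.0, reps=2000, seed=0):
    """Average the two variance estimates and F over many datasets where H0 holds."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        # H0 is true by construction: every group comes from the same N(mu, sigma^2).
        groups = [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for _ in range(g)]
        for k, value in enumerate(mean_squares(groups)):
            totals[k] += value
    return [t / reps for t in totals]

avg_between, avg_within, avg_f = simulate_null()
print(f"sigma^2 = 4.0, mean between = {avg_between:.2f}, "
      f"mean within = {avg_within:.2f}, mean F = {avg_f:.2f}")
```

Both averages should land near $\sigma^2 = 4$, while any single dataset can give quite different values. The average F comes out close to, but slightly above, 1: under the null hypothesis the F-statistic follows an $F(g-1, N-g)$ distribution, whose mean is $(N-g)/(N-g-2) = 27/25$ here. So "equal" holds in the sense that both quantities estimate the same $\sigma^2$, not that the ratio is exactly 1 in every sample.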
$1$: The "no difference in the means" is the null hypothesis. The means are those of three different groups.
$2$: I don't understand what "subjects" means here.