F-statistic for classification, two ways to calculate the numerator; are they the same?

Question

This question on F-scores for classification problems (specifically f_classif) has two answers. Each answer shows a different way to calculate the F-score; in particular, they differ in how the numerator is computed. Both answers produce the same value.

My question is: are the two answers mathematically equivalent?

To simplify things, the question asks about only two groups: a blue group with $n$ instances and a red group with $m$ instances.

Technique A answer (consistent with other sites/answers, like this one)
Technique B answer, which references the StatsQuest video

Technique A formula: Notice this technique compares group means to the total mean...

(Note the degrees of freedom part (dividing by $(p_{fit} - p_{mean})$) is implied; there are two groups, therefore $p_{fit} - p_{mean} = 2-1 = 1$; so it's division by 1.)

$$ n(\bar{x}_{blue}-\bar{x})^2 + m(\bar{x}_{red} - \bar{x})^2 $$

Technique B formula: Notice this technique compares $x_i$ values to the total mean, and $x_i$ values to the group means...

$$ \sum_{i=1}^{n_{all}} (x_i - \bar{x})^2 - \left( \sum_{i=1}^{n} (x_i - \bar{x}_{blue})^2 + \sum_{i=1}^{m} (x_i - \bar{x}_{red})^2 \right) $$

This notebook shows my attempt to conclude that the techniques are equivalent:

Both techniques do produce the same value (given the same data).
I try to show that both formulas are mathematically equivalent (not my expertise!); see the bottom of the notebook under the header "Numerators are equivalent" (or click here for a screenshot of my math).
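
A minimal sketch of that kind of numerical check, assuming NumPy and made-up data. The arrays `blue` and `red` and the variable names are hypothetical illustrations, not taken from the notebook:

```python
import numpy as np

# Hypothetical data for a single feature, split into two groups
# (values are made up for illustration only).
blue = np.array([1.0, 2.0, 3.0, 4.0])   # n = 4 instances
red = np.array([6.0, 8.0, 10.0])        # m = 3 instances

all_x = np.concatenate([blue, red])
grand_mean = all_x.mean()

# Technique A: compare each group mean to the total mean,
# weighting by the group size.
numerator_a = (len(blue) * (blue.mean() - grand_mean) ** 2
               + len(red) * (red.mean() - grand_mean) ** 2)

# Technique B: total sum of squares minus the within-group sums of squares.
ss_total = ((all_x - grand_mean) ** 2).sum()
ss_within = (((blue - blue.mean()) ** 2).sum()
             + ((red - red.mean()) ** 2).sum())
numerator_b = ss_total - ss_within

# The two numerators should agree up to floating-point rounding.
print(numerator_a, numerator_b, np.isclose(numerator_a, numerator_b))
```

Agreement on one dataset is only a sanity check, not a proof; the algebraic question is whether the two expressions match for every dataset.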

Other questions I've read, but I can't tell if they answer my question here...

Most similar asks about two versions of calculating F-statistic
Referenced by above, which seems to give the formula in Technique B

score 1 · Accepted Answer · answered Jul 01 '23 at 12:33

Yes, these are equivalent in this context. In brief, the first formula is the between-group sum of squares; the second formula is the total sum of squares less the within-group sum of squares. Because $$SS_\text{between} + SS_\text{within}= SS_\text{total}$$ the two formulas always give the same result.
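
A sketch of why this identity holds in the two-group case, using the question's notation (with $n_{all} = n + m$). For the blue group, write $x_i - \bar{x} = (x_i - \bar{x}_{blue}) + (\bar{x}_{blue} - \bar{x})$, square, and sum:

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \bar{x}_{blue})^2 + 2(\bar{x}_{blue} - \bar{x}) \sum_{i=1}^{n} (x_i - \bar{x}_{blue}) + n(\bar{x}_{blue} - \bar{x})^2 $$

The middle term vanishes because deviations from a group's own mean sum to zero, $\sum_{i=1}^{n} (x_i - \bar{x}_{blue}) = 0$. The same argument applies to the red group with $m$ and $\bar{x}_{red}$. Adding the two group results gives

$$ \sum_{i=1}^{n_{all}} (x_i - \bar{x})^2 = \underbrace{\sum_{i=1}^{n} (x_i - \bar{x}_{blue})^2 + \sum_{i=1}^{m} (x_i - \bar{x}_{red})^2}_{SS_\text{within}} + \underbrace{n(\bar{x}_{blue} - \bar{x})^2 + m(\bar{x}_{red} - \bar{x})^2}_{SS_\text{between}} $$

Moving $SS_\text{within}$ to the left-hand side gives Technique B's expression on the left and Technique A's on the right.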
