This question on F-scores for classification problems (specifically `f_classif`) has two answers. Each answer calculates the F-score differently; in particular, the numerators differ. Both answers produce the same value.
My question is: are the two answers mathematically equivalent?
To simplify things, the question considers only two groups: a blue group with $n$ instances and a red group with $m$ instances.
- Technique A answer (consistent with other sites/answers, like this one)
- Technique B answer, which references the StatsQuest video
Technique A formula: Notice this technique compares group means to the total mean...
(Note that the degrees-of-freedom division, by $(p_{fit} - p_{mean})$, is implied: there are two groups, so $p_{fit} - p_{mean} = 2 - 1 = 1$, and the division is by 1.)
$$ n(\bar{x}_{blue}-\bar{x})^2 + m(\bar{x}_{red} - \bar{x})^2 $$
Technique B formula: Notice this technique compares $x_i$ values to the total mean, and $x_i$ values to the group means...
$$ \sum_{i=1}^{n_{all}} (x_i - \bar{x})^2 - \left[ \sum_{i=1}^{n} (x_i - \bar{x}_{blue})^2 + \sum_{i=1}^{m} (x_i - \bar{x}_{red})^2 \right] $$
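For reference, here is a sketch of the identity that would make the two numerators equal. The index notation $i \in blue$ and $i \in red$ is my own shorthand (not taken from either answer), and I am assuming $n_{all} = n + m$ and that $\bar{x}$ is the mean of all $n + m$ values. Splitting each deviation from the total mean within the blue group:

$$ \sum_{i \in blue} (x_i - \bar{x})^2 = \sum_{i \in blue} (x_i - \bar{x}_{blue})^2 + 2(\bar{x}_{blue} - \bar{x}) \sum_{i \in blue} (x_i - \bar{x}_{blue}) + n(\bar{x}_{blue} - \bar{x})^2 $$

The middle term vanishes because deviations from a group's own mean sum to zero, $\sum_{i \in blue} (x_i - \bar{x}_{blue}) = 0$. The same expansion applies to the red group with $m$ and $\bar{x}_{red}$. Adding the two groups gives

$$ \sum_{i=1}^{n_{all}} (x_i - \bar{x})^2 = \sum_{i \in blue} (x_i - \bar{x}_{blue})^2 + \sum_{i \in red} (x_i - \bar{x}_{red})^2 + n(\bar{x}_{blue} - \bar{x})^2 + m(\bar{x}_{red} - \bar{x})^2 $$

so moving the two within-group sums to the left-hand side would turn the Technique B expression into the Technique A expression, if this working is right.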
And this notebook shows my attempt to conclude the techniques are equivalent:
- indeed both techniques produce the same number/same value (given the same data).
- I try to show both formulas are mathematically equivalent (not my expertise!) - see the bottom of the notebook with the header Numerators are equivalent (or click here for a screenshot of my math...)
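A minimal numerical check of the two numerators, separate from the notebook, might look like the sketch below. The data are randomly generated, and the group sizes, seed, and variable names are my own assumptions for illustration; it uses only NumPy and does not call `f_classif` itself.

```python
import numpy as np

# Hypothetical data: two groups drawn at random (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
blue = rng.normal(loc=0.0, scale=1.0, size=7)  # n = 7 instances
red = rng.normal(loc=1.5, scale=1.0, size=5)   # m = 5 instances

all_x = np.concatenate([blue, red])
grand_mean = all_x.mean()  # mean of all n + m values

# Technique A: group means compared to the total mean, weighted by group size.
numerator_a = (
    len(blue) * (blue.mean() - grand_mean) ** 2
    + len(red) * (red.mean() - grand_mean) ** 2
)

# Technique B: total sum of squares minus the within-group sums of squares.
ss_total = ((all_x - grand_mean) ** 2).sum()
ss_within = ((blue - blue.mean()) ** 2).sum() + ((red - red.mean()) ** 2).sum()
numerator_b = ss_total - ss_within

print(numerator_a, numerator_b)
print(np.isclose(numerator_a, numerator_b))
```

Agreement on one random sample is only a sanity check; it does not by itself establish that the formulas are equivalent for all data.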
Other questions I've read, though I can't tell whether they answer my question here:
- Most similar asks about two versions of calculating F-statistic
- Referenced by above, which seems to give the formula in Technique B