
This question on F-scores for classification problems (specifically f_classif) has two answers. Each answer calculates the F-score in a different way; in particular, the numerators of the F-score differ. Both answers produce the same value.

My question is: are the two answers mathematically equivalent?

To simplify things, the question considers only two groups: a blue group with $n$ instances and a red group with $m$ instances.

Technique A formula: this technique compares the group means to the total mean.

(Note: the division by the degrees of freedom, $p_{fit} - p_{mean}$, is implied. There are two groups, so $p_{fit} - p_{mean} = 2 - 1 = 1$, and the division is by 1.)

$$ n(\bar{x}_{blue}-\bar{x})^2 + m(\bar{x}_{red} - \bar{x})^2 $$

Technique B formula: this technique compares the $x_i$ values to the total mean, and the $x_i$ values to their own group means.

$$ \sum_{i=1}^{n_{all}} (x_i - \bar{x})^2 - \left( \sum_{i=1}^{n} (x_i - \bar{x}_{blue})^2 + \sum_{i=1}^{m} (x_i - \bar{x}_{red})^2 \right) $$

This notebook shows my attempt to conclude that the techniques are equivalent:

  • Both techniques do produce the same value (given the same data).
  • I try to show that both formulas are mathematically equivalent (not my expertise!); see the bottom of the notebook, under the header "Numerators are equivalent" (or click here for a screenshot of my math...)
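
A minimal sketch of that numerical check, using made-up data. The values and the names blue, red, numerator_a and numerator_b are hypothetical and not taken from the notebook; the code only illustrates the two numerators as written above.

```python
import math

# Hypothetical data, made up for illustration (not from the notebook)
blue = [1.0, 2.0, 3.0, 4.0]  # n = 4 instances
red = [6.0, 7.0, 8.0]        # m = 3 instances
all_x = blue + red


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sum_sq_dev(values, center):
    """Sum of squared deviations of values from center."""
    return sum((v - center) ** 2 for v in values)


grand_mean = mean(all_x)

# Technique A: group means compared to the total mean
numerator_a = (len(blue) * (mean(blue) - grand_mean) ** 2
               + len(red) * (mean(red) - grand_mean) ** 2)

# Technique B: total sum of squares minus within-group sum of squares
ss_total = sum_sq_dev(all_x, grand_mean)
ss_within = sum_sq_dev(blue, mean(blue)) + sum_sq_dev(red, mean(red))
numerator_b = ss_total - ss_within

print(numerator_a, numerator_b)
assert math.isclose(numerator_a, numerator_b)
```

Agreement on one data set is only evidence, not a proof, which is why the algebraic question still matters.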

Other questions I've read, but I can't tell if they answer my question here...

1 Answer


Yes, these are equivalent in this context. In brief, the first formula is the between-group sum of squares; the second formula is the total sum of squares less the within-group sum of squares. Because $$SS_\text{between} + SS_\text{within} = SS_\text{total}$$ these formulas give the same result.
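
For completeness, here is a sketch of why that decomposition holds, written in the question's notation. The index $g$ (for a group, blue or red) and $n_g$ (its size, so $n_g = n$ or $n_g = m$) are introduced here for convenience and do not appear in the question. Within one group, write $x_i - \bar{x} = (x_i - \bar{x}_g) + (\bar{x}_g - \bar{x})$ and expand the square:

$$ \sum_{i \in g} (x_i - \bar{x})^2 = \sum_{i \in g} (x_i - \bar{x}_g)^2 + 2(\bar{x}_g - \bar{x}) \sum_{i \in g} (x_i - \bar{x}_g) + n_g (\bar{x}_g - \bar{x})^2 $$

The middle term vanishes, because deviations from a group's own mean sum to zero: $\sum_{i \in g} (x_i - \bar{x}_g) = 0$. Adding the blue and red groups together then gives

$$ \sum_{i=1}^{n_{all}} (x_i - \bar{x})^2 = \left( \sum_{i=1}^{n} (x_i - \bar{x}_{blue})^2 + \sum_{i=1}^{m} (x_i - \bar{x}_{red})^2 \right) + n(\bar{x}_{blue} - \bar{x})^2 + m(\bar{x}_{red} - \bar{x})^2 $$

Moving the bracketed within-group term to the left-hand side leaves Technique B on the left and Technique A on the right.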

Gregg H