
If the AUC score is 100 percent, can the F1 value be 99.94 percent? I would expect 100 percent, too.


1 Answer


$AUC$ measures the separability of the probability outputs of your model. If the positive group's lowest probability of being positive is greater than the negative group's highest probability of being positive, then you will achieve $AUC=1$.

However, calculating $F_1$ requires you to apply one particular threshold. Software usually picks that threshold to be a probability of $0.5$, but the two groups need not be separable at $0.5$. It could be that every probability value (across both groups) exceeds $0.5$, or that every value falls below $0.5$.
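A small pure-Python sketch of this situation (the data here is made up for illustration): the two groups are perfectly separated, so $AUC=1$, yet every score exceeds $0.5$, so the default threshold classifies everything as positive and $F_1$ falls well short of $1$.

```python
def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, scores, threshold=0.5):
    """F1 score of the hard classifications induced by a threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)

y_true = [0, 0, 0, 1, 1, 1]
probs  = [0.55, 0.60, 0.65, 0.80, 0.85, 0.90]  # all above 0.5

print(auc(y_true, probs))  # 1.0: the groups are perfectly separable
print(f1(y_true, probs))   # 0.666...: at threshold 0.5 everything is called positive
```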

Consequently, there should not be any expectation that threshold-based metrics be perfect when $AUC=1$.

If $AUC=1$, then any threshold between the highest value for the negative group and the lowest value for the positive group results in classifying everything correctly, so all threshold-based metrics (at such a threshold) should be perfect (on the data set that generated the $AUC=1$, not necessarily in general if you use additional data).
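Continuing the sketch above (same illustrative data), any threshold in the gap between the two groups recovers a perfect $F_1$:

```python
def f1_at(y_true, scores, threshold):
    """F1 score of the hard classifications induced by a threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)

y_true = [0, 0, 0, 1, 1, 1]
probs  = [0.55, 0.60, 0.65, 0.80, 0.85, 0.90]

# When AUC = 1, the interval (max negative score, min positive score]
# is nonempty, and any threshold inside it classifies everything correctly.
lo = max(s for y, s in zip(y_true, probs) if y == 0)  # 0.65
hi = min(s for y, s in zip(y_true, probs) if y == 1)  # 0.80
t = (lo + hi) / 2                                     # 0.725

print(f1_at(y_true, probs, t))  # 1.0
```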

  • But of course, it should be possible to pick a threshold such that the F1 score is 1, right? – Sextus Empiricus May 12 '22 at 17:26
  • @SextusEmpiricus If $AUC=1$, then any threshold between the highest value for the negative group and the lowest value for the positive group results in classifying everything correctly, so all threshold-based metrics (at such a threshold) should be perfect (on the data set that generated the $AUC=1$). – Dave May 12 '22 at 18:11
  • What can I do so that F1 goes along with AUC? – Peter May 13 '22 at 06:43
  • @Peter What do you mean? – Dave May 13 '22 at 11:49
  • @Dave Can I use more test samples or just change the random state? – Peter May 13 '22 at 17:39
  • @Peter To accomplish what? To make a model with a certain $AUC$ score higher on $F_1$ than a competitor that has a lower $AUC?$ Those don’t have to move together. They are different metrics for a reason, and they don’t even evaluate the same model, since $F_1$ requires a threshold to be chosen. – Dave May 13 '22 at 17:45