
How can it be that the ROC curve suggests an almost perfect classifier, while the other metrics paint a quite different picture?

The evaluation was done with 5-fold cross-validation.

[ROC curve from the 5-fold cross-validation]

Fold   F1        Precision   Recall    FPR        TPR
1      0.72477   0.79798     0.66387   7.03e-04   0.66387
2      0.85714   0.78788     0.93976   8.79e-05   0.93976
3      0.71111   0.65306     0.78049   3.17e-04   0.78049
4      0.84091   0.75510     0.94872   7.03e-05   0.94872
5      0.73810   0.63265     0.88571   1.41e-04   0.88571

malocho

1 Answer


In this case there may be an issue with your code (those precision/recall figures don't appear anywhere on the curve), but in general:

The (area under the) receiver operating characteristic (ROC) curve is a measure of how well the classifier ranks the data according to the plausibility of belonging to the positive or negative class. The area under the ROC curve is the probability that a randomly selected positive pattern receives a higher rank than a randomly selected negative pattern. However, this ranking doesn't depend on the threshold at which patterns are classified as positive or negative (or, equivalently, on the calibration of the estimated probability of class membership). You could set the threshold to plus infinity, in which case every pattern would be classified as negative, but the ROC curve would not change, because it doesn't depend on the threshold.
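A minimal sketch of this threshold-invariance with scikit-learn on synthetic data (not the asker's model; the labels, scores and thresholds below are made up purely for illustration):

```python
# AUROC depends only on how the scores rank the patterns, so any monotonic
# rescaling of the scores (or any choice of threshold) leaves it unchanged,
# while F1 moves around as the threshold moves.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                # toy binary labels
scores = 0.6 * y_true + rng.normal(0.0, 0.3, 1000)    # imperfect continuous scores

print(roc_auc_score(y_true, scores))                  # some AUC value
print(roc_auc_score(y_true, 10 * scores - 3))         # identical AUC: same ranking

for threshold in (0.1, 0.3, 0.5, 0.9):
    y_pred = (scores >= threshold).astype(int)
    print(threshold, f1_score(y_true, y_pred))        # F1 changes with the threshold
```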

The other metrics are probably sensitive to the threshold, which is why they give a different picture. If the AUC is perfect but the other metrics give a bad result, that may mean the model is not estimating probabilities very well, or that the threshold is wrong (often because it doesn't take the false-positive and false-negative costs into account). However, sometimes this is the optimal solution for the problem as stated, most commonly for problems with a high degree of class imbalance (see my question here).
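To illustrate the imbalance point, here is a sketch (again synthetic, not the asker's data): a score distribution that ranks the two classes almost perfectly, so the AUC is close to 1, while most positives still fall below the default 0.5 threshold, so recall and F1 are poor.

```python
# Synthetic example: near-perfect ranking (high AUC) but poor recall/F1 at the
# default 0.5 threshold, because the "probabilities" are pulled towards the
# majority negative class.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, recall_score

rng = np.random.default_rng(1)
n_neg, n_pos = 100_000, 1_000                         # roughly 1% positives
y_true = np.r_[np.zeros(n_neg), np.ones(n_pos)]
p = np.r_[rng.beta(1, 30, n_neg),                     # negatives: scores near 0
          rng.beta(4, 8, n_pos)]                      # positives: higher, but mostly < 0.5

print("AUC:   ", roc_auc_score(y_true, p))            # close to 1: ranking is excellent
y_pred = (p >= 0.5).astype(int)
print("Recall:", recall_score(y_true, y_pred))        # low: most positives are below 0.5
print("F1:    ", f1_score(y_true, y_pred))            # low, despite the high AUC
```

Recalibrating the probabilities or moving the threshold would improve the recall and F1 here without changing the AUC at all, since the ranking is untouched.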

I tend to use the AUROC to measure the classifier's ability to rank patterns; the negative log-likelihood, discriminative information, or Brier score to give an indication of the calibration of the probability estimates; and the accuracy (possibly weighted according to misclassification costs) to estimate its discriminative ability. Different metrics measure different things, so it is important to understand what they do and to work out which metrics are appropriate to the nature of the problem you are trying to solve.
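For reference, one way such a panel of metrics can be computed with scikit-learn (a sketch on a synthetic imbalanced problem, not the asker's pipeline):

```python
# Compute ranking (AUROC), calibration (log loss, Brier score) and thresholded
# discrimination (accuracy) for the same probabilistic classifier.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, log_loss, brier_score_loss, accuracy_score

X, y = make_classification(n_samples=20_000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]                     # estimated P(class = 1)

print("AUROC:   ", roc_auc_score(y_te, p))            # how well the model ranks
print("Log loss:", log_loss(y_te, p))                 # calibration of the probabilities
print("Brier:   ", brier_score_loss(y_te, p))         # calibration (squared error)
print("Accuracy:", accuracy_score(y_te, (p >= 0.5).astype(int)))  # discrimination at a threshold
```

With unequal misclassification costs, the accuracy can be weighted via the sample_weight argument of accuracy_score and the threshold moved away from 0.5 accordingly.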

Dikran Marsupial
  • My dataset is highly imbalanced; I'm going to read the linked question about this issue. Thank you for answering, I need some time to understand it fully. – malocho Sep 11 '22 at 13:25