
How can it be that the ROC curve suggests an almost perfect classifier, while the other metrics paint a quite different picture?

The evaluation was done with 5-fold cross-validation.

[ROC curve from the 5-fold cross-validation]

Fold   F1        Precision   Recall    FPR        TPR
1      0.72477   0.79798     0.66387   7.03e-04   0.66387
2      0.85714   0.78788     0.93976   8.79e-05   0.93976
3      0.71111   0.65306     0.78049   3.17e-04   0.78049
4      0.84091   0.75510     0.94872   7.03e-05   0.94872
5      0.73810   0.63265     0.88571   1.41e-04   0.88571

malocho

1 Answer


In this case there may be an issue with your code (those precision/recall figures don't appear anywhere on the curve), but in general:

The (area under the) receiver operating characteristic (ROC) curve is a measure of how well the classifier ranks the data according to the plausibility of belonging to the positive or negative class. The area under the ROC curve is the probability that a randomly selected positive pattern receives a higher rank than a randomly selected negative pattern. However, this ranking doesn't depend on the threshold at which patterns are classified as positive or negative (or, equivalently, on the calibration of the estimated probability of class membership). You could set the threshold to plus infinity, in which case every pattern would be classified as negative, but the ROC curve would not change, because it doesn't depend on the threshold.
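A minimal sketch of this threshold-invariance with scikit-learn on synthetic data (not the asker's model; the labels, scores and thresholds below are made up purely for illustration):

```python
# AUROC depends only on how the scores rank the patterns, so any monotonic
# rescaling of the scores (or any choice of threshold) leaves it unchanged,
# while F1 moves around as the threshold moves.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                # toy binary labels
scores = 0.6 * y_true + rng.normal(0.0, 0.3, 1000)    # imperfect continuous scores

print(roc_auc_score(y_true, scores))                  # some AUC value
print(roc_auc_score(y_true, 10 * scores - 3))         # identical AUC: same ranking

for threshold in (0.1, 0.3, 0.5, 0.9):
    y_pred = (scores >= threshold).astype(int)
    print(threshold, f1_score(y_true, y_pred))        # F1 changes with the threshold
```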

The other metrics are probably sensitive to the threshold, which is why they give a different picture. If the AUC is perfect but the other metrics give a bad result, that may mean the model is not estimating probabilities very well, or that the threshold is wrong (often because it doesn't take the false-positive and false-negative costs into account). However, sometimes this is the optimal solution for the problem as stated, most commonly for problems with a high degree of class imbalance (see my question here).
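To illustrate the imbalance point, here is a sketch (again synthetic, not the asker's data): a score distribution that ranks the two classes almost perfectly, so the AUC is close to 1, while most positives still fall below the default 0.5 threshold, so recall and F1 are poor.

```python
# Synthetic example: near-perfect ranking (high AUC) but poor recall/F1 at the
# default 0.5 threshold, because the "probabilities" are pulled towards the
# majority negative class.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, recall_score

rng = np.random.default_rng(1)
n_neg, n_pos = 100_000, 1_000                         # roughly 1% positives
y_true = np.r_[np.zeros(n_neg), np.ones(n_pos)]
p = np.r_[rng.beta(1, 30, n_neg),                     # negatives: scores near 0
          rng.beta(4, 8, n_pos)]                      # positives: higher, but mostly < 0.5

print("AUC:   ", roc_auc_score(y_true, p))            # close to 1: ranking is excellent
y_pred = (p >= 0.5).astype(int)
print("Recall:", recall_score(y_true, y_pred))        # low: most positives are below 0.5
print("F1:    ", f1_score(y_true, y_pred))            # low, despite the high AUC
```

Recalibrating the probabilities or moving the threshold would improve the recall and F1 here without changing the AUC at all, since the ranking is untouched.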

I tend to use the AUROC to measure the classifier's ability to rank patterns; the negative log-likelihood, discriminative information, or Brier score to give an indication of the calibration of the probability estimates; and the accuracy (possibly weighted according to misclassification costs) to estimate its discriminative ability. Different metrics measure different things, so it is important to understand what they do and to work out which metrics are appropriate to the nature of the problem you are trying to solve.
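For reference, one way such a panel of metrics can be computed with scikit-learn (a sketch on a synthetic imbalanced problem, not the asker's pipeline):

```python
# Compute ranking (AUROC), calibration (log loss, Brier score) and thresholded
# discrimination (accuracy) for the same probabilistic classifier.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, log_loss, brier_score_loss, accuracy_score

X, y = make_classification(n_samples=20_000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]                     # estimated P(class = 1)

print("AUROC:   ", roc_auc_score(y_te, p))            # how well the model ranks
print("Log loss:", log_loss(y_te, p))                 # calibration of the probabilities
print("Brier:   ", brier_score_loss(y_te, p))         # calibration (squared error)
print("Accuracy:", accuracy_score(y_te, (p >= 0.5).astype(int)))  # discrimination at a threshold
```

With unequal misclassification costs, the accuracy can be weighted via the sample_weight argument of accuracy_score and the threshold moved away from 0.5 accordingly.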

Dikran Marsupial
  • My dataset is highly imbalanced; I'm going to read the linked question about this issue. Thank you for answering, I need some time to understand it fully. – malocho Sep 11 '22 at 13:25