Is it possible to have recall and precision of 0 while having an area under PR ~0.5?

Question

I am running a Random Forest classifier in Scala (Spark). To evaluate it, and since my classes are highly imbalanced, I used the BinaryClassificationEvaluator class. The area under PR is >0.5, but when I print the confusion matrix, my recall and precision appear to be 0 (I have 0 TP predictions).

Is this mathematically possible?

The confusion matrix just looks at one threshold, but the PR curve looks at all thresholds. Suppose your classifier gives a prediction of 0.48 to all negatives and 0.49 to all positives. Using the classification rule "If prediction > 0.5, positive, else negative", you'll have 0 TP predictions. Something similar is happening here. — Sycorax, Apr 17 '19 at 18:07
I completely understand what you mean. That was my first thought as well, but I wasn't so sure. I will wait a little longer and accept an answer after I investigate a bit further. — Toutsos, Apr 17 '19 at 18:59
Related: "Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?", as well as "Why is accuracy not the best measure for assessing classification models?" Everything said there about accuracy applies equally to TPR, FPR, etc. — Stephan Kolassa, Apr 17 '19 at 20:27
@Sycorax: do you want to post your comment(s) as an answer? Better to have a short answer than no answer at all. Anyone who has a better answer can post it. — Stephan Kolassa, Apr 17 '19 at 20:28

score 3 · Accepted Answer · answered Apr 17 '19 at 21:42

The confusion matrix just looks at one threshold, but the PR curve looks at all thresholds. Suppose your classifier gives a prediction of 0.48 to all negatives and 0.49 to all positives. Using the classification rule "If prediction > 0.5, positive, else negative", you'll have 0 TP predictions. Something similar is happening here.
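A minimal plain-Scala sketch of this point (no Spark needed), using hypothetical scores: 5 positives and 5 negatives scored 0.49, and 90 negatives scored 0.30. At the single cut-off "score > 0.5" nothing is predicted positive, so TP = 0 and both precision and recall come out as 0. Sweeping the threshold over the scores still gives an area under the PR curve of 0.5. The area here is computed as step-wise average precision, which is an assumption for illustration: Spark's BinaryClassificationEvaluator (metricName "areaUnderPR") interpolates the curve differently, so its exact number may not match. Reporting precision as 0 when nothing is predicted positive is also just a convention, since TP/(TP+FP) is strictly undefined there.

```scala
// Hypothetical illustration: one threshold (confusion matrix) vs. all thresholds (PR curve).
object ThresholdVsPrCurve {

  // (score, label) pairs; label 1 = positive, 0 = negative. Made-up data.
  val scored: Seq[(Double, Int)] =
    Seq.fill(5)((0.49, 1)) ++ Seq.fill(5)((0.49, 0)) ++ Seq.fill(90)((0.30, 0))

  // Confusion-matrix counts at ONE threshold: predict positive iff score > t.
  def counts(data: Seq[(Double, Int)], t: Double): (Int, Int, Int) = {
    val tp = data.count { case (s, y) => s > t && y == 1 }
    val fp = data.count { case (s, y) => s > t && y == 0 }
    val fn = data.count { case (s, y) => s <= t && y == 1 }
    (tp, fp, fn)
  }

  // Precision is undefined when nothing is predicted positive; report 0 by convention.
  def precision(tp: Int, fp: Int): Double =
    if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)

  def recall(tp: Int, fn: Int): Double =
    if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)

  // Area under the PR curve as average precision over ALL thresholds:
  // for each positive, take the precision when predicting positive iff score >= its score,
  // then average over positives (equivalent to summing precision * recall increments).
  def averagePrecision(data: Seq[(Double, Int)]): Double = {
    val positiveScores = data.collect { case (s, 1) => s }
    if (positiveScores.isEmpty) 0.0
    else positiveScores.map { t =>
      val tp = data.count { case (s, y) => s >= t && y == 1 }
      val fp = data.count { case (s, y) => s >= t && y == 0 }
      precision(tp, fp)
    }.sum / positiveScores.size
  }

  def main(args: Array[String]): Unit = {
    val (tp, fp, fn) = counts(scored, 0.5)
    println(s"At t = 0.5: TP=$tp FP=$fp FN=$fn " +
      s"precision=${precision(tp, fp)} recall=${recall(tp, fn)}")
    println(s"Area under PR (average precision): ${averagePrecision(scored)}")
  }
}
```

Running `main` prints TP=0, FP=0, FN=5 with precision and recall of 0.0 at t = 0.5, and an area under PR of 0.5. If you are on Spark, the RDD-based `BinaryClassificationMetrics` class (`precisionByThreshold`, `recallByThreshold`) can be used to look for a more useful cut-off than 0.5 on imbalanced data.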

