
I'm working on a binary classification problem and I'm trying to assess how well my model classifies the positive class (ideally as a probability). I'm using the positive predictive value (PPV) statistic. The issue with PPV is that I can move the classification threshold far to the right (on the score axis) and get a very high PPV, but also a very high miss rate (low sensitivity). However, if I form the product PPV * sensitivity, I can search for the classification threshold that maximizes that product (a minimal sketch of this sweep follows the questions below).

This product seems to be an excellent statistic for the model's performance on the positive class, but I can't find any reference on it, and I need something that I and others can interpret. In my opinion, sensitivity * precision is saying something like (say, in cancer detection) "the fraction of detected positive diagnoses that are truly positive." Technically, something like the probability of a true positive given that the case was detected and diagnosed positive by the model.

I've generated a number of fake datasets and confusion matrices and moved the classification threshold around, and this sensitivity * precision product does seem to track how well the model is doing on the positive class. So my questions are:

  • Does PPV * sensitivity make any sense as a statistic?
  • If this product makes sense, what is it saying about the model's performance? Does it have a name?
  • Is it a probability (PPV is a probability while sensitivity seems to be a ratio)?
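
For concreteness, here is a minimal sketch of the threshold sweep mentioned above, assuming scikit-learn-style inputs; the function name and the `y_true`/`y_score` arrays are purely illustrative:

```python
# Minimal sketch: compute PPV (precision) and sensitivity (recall) at every
# candidate threshold and pick the threshold that maximizes their product.
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_threshold_by_ppv_times_sensitivity(y_true, y_score):
    """Return (threshold, ppv, sensitivity) at the maximum of PPV * sensitivity."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall carry one extra trailing entry with no matching threshold; drop it
    product = precision[:-1] * recall[:-1]
    i = int(np.argmax(product))
    return thresholds[i], precision[i], recall[i]

# Illustrative labels and model scores
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
y_score = np.array([0.05, 0.2, 0.45, 0.4, 0.6, 0.7, 0.8, 0.95])
print(best_threshold_by_ppv_times_sensitivity(y_true, y_score))
```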

Thanks.

iXombi
  • Threshold-based metrics are generally discouraged. Why do you not use the log loss or Brier score to evaluate the probability outputs of your model? – Dave Jul 14 '21 at 17:25
  • I guess because I want to know what the model's chances are of truly detecting the positive class, not its overall performance. – iXombi Jul 14 '21 at 17:57
  • Have you read Frank Harrell's blog posts (1 2) about classification vs probability estimation? That you're concerned about tradeoffs between false positives and false negatives tells me that you regard those errors differently. – Dave Jul 14 '21 at 18:00
  • The model will not truly detect the positive class. All it will do is give you a predicted probability of an instance being in the positive class. You can assess whether these predicted probabilities are well-calibrated using proper scoring rules. In addition (and this is a separate topic), you can take decisions and assess the costs of wrong decisions. See here. – Stephan Kolassa Jul 14 '21 at 20:36
  • Further to your point, as Dave notes, sensitivity and precision are highly controversial. I would go so far as to call them useless and, worse, actively misleading. See Why is accuracy not the best measure for assessing classification models? Every criticism of accuracy in that thread applies equally to sensitivity and precision. A fortiori, the product of two useless and misleading measurements will also be useless and misleading. – Stephan Kolassa Jul 14 '21 at 20:38
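
A minimal sketch of the log loss and Brier score suggested in the comments, assuming scikit-learn and illustrative label/probability arrays:

```python
# Proper scoring rules: they evaluate the predicted probabilities directly,
# with no classification threshold involved (lower is better for both).
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

# Illustrative 0/1 labels and predicted probabilities of the positive class
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.9])

print(brier_score_loss(y_true, y_prob))  # mean squared error of the probabilities
print(log_loss(y_true, y_prob))          # average negative log-likelihood
```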

0 Answers