
Let's set aside what we know about proper scoring rules and predicting probabilities; let's do CLASSIFICATION.

Define sensitivity as the ability to call an observation a $1$ if it really is a $1$: $ \text{sensitivity} = P(\hat{y} = 1 \vert y = 1) $.

Define specificity as the ability to call an observation a $0$ if it really is a $0$: $ \text{specificity} = P(\hat{y} = 0 \vert y = 0) $.

Once we get a classification, however, these values become less important. If we get a prediction of $\hat{y}=1$, we care about $P(y=1 \vert \hat{y} = 1)$, the reverse conditioning of sensitivity. Ditto for a prediction of $\hat{y}=0$ and specificity. In concrete terms, we care about the probability of having coronavirus, given that we tested positive (or the probability of not having it, given a negative test).

In the past few days when I have been fiddling with these, I have been referring to $P(y = 1 \vert \hat{y} = 1)$ and $P(y = 0 \vert \hat{y} = 0)$ as posterior sensitivity and posterior specificity, respectively.
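These reversed conditionals follow from Bayes' rule once you also know the prevalence $P(y=1)$. A minimal sketch of the relationship (function names are my own):

```python
def predictive_value_positive(sens, spec, prev):
    """P(y=1 | yhat=1) via Bayes' rule from sensitivity, specificity, prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def predictive_value_negative(sens, spec, prev):
    """P(y=0 | yhat=0) via Bayes' rule."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# A test with 90% sensitivity and 95% specificity at 1% prevalence:
# most positives are false positives, so P(y=1 | yhat=1) is only about 0.15.
ppv = predictive_value_positive(0.90, 0.95, 0.01)
npv = predictive_value_negative(0.90, 0.95, 0.01)
```

This also shows why the reversed conditionals matter in the coronavirus example: at low prevalence, even a test with high sensitivity and specificity yields a surprisingly low probability of disease given a positive result.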

Do they have established names? Are they used much in machine learning? If not, why not?

Dave

1 Answer


You are right to not be interested in probabilities that are backwards in terms of time-order and information flow. The correct terminology for the quantities you are interested in is predictive value positive and predictive value negative. But using these probabilities is discarding a great deal of information, and it is often not a good idea to have classification as a goal. Instead, estimate $P(y=1 | X)$ where $X$ retains the full information in the predictors, including continuous values. Do away with positive and negative and allow for gray zones. More information may be found here and here.
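To make the suggestion concrete: rather than thresholding into positive/negative calls, fit a probability model and report $P(y=1 \mid X)$ directly, flagging a gray zone where the evidence supports neither call. A minimal sketch with a single continuous predictor, using plain NumPy logistic regression on synthetic data (the 0.3-0.7 gray zone is an arbitrary illustration, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one continuous predictor; the true model is logistic in x.
n = 2000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))
y = rng.binomial(1, p_true)

# Fit logistic regression by gradient descent -- no thresholding anywhere.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta -= 0.1 * X.T @ (p - y) / n

# Report estimated probabilities and allow a gray zone instead of 0/1 calls.
p_hat = 1 / (1 + np.exp(-X @ beta))
gray = (p_hat > 0.3) & (p_hat < 0.7)  # too uncertain to call either way
```

The estimated probabilities retain the full information in $X$; a forced classification would throw away exactly the gradations that the gray zone preserves.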

Frank Harrell
  • Although classification should be resisted as a goal (to retain information content), when it is performed, sensitivity is in some ways inferior to predictive value positive (also known as precision), since they have backward and forward information flow, respectively. The same holds for specificity (backward) and predictive value negative (forward). Is this a fair summary of the above answer (checking I have understood everything)? – Single Malt Dec 28 '20 at 16:01
  • I think you have it. – Frank Harrell Dec 29 '20 at 22:14