
Does precision or recall have more importance? Or are they to be considered equivalent measures of accuracy?

They can produce different numbers; they refer to slightly different errors. But is it possible to say whether precision or recall would have more meaning as a measure of accuracy?

Nick Cox
mavavilj

2 Answers


Precision and recall are neither equivalent, nor measures of overall accuracy.

Let $TP, TN, FP, FN$ denote the number of true positives, true negatives, false positives, and false negatives. Accuracy, precision, and recall are defined as:

  • Accuracy = $\frac{TP + TN}{TP + TN + FP + FN}$: The fraction of all cases that have been classified correctly.

  • Precision = $\frac{TP}{TP + FP}$: The fraction of cases classified as positive that are actually positive. This is equivalent to accuracy on the subset of cases that have been classified as positive. It can be interpreted as measuring how informative a 'positive' answer from the classifier is about the true class.

  • Recall = $\frac{TP}{TP + FN}$: The fraction of positive cases that have been classified as positive. This is equivalent to accuracy on the subset of cases that are actually positive. It can be interpreted as measuring how completely the classifier identifies positive cases.
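
These definitions are easy to check numerically. Here is a minimal sketch in Python (the confusion counts used below are made-up illustrative values, not from any real classifier):

```python
# Computing accuracy, precision, and recall from confusion counts.
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)

# Hypothetical confusion counts: 40 TP, 30 TN, 20 FP, 10 FN.
tp, tn, fp, fn = 40, 30, 20, 10
print(accuracy(tp, tn, fp, fn))  # 0.7
print(precision(tp, fp))         # 0.666...
print(recall(tp, fn))            # 0.8
```

Note that precision and recall condition on different subsets (predicted positives vs. actual positives), which is why the same counts give different numbers.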

Each of these is a different way (among many other alternatives) of measuring the performance of a classifier. None can be said to be more or less meaningful than another; rather, they mean different things. Which performance measure(s) to use depends on the problem and goals. A classifier is used to guide decisions, and each action has different consequences in different circumstances. The performance measure should reflect how we care about these consequences.

user20160

Recall and Precision can be useful measures if you interpret them correctly, but they can also be misleading. Consider as a simple example a series of $300$ "coin flips". The $i^{th}$ coin flip is heads (positive) with probability $$p_i = \frac{1}{1+\exp(-x_i)}$$ where $x_i$ is an observable feature. A model based on logistic regression might predict that a coin will be heads if $x_i > 0$. The success of this model depends on the distribution of the $x_i$. Here I will simulate them from a $N(0, 0.1)$ distribution, so that the model does only slightly better than average.

Let's compare this to a model which always predicts heads (called the null model).
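
The setup above can be sketched as follows. Assumptions in this sketch: $N(0, 0.1)$ is read as mean $0$, standard deviation $0.1$; $F_1$ is the harmonic mean of precision and recall; the random seed is arbitrary, so the numbers will not exactly match the single run reported below.

```python
import math
import random

random.seed(0)
n = 300
x = [random.gauss(0, 0.1) for _ in range(n)]           # observable features
p = [1 / (1 + math.exp(-xi)) for xi in x]              # P(heads) per flip
y = [random.random() < pi for pi in p]                 # true outcome: heads = positive

def metrics(pred, truth):
    """Return (accuracy, precision, recall, F1) for boolean predictions."""
    tp = sum(a and b for a, b in zip(pred, truth))
    fp = sum(a and not b for a, b in zip(pred, truth))
    fn = sum(not a and b for a, b in zip(pred, truth))
    tn = sum(not a and not b for a, b in zip(pred, truth))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    acc = (tp + tn) / len(pred)
    return acc, prec, rec, f1

model_pred = [xi > 0 for xi in x]   # logistic-regression-style rule
null_pred = [True] * n              # null model: always predict heads

print("model:", metrics(model_pred, y))
print("null: ", metrics(null_pred, y))
```

Because the null model never predicts negative, its false-negative count is zero by construction, which is what drives the recall results below.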

Here are the results from a single simulation:

Logistic Regression Model:

  • Accuracy: $0.53$
  • Precision: $0.49$
  • Recall: $0.53$
  • $F_1$ Measure: $0.51$

Null Classifier (always predicts heads):

  • Accuracy: $0.50$
  • Precision: $0.50$
  • Recall: $1.0$
  • $F_1$ Measure: $0.67$

Note that for the null model, Recall will always be $1.0$, which makes sense by the definition: the null model never predicts negative, so $FN = 0$ and Recall $= \frac{TP}{TP + FN} = 1$. In fact, every metric except Accuracy chooses the null model here.

Let's look at the difference in each measure over $10,000$ simulations.

[Figure: distribution of the difference in each metric between the logistic regression model and the null model over $10,000$ simulations]

Note that Accuracy and Precision choose the logistic regression model the majority of the time, but Recall and the $F_1$ measure choose the null model every single time.

So as others have said, which measure has more meaning truly depends on the problem. Personally, I would want to use the model that incorporates the features! But if false negatives are a really big deal, Recall suggests that the "smart" model is too costly.

knrumsey