
I am working on a classification problem. Several models have been produced, and all have accuracy, precision, and recall metrics on test data. I need to pick the best model among the alternatives. What I can think of immediately is to combine precision and recall using the F1-measure and use this as the decision metric to pick the best model.

However, the requirement I have been given is that accuracy should also be part of the decision metric, or that I should prove that combining the F1-measure with accuracy will not improve the decision metric. Does anybody have any idea how to do either?
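For concreteness, here is a minimal sketch of that selection rule in Python; the model names and metric values are hypothetical placeholders:

```python
# Minimal sketch of the F1-based selection rule described above.
# Model names and metric values are hypothetical placeholders.
models = {
    "model_a": {"accuracy": 0.91, "precision": 0.72, "recall": 0.65},
    "model_b": {"accuracy": 0.89, "precision": 0.68, "recall": 0.74},
}

def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

best = max(models, key=lambda name: f1(models[name]["precision"],
                                       models[name]["recall"]))
print(best, f1(models[best]["precision"], models[best]["recall"]))
```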

Roger V.

2 Answers


This approach is questionable. Instead, use established statistical principles, as detailed here. Here are a few.

  1. Turn the problem into a prediction problem instead of a classification problem, by estimating outcome probabilities. This allows for close calls, gray zones, etc., and does not require artificial "data amputation" by balancing. Imbalance is not a problem.
  2. Use the gold-standard log likelihood or penalized log likelihood (or Bayesian posterior) measure, which captures much more information than the discontinuous measures you mentioned.
  3. In addition to using the log likelihood as the optimality criterion, unbiasedly and flexibly estimate the calibration curve to show that the predicted risks have good absolute accuracy. This allows the predictions to be used for decision making. (A brief sketch of points 2 and 3 follows this list.)
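For illustration only, here is a minimal Python sketch of points 2 and 3, using scikit-learn's `log_loss` and `calibration_curve` on synthetic data. This is just one possible way to compute these quantities, not the answerer's own workflow:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn.calibration import calibration_curve

# Synthetic, imbalanced data for illustration only.
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]  # predicted risks, not hard class labels

# Point 2: a log likelihood-based criterion
# (log loss is the negative mean log likelihood).
print("log loss:", log_loss(y_te, p))

# Point 3: calibration curve -- predicted risk vs. observed frequency.
obs, pred = calibration_curve(y_te, p, n_bins=10)
for o, pr in zip(obs, pred):
    print(f"predicted {pr:.2f} -> observed {o:.2f}")
```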
Frank Harrell

Although my manager's original request was to combine accuracy, precision, and recall, I was able to sell him on the idea that what we really want is to combine all four quadrants of the confusion matrix in a balanced way. By balanced I mean that the metric should keep working under extreme class imbalance. After some research, I came up with two candidate metrics: balanced accuracy and the Matthews correlation coefficient (MCC). Both work when there is class imbalance. Accuracy, by contrast, fails whenever there is class imbalance, and precision and recall fail when there are more positives than negatives (assuming the two classes are equally important to you), as the sketch below illustrates.
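For illustration, a minimal sketch (assuming scikit-learn; the labels are synthetic) showing how accuracy can be misleading under imbalance while balanced accuracy and MCC are not:

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             matthews_corrcoef)

# Toy, heavily imbalanced labels for illustration only:
# a degenerate classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print("accuracy:         ", accuracy_score(y_true, y_pred))           # 0.95, misleading
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))  # 0.5
print("MCC:              ", matthews_corrcoef(y_true, y_pred))        # 0.0
```

Balanced accuracy averages the per-class recalls, and MCC uses all four cells of the confusion matrix, so both expose the degenerate classifier that plain accuracy rewards.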

  • What about proper scoring rules that evaluate the predicted probability values? – Dave Oct 22 '22 at 14:57
  • @Dave Could you elaborate on what you mean? – Emin Ozkan Oct 23 '22 at 15:23
  • https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ – Dave Oct 23 '22 at 16:00
  • From the business perspective, it's dangerous to consider the four quadrants of the confusion matrix this way unless you also consider the costs/benefits of making true and false class assignments of each type. See this page and its linked pages, for example. – EdM Oct 27 '22 at 15:05