2

Suppose you create a classifier for a dataset with a large imbalance between the two classes; e.g., in fraud detection, most of your data will be non-fraudulent. It's well known that the standard accuracy metric can be very misleading in such cases.

Question: Is there an accuracy metric that adjusts for the imbalance between classes and that exactly coincides with the standard accuracy metric when the classes are perfectly balanced?

jonem
  • 151
  • It makes much more sense to separate the statistical modeling aspect from the subsequent decision/action step: first obtain calibrated probabilistic classifications for your instances, and only then decide on actions for each instance, based on the probabilistic predictions and on the costs of your decisions given the true class membership of each instance. This may well mean that you take more than one action (do nothing/collect more information/call the police), even if there are only two classes. See here, and the links in my answer. – Stephan Kolassa Sep 29 '22 at 06:18
  • For one, a metric like the one you are requesting and gunes proposes may have the desired properties, but it does not capture the differential costs of "misclassifications" (I don't like the term, since, as above, there may well be more actions than classes). These costs will be highly asymmetric in fraud detection even in the balanced case, so requiring balanced accuracy to coincide with accuracy in the balanced case will lead you (and your model) astray. – Stephan Kolassa Sep 29 '22 at 06:22
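To make the expected-cost logic in these comments concrete, here is a minimal Python sketch; the calibrated probabilities, the set of actions, and all cost numbers are hypothetical and purely for illustration.

```python
import numpy as np

# Calibrated probability of fraud for a few instances (assumed to come from
# a probabilistic classifier, per the first comment above).
p_fraud = np.array([0.02, 0.30, 0.85])

# Hypothetical cost of each action by true class: [cost if non-fraud, cost if fraud].
# All numbers are invented for illustration.
costs = {
    "do nothing":        np.array([0.0, 500.0]),
    "collect more info": np.array([10.0, 100.0]),
    "call the police":   np.array([100.0, 0.0]),
}

for p in p_fraud:
    # Expected cost of each action under the predicted fraud probability.
    expected = {action: (1 - p) * c[0] + p * c[1] for action, c in costs.items()}
    best = min(expected, key=expected.get)
    print(f"P(fraud) = {p:.2f} -> {best}")
# P(fraud) = 0.02 -> do nothing
# P(fraud) = 0.30 -> collect more info
# P(fraud) = 0.85 -> call the police
```

Note that even with only two classes, the optimal action here can be any of three, which is exactly the point of the comments.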

1 Answer

1

Yes, there is such a metric, and it's called balanced accuracy. It's the arithmetic mean of the true positive and true negative rates (TPR and TNR), i.e., (TPR + TNR) / 2; when the classes are perfectly balanced, it coincides with standard accuracy. Here is some intuition on it. The balancing logic explained in the post linked above can easily be extended to the multi-class case. Alternatively, you can use the TPR and TNR of each class to determine it (an example here).
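As a quick illustration (not part of the original answer), here is a minimal sketch using scikit-learn's `balanced_accuracy_score` on made-up toy data; it shows balanced accuracy equaling (TPR + TNR) / 2 under imbalance and coinciding with plain accuracy in the balanced case.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Imbalanced toy data: 9 non-fraud (0) vs. 1 fraud (1), and a degenerate
# classifier that always predicts "non-fraud".
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)
print(accuracy_score(y_true, y_pred))           # 0.9 -- looks great, but...
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 -- (TPR + TNR) / 2 = (0 + 1) / 2

# Perfectly balanced toy data: the two metrics coincide.
y_true_bal = np.array([0] * 5 + [1] * 5)
y_pred_bal = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
print(accuracy_score(y_true_bal, y_pred_bal))           # 0.8
print(balanced_accuracy_score(y_true_bal, y_pred_bal))  # 0.8

# Multi-class: scikit-learn computes balanced accuracy as macro-averaged
# recall, matching the per-class extension mentioned above.
y_true_mc = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_pred_mc = np.array([0, 1, 1, 1, 2, 2, 0, 2])
print(balanced_accuracy_score(y_true_mc, y_pred_mc))  # (1/2 + 2/2 + 3/4) / 3 = 0.75
```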

gunes
  • 57,205