imbalanced learning: precision vs recall trade-off

Question

I am working on a multi-class problem (five classes) for which the dataset is highly imbalanced (two classes each have less than 2% of the samples).

Which metric, precision or recall, should I pay more attention to?

# y_true / y_pred are assumed names for the true and predicted labels
print(classification_report(y_true, y_pred))
              precision    recall  f1-score   support

     Class 0       0.24      0.01      0.02     12826
     Class 1       0.00      0.00      0.00      1380
     Class 2       0.00      0.00      0.00      6543
     Class 3       0.51      0.98      0.67     22856
     Class 4       0.00      0.00      0.00      1561

    accuracy                           0.50     45166
   macro avg       0.15      0.20      0.14     45166
weighted avg       0.33      0.50      0.34     45166

I understand, but one can compute these metrics in a multi-class problem in a one-against-others fashion; see, for example, the classification_report from my current work in the question edit above. - super_ask, Aug 18 '20 at 15:01
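To make the one-against-others computation mentioned in this comment concrete, here is a minimal pure-Python sketch. The function name `one_vs_rest_report` and the toy labels are made up for illustration (they are not the data from the question), and returning 0.0 when a denominator is zero is an assumed convention.

```python
from collections import Counter

def one_vs_rest_report(y_true, y_pred):
    """Per-class precision and recall, treating each class as 'positive' in turn."""
    labels = sorted(set(y_true) | set(y_pred))
    pred_count = Counter(y_pred)  # TP + FP for each class
    true_count = Counter(y_true)  # TP + FN for each class (the "support")
    tp = Counter()                # instances predicted as c that truly are c
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
    report = {}
    for c in labels:
        # Assumed convention: report 0.0 when the denominator is zero.
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        report[c] = {"precision": precision, "recall": recall,
                     "support": true_count[c]}
    return report

# Toy labels, invented for illustration only.
y_true = [0, 0, 1, 3, 3, 3, 3, 4]
y_pred = [3, 0, 3, 3, 3, 3, 0, 3]
report = one_vs_rest_report(y_true, y_pred)
for c, row in report.items():
    print(c, row)
```

In this toy example the classes that are never predicted (1 and 4) get precision and recall of zero, which mirrors the pattern for Classes 1, 2 and 4 in the report above.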

score 0 · Answer 1 · answered Aug 18 '20 at 15:14

Neither one. You should aim for well-calibrated and sharp probabilistic predictions of class membership. (Note that "unbalanced classes" cease to be a problem in this setting.) Once you have these predictions, you can choose actions to apply to each instance based on your predictions and the costs of wrong actions. This may involve a threshold, but note that the threshold pertains to the decision aspect, not the statistical part of the exercise, and requires a notion of cost or utility.

More here. And here.
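As a rough illustration of the two steps in this answer, the sketch below first scores probabilistic predictions with two proper scoring rules (log loss and the multi-class Brier score), then picks, for each instance, the action with the lowest expected cost. Everything here is an assumption made for the example: the function names, the three-class toy probabilities, and above all the cost matrix, which in practice has to come from the application.

```python
import math

def log_loss(probs, y_true):
    """Mean negative log-probability assigned to the true class (lower is better)."""
    eps = 1e-15  # guard against log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, y_true)) / len(y_true)

def brier_score(probs, y_true):
    """Multi-class Brier score: mean squared distance to the one-hot truth."""
    total = 0.0
    for p, y in zip(probs, y_true):
        total += sum((p_k - (1.0 if k == y else 0.0)) ** 2
                     for k, p_k in enumerate(p))
    return total / len(y_true)

def min_expected_cost_action(p, cost):
    """Return (best action, expected costs), where cost[a][k] is the cost of
    taking action a when the true class is k."""
    expected = [sum(p_k * c for p_k, c in zip(p, row)) for row in cost]
    best = min(range(len(cost)), key=expected.__getitem__)
    return best, expected

# Hypothetical predicted probabilities for two instances over three classes.
probs = [[0.70, 0.20, 0.10],
         [0.98, 0.01, 0.01]]
y_true = [2, 0]  # hypothetical true classes

# Hypothetical costs: missing the rare class 2 is expensive (50),
# a false alarm for class 2 is cheap (2).
cost = [[0, 1, 50],
        [1, 0, 50],
        [2, 2, 0]]

print("log loss:", log_loss(probs, y_true))
print("Brier score:", brier_score(probs, y_true))
actions = [min_expected_cost_action(p, cost)[0] for p in probs]
print("actions:", actions)
```

With these made-up costs, the first instance is assigned the action for class 2 even though its predicted probability is only 0.10, while the second instance is not. The "threshold" is thus a consequence of the costs, not a property of the model.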
