
For a multiclass imbalanced problem, accuracy is not a good metric for evaluating model performance. Accuracy is also a global metric, so there is no such thing as per-class accuracy (it would not make sense).

Scikit-learn provides the classification_report function, so one can evaluate a model's precision and recall per class, e.g.:

classification_report(y_true, y_pred, target_names=target_names)
              precision    recall  f1-score   support

     Class:0      0.703     0.896     0.788      4491
     Class:1      0.048     0.147     0.072        75
     Class:2      0.368     0.503     0.425      1097
     Class:3      0.937     0.850     0.892     17162
     Class:4      0.529     0.177     0.265       311

    accuracy                          0.832     23136
   macro avg      0.517     0.515     0.488     23136
weighted avg      0.856     0.832     0.838     23136
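For anyone who wants to reproduce a report of this kind, here is a minimal runnable sketch. The labels are toy values made up for illustration (not the data behind the table above), and the variable names are assumptions:

```python
from sklearn.metrics import classification_report

# Toy labels, made up for illustration only.
y_true = [0, 0, 0, 1, 2, 2, 3, 3, 3, 3]
y_pred = [0, 0, 3, 1, 2, 3, 3, 3, 3, 0]
target_names = [f"Class:{k}" for k in range(4)]

# digits=3 prints three decimals, as in the table above.
print(classification_report(y_true, y_pred, target_names=target_names, digits=3))

# output_dict=True returns the same numbers as a nested dict,
# which is easier to work with per class than the printed text.
report = classification_report(
    y_true, y_pred, target_names=target_names, output_dict=True
)
recall_class0 = report["Class:0"]["recall"]
```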

Are there other per-class metrics, so that I can evaluate my model on more than precision/recall/F1? The goal is to assess the model on a per-class basis.

arilwan

1 Answer


Precision and recall are misleading, just like accuracy is. Every criticism of accuracy raised at "Why is accuracy not the best measure for assessing classification models?" applies equally to precision and recall.

Use probabilistic predictions for class membership, and evaluate these using proper scoring rules, e.g., the Brier score or the log score. More information and pointers to literature can be found at the tag wiki.
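As a rough sketch of what this could look like in Python: the probabilities below are made up for illustration (in practice they might come from something like a classifier's predict_proba), and the multiclass Brier score is computed by hand with NumPy rather than through a library call.

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up example: 4 instances, 3 classes; each row of proba sums to 1.
y_true = np.array([0, 1, 2, 2])
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.3, 0.2],
])

# One-hot encode the observed classes.
onehot = np.eye(proba.shape[1])[y_true]

# Multiclass Brier score: mean over instances of the squared distance
# between the predicted probability vector and the one-hot outcome.
brier = np.mean(np.sum((proba - onehot) ** 2, axis=1))

# Log score: average negative log of the probability assigned to the
# class that actually occurred.
log_score = log_loss(y_true, proba, labels=[0, 1, 2])

print(f"Brier score: {brier:.3f}, log score: {log_score:.3f}")
```

Lower is better for both scores as written here.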

If you really want to, you can calculate the average of such scores over instances that fall into particular classes, but the properness of these scores refers to their overall performance, and I suspect that looking at scores per class may be misleading.
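If you do want the per-class averages anyway, a minimal sketch might look like the following. The data are made up for illustration, the variable names are assumptions, and the caveat above still applies: these per-class numbers are a diagnostic at best, not proper scores in their own right.

```python
import numpy as np

# Made-up example: 4 instances, 3 classes; each row of proba sums to 1.
y_true = np.array([0, 1, 2, 2])
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.3, 0.2],
])

# Per-instance scores.
onehot = np.eye(proba.shape[1])[y_true]
brier_i = np.sum((proba - onehot) ** 2, axis=1)
log_i = -np.log(proba[np.arange(len(y_true)), y_true])

# Average the per-instance scores over the instances of each true class.
per_class_brier = {}
per_class_log = {}
for k in np.unique(y_true):
    mask = y_true == k
    per_class_brier[int(k)] = float(brier_i[mask].mean())
    per_class_log[int(k)] = float(log_i[mask].mean())

print(per_class_brier)
print(per_class_log)
```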

You may be interested in this earlier answer of mine.

Stephan Kolassa