In general, I deal with imbalanced datasets in multiclass classification problems. Now I'm facing a multiclass classification problem with balanced data. In this context, are macro precision, recall, and f-measure more informative than accuracy?
- What do you want to be informed about? – Dave Mar 28 '23 at 12:26
- Sensitivity, specificity, etc., and any weighted combinations of these suffer from all the same issues as accuracy, i.e., they all presume a very specific cost structure to decisions in the face of uncertainty, but they do not make the costs explicit. Better to work with probabilistic classifications and separate the decision aspect from them. Decisions need to take classifications and costs into account, and even if there are only two classes, there may well be more than two possible decisions. – Stephan Kolassa Mar 28 '23 at 12:32
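A minimal sketch of that separation in Python; the class probabilities and the three-class cost matrix below are made up for illustration, not taken from the question:

```python
import numpy as np

# Predicted class-membership probabilities for one observation (3 classes),
# e.g. from predict_proba of any probabilistic classifier.
p = np.array([0.5, 0.3, 0.2])

# Hypothetical cost matrix: cost[i, j] is the cost of deciding class j
# when the true class is i. Missing true class 2 is expensive here.
cost = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [5, 5, 0],
])

# Expected cost of each possible decision under the predicted probabilities.
expected_cost = p @ cost          # -> [1.3, 1.5, 0.8]

print("most probable class:  ", p.argmax())              # 0
print("cost-optimal decision:", expected_cost.argmin())  # 2
```

With these (hypothetical) costs, the optimal decision is class 2 even though class 0 is the most probable: exactly the gap between the classification and the decision that the comment describes.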
- And that is true whether the classes are balanced or not! – Dave Mar 28 '23 at 12:32
- ... and you know all that, because I commented at your earlier thread. Did you look at the thread on accuracy? How do you operationalize "more informative", i.e., by what measure would any of these metrics be "more informative" than accuracy? – Stephan Kolassa Mar 28 '23 at 12:33
- My question is simple. I'm just not sure whether those metrics provide values different from accuracy in this setting, and whether those values can be more useful than accuracy alone. – Zaratruta Mar 28 '23 at 12:43
- What is "macro" in your question? You should explain it. – ttnphns Mar 28 '23 at 12:52
- They will provide different values. But whether they will be "more useful" depends on how you will use the predictions (that you optimized for either accuracy, F1, or something else). Which ties exactly back into my comment above: since all of these metrics presuppose an implicit (!!!) cost structure, whether they will lead you to better decisions (again, see above) depends on whether this cost structure just happens to be close to the costs you actually face. Again: better to explicitly separate modeling from deciding. – Stephan Kolassa Mar 28 '23 at 12:53
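To see concretely that the values differ even on a balanced problem, here is a small sketch with made-up labels; it demonstrates only that the numbers disagree, not that one metric is more useful than the other:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # perfectly balanced classes
y_pred = [0, 0, 0, 1, 0, 0, 2, 2, 0]   # hypothetical predictions

print("accuracy:", accuracy_score(y_true, y_pred))             # 0.667
print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # 0.656
```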
- @StephanKolassa, you are 100% right, but this might be irrelevant to the question. Sometimes there is "no" modelling; we are not presented with probabilities, but only with classification decision results. The OP did not consider any modelling particulars. – ttnphns Mar 28 '23 at 12:58
- You may try to answer your question by meditating on their formulae: https://stats.stackexchange.com/q/586342/3277. For example, recall that Accuracy is the Rand index and F is the Dice index. So knowing how these binary similarity measures behave differently might give a clue. – ttnphns Mar 28 '23 at 13:17
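For reference, the binary confusion-matrix forms behind that remark, with $TP$, $TN$, $FP$, $FN$ the usual counts:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F_1 = \frac{2\,TP}{2\,TP + FP + FN}.$$

Like the Rand index, accuracy credits joint non-membership ($TN$); the F/Dice measure ignores $TN$ entirely, which is one concrete way their behavior differs.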
- @ttnphns: you make a good point. I would say that it then is even more important to think about what it means for one metric to be more useful than the other. – Stephan Kolassa Mar 28 '23 at 13:22
- @ttnphns, macro is a version of F1, precision, and recall used for multiclass classification problems. It is important to keep in mind that the classical definitions of those measures are for binary classification. https://towardsdatascience.com/micro-macro-weighted-averages-of-f1-score-clearly-explained-b603420b292f – Zaratruta Mar 28 '23 at 13:22
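A minimal sketch of the macro average mentioned there, using scikit-learn's built-in averaging and made-up labels: the per-class metric is computed first and then averaged with equal weight per class, regardless of class size:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 2, 2]   # hypothetical labels
y_pred = [0, 1, 1, 1, 2, 0]

# average="macro": compute the metric per class, then take the unweighted mean.
print("macro F1:       ", f1_score(y_true, y_pred, average="macro"))
print("macro precision:", precision_score(y_true, y_pred, average="macro"))
print("macro recall:   ", recall_score(y_true, y_pred, average="macro"))
```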
- @ttnphns Yes, I know that. In my question I emphasize the multiclass nature of my problem; it is because of this that I'm using averages (in particular, the macro average). – Zaratruta Mar 28 '23 at 22:30