
My target has 5 classes. My model achieves an accuracy of about 34% on my testing dataset. Can I assume this is a reasonable model purely based on classification accuracy, given that random guessing would give 20%?

Ippei
  • Typically, standards for "satisfactory performance" are shaped outside statistics. They are determined by the context of your problem and traditions in your research field... Will your classification method make a positive impact in your field? Will the improvement relative to random guessing generate sufficient improvement in welfare? Are you sure that you cannot raise the accuracy (correct classification rate) even further? – stans Aug 18 '18 at 13:32
  • Also make sure your classes are balanced (i.e. have the same number of samples in each class). Imagine one of the five classes has 50% of the samples. By predicting just this class, a dummy classifier could achieve an accuracy of 50%. – Djib2011 Aug 18 '18 at 13:46
  • @Djib2011 No! The relative class frequencies are part of the learning task so artificially balancing them may make the performance estimate irrelevant for the practical application. The point about considering the relative frequencies is correct though, the accuracy of the "just guessing" strategy is the relative frequency of the most common class, which may be higher than one divided by the number of classes. – Dikran Marsupial Aug 28 '23 at 11:07
  • @DikranMarsupial I wasn't talking about artificially balancing the classes. I was trying to say that accuracy as a metric only works if the classes are balanced. – Djib2011 Aug 29 '23 at 08:17
  • @Djib2011 O.K., however that isn't true either. Accuracy is a useful metric for unbalanced problems, provided you understand what it tells you. A competent practitioner will consider the accuracy of the random classifier and use it as a baseline (or a more competitive one). There is always Cohen's kappa statistic (which basically tells you the proportion of the error above the prior probability that is explained by the classifier), but that is just an affine transformation of the accuracy. – Dikran Marsupial Aug 29 '23 at 08:33
  • See my answer to the question Stephan referred to ( https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models/538524#538524 ). The appropriate performance metric depends on the needs of the application. Sometimes that is accuracy, even for unbalanced classes. – Dikran Marsupial Aug 29 '23 at 08:34

2 Answers


Not necessarily; that assumes the 5 classes are equally distributed. Consider the case of imbalanced data where 90% of the data is class 1 and the rest is split between classes 2-5. If we predicted every single observation as class 1, we could expect an accuracy of 90%! This is clearly not a useful model, which demonstrates the importance of looking at other performance metrics such as precision, recall, and ROC curves. Accuracy can still be considered, but the minimum baseline should be performing better than always predicting the majority class.
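
To make this concrete, here is a minimal sketch (my own illustration, using scikit-learn's DummyClassifier on made-up labels) of how the majority-class strategy reaches 90% accuracy without learning anything:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic labels: 90% class 1, the rest spread over classes 2-5.
y = rng.choice([1, 2, 3, 4, 5], size=1000,
               p=[0.90, 0.025, 0.025, 0.025, 0.025])
X = rng.normal(size=(1000, 3))  # features are irrelevant to the dummy model

# "Classifier" that ignores X and always predicts the most common class.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(accuracy_score(y, baseline.predict(X)))  # ~0.90, despite learning nothing
```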

Edit: Just noticed Djib2011 beat me to the same point in the comments.

Seraf Fej

Just to add slightly to @SerafFej's answer. A competent practitioner will take the relative class frequencies into account when assessing the accuracy. However, a general audience may not appreciate that subtlety, so it is incumbent on the practitioner not simply to present the accuracy, but to put it into context by stating the accuracy of the baseline classifier that just predicts the most common class, ignoring the attributes. Alternatively, you could use something like Cohen's kappa statistic instead:

$\kappa = \frac{Acc - Acc_{random}}{1 - Acc_{random}}$

This tells us the proportion of the possible improvement in accuracy that we have obtained by using the attributes. So if the classifier is only guessing, $\kappa$ will be zero, and if the classifier is perfect, $\kappa$ will be 1. This makes the "added value" of the classifier much more apparent to a general audience. The statistic was widely used in the neural network community back in the 1990s, but not under that name (it was just an obvious solution to the problem).

Note, however, that it is just an affine transformation of the accuracy, so it isn't telling us anything that the accuracy does not already tell us: ranking classifiers by accuracy and by $\kappa$ will give the same ranking.
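
As a sketch (the kappa helper and the accuracy figures are made up for illustration; following the comments below, I use the relative frequency of the most common class as $Acc_{random}$):

```python
def kappa(acc, acc_random=0.9):
    # Proportion of the possible improvement over the baseline that was achieved.
    return (acc - acc_random) / (1.0 - acc_random)

# Baseline: the most common class holds 90% of the samples, so Acc_random = 0.9.
print(kappa(0.90))  # 0.0  -- no better than always guessing the majority class
print(kappa(0.95))  # ~0.5 -- halfway between the baseline and perfection
print(kappa(1.00))  # 1.0  -- perfect classifier

# Being an affine transformation with positive slope, kappa preserves ranking:
accs = [0.91, 0.97, 0.93]
print(sorted(accs) == sorted(accs, key=kappa))  # True
```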

"Can I assume this is a reasonable model purely based on classification accuracy"

The other problem here is that we can't tell whether the model is reasonable if we don't know how separable (easy) the classification task is (i.e. the accuracy of the Bayes optimal classifier). If we have a very easy classification problem where all of the classes are well separated from each other, 34% may be an extremely poor level of performance. On the other hand, if there is substantial overlap between classes, then it may be very difficult to improve on a classifier that just picks the most common class all the time. See my question here, which gives an example where the "default classifier" that ignores the attributes entirely is optimal.

Unfortunately, if we had a means of knowing the Bayes error rate, we probably would have no need to construct a classifier from data in the first place, so this end of the scale is not so easy to establish.
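
As an illustration, here is a minimal sketch on a synthetic one-dimensional problem where, unlike in any real application, the true class-conditional densities and priors are assumed known, so the Bayes error can be computed by numerical integration:

```python
import numpy as np
from scipy.stats import norm

# Two heavily overlapping Gaussian classes with unequal priors (assumed known).
priors = np.array([0.7, 0.3])
means = np.array([0.0, 1.0])

x = np.linspace(-10.0, 11.0, 200001)
dx = x[1] - x[0]
# Joint densities p(x, class) = p(x | class) * p(class), one row per class.
joint = np.stack([p * norm.pdf(x, loc=m, scale=1.0)
                  for p, m in zip(priors, means)])

# The Bayes-optimal classifier picks the class with the largest joint density
# at each x; the Bayes error is the probability mass it still gets wrong.
bayes_error = np.sum(joint.sum(axis=0) - joint.max(axis=0)) * dx
print(bayes_error)  # ~0.25: even the optimal rule is wrong a quarter of the time

# The "default classifier" that always predicts the most common class:
print(1.0 - priors.max())  # 0.30 error -- the optimal rule barely improves on it
```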

Dikran Marsupial
  • +1 My questions and self-answers here and here get into the idea of comparing model accuracy to some kind of naïve baseline like $\kappa$ does. I've struggled to decide what I think the denominator should be when there are $3+$ categories, and I'm not sold on permuting the true categories like Cohen's $\kappa$ does when it comes to this particular problem, but if you're going to assess the accuracy, something like $\kappa$ gives important context. – Dave Aug 29 '23 at 10:09
  • I think in this case I would just use the probability of the most common class as $Acc_{random}$. It is a bit like Cohen's kappa, but not exactly the same; I think it was via the discussions (+1) with you that I found out it had a name! – Dikran Marsupial Aug 29 '23 at 10:17