A mantra on Cross Validated is that class imbalance in classification problems is not the self-evident problem some believe it to be: the apparent issues are resolved either by evaluating the full content of the predictions (e.g., scoring the probabilistic outputs of a logistic regression rather than classification accuracy at a threshold like $0.5$) or by gathering more data so that there are many instances of the minority categories [1][2].
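To illustrate that first point, here is a minimal sketch (assuming scikit-learn is available; the $1\%$ minority prevalence is a made-up number chosen purely for illustration) of why thresholded accuracy can mislead while a proper scoring rule applied to the probabilities does not:

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

rng = np.random.default_rng(0)
n = 100_000
y = (rng.random(n) < 0.01).astype(int)  # simulated labels, 1% minority

# A "model" that ignores its inputs and always reports the base rate.
p_hat = np.full(n, 0.01)

# Accuracy at the 0.5 threshold looks excellent, simply because the
# model never predicts the minority class...
print(accuracy_score(y, (p_hat >= 0.5).astype(int)))  # ~0.99

# ...whereas the log-loss of the probabilities is roughly the
# base-rate entropy: the model knows nothing beyond the prevalence.
print(log_loss(y, p_hat))  # ~0.056
```

Under these assumed numbers, the always-majority "model" scores about $99\%$ accuracy, yet its log-loss shows it conveys no information beyond the prevalence.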
Neural networks in particular tend to be trained in small batches, and a rare category might have very few instances in any given batch, perhaps even zero; Björn recently speculated that this might be problematic (see the sketch below).
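To quantify that concern, here is a minimal sketch (the $1\%$ prevalence and batch size of $32$ are assumptions chosen purely for illustration) of how often a minibatch would contain no minority instances at all:

```python
import numpy as np

p_minority = 0.01  # assumed minority-class prevalence (illustrative)
batch_size = 32    # assumed minibatch size (illustrative)

# Under i.i.d. sampling, a batch contains zero minority instances
# with probability (1 - p)^B.
print((1 - p_minority) ** batch_size)  # ~0.725

# Monte Carlo check: draw 100,000 batches and count the empty ones.
rng = np.random.default_rng(0)
batches = rng.random((100_000, batch_size)) < p_minority
print((batches.sum(axis=1) == 0).mean())  # agrees with the analytic value
```

Under these made-up numbers, roughly $70\%$ of minibatches would contribute no gradient signal from the minority class at all.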
Is class imbalance really a problem for the numerical optimization methods used to train neural networks? What remedies are there?
References to the literature would be appreciated.