A mantra on Cross Validated is that class imbalance in classification problems is not the self-evident problem some believe it to be: the apparent issues are resolved either by evaluating the full content of the predictions (e.g., scoring the probabilistic outputs of a logistic regression rather than classification accuracy at a threshold like $0.5$) or by gathering more data so that there are many instances of the minority categories [1][2].
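To illustrate that first point, here is a minimal sketch (assuming scikit-learn is available; the $1\%$ minority prevalence is a made-up number chosen purely for illustration) of why thresholded accuracy can mislead while a proper scoring rule applied to the probabilities does not:

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

rng = np.random.default_rng(0)
n = 100_000
y = (rng.random(n) < 0.01).astype(int)  # simulated labels, 1% minority

# A "model" that ignores its inputs and always reports the base rate.
p_hat = np.full(n, 0.01)

# Accuracy at the 0.5 threshold looks excellent, simply because the
# model never predicts the minority class...
print(accuracy_score(y, (p_hat >= 0.5).astype(int)))  # ~0.99

# ...whereas the log-loss of the probabilities is roughly the
# base-rate entropy: the model knows nothing beyond the prevalence.
print(log_loss(y, p_hat))  # ~0.056
```

Under these assumed numbers, the always-majority "model" scores about $99\%$ accuracy, yet its log-loss shows it conveys no information beyond the prevalence.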
Neural networks in particular tend to be trained in small batches, and a rare category might have very few instances in any given batch, perhaps even zero; Björn recently speculated that this might be problematic (see the sketch below).
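To quantify that concern, here is a minimal sketch (the $1\%$ prevalence and batch size of $32$ are assumptions chosen purely for illustration) of how often a minibatch would contain no minority instances at all:

```python
import numpy as np

p_minority = 0.01  # assumed minority-class prevalence (illustrative)
batch_size = 32    # assumed minibatch size (illustrative)

# Under i.i.d. sampling, a batch contains zero minority instances
# with probability (1 - p)^B.
print((1 - p_minority) ** batch_size)  # ~0.725

# Monte Carlo check: draw 100,000 batches and count the empty ones.
rng = np.random.default_rng(0)
batches = rng.random((100_000, batch_size)) < p_minority
print((batches.sum(axis=1) == 0).mean())  # agrees with the analytic value
```

Under these made-up numbers, roughly $70\%$ of minibatches would contribute no gradient signal from the minority class at all.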
Is class imbalance really a problem for the numerical optimization methods used to train neural networks? What remedies are there?
References to the literature would be appreciated.