
For training a neural network, is there any significant difference between using the sum of squared errors or the mean squared error as the loss function?

Sycorax
  • 90,934
Gabi23
  • 143
  • 1
    Algebraically, it obviously doesn't change the argmin. Sometimes that's not the only consideration, though. – Glen_b Oct 09 '21 at 04:00
  • No big difference, but regularisation hyperparameters have to be rescaled appropriately (if you are searching over the hyperparameter, the search should find the rescaled value anyway). – seanv507 May 28 '23 at 00:08

1 Answer


Mathematically, the two are equivalent: dividing the loss by a positive constant (the sample size) is a strictly increasing transformation, so it does not change the $\arg\min$ you are looking for (and that $\arg\min$ gives the network's weight and bias values).
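Concretely, writing $\theta$ for the network's weights and biases and $\hat y_i(\theta)$ for its predictions on $n$ training cases (notation introduced here just for illustration), the two losses differ only by the constant factor $n$:

$$
\operatorname{SSE}(\theta) \;=\; \sum_{i=1}^{n} \bigl(y_i - \hat y_i(\theta)\bigr)^2 \;=\; n \cdot \operatorname{MSE}(\theta),
\qquad
\arg\min_{\theta} \operatorname{SSE}(\theta) \;=\; \arg\min_{\theta} \operatorname{MSE}(\theta).
$$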

However, any empirical work requires a computer to approximate these optimizations, and the sum and the mean do not behave quite the same way there. For example, this answer discusses why the mean may be preferred to the sum for numerical reasons when the calculations are done in finite-precision arithmetic. Thus, despite the theoretical guarantee that the sum and the mean share the same $\arg\min$, numerical considerations matter for applied work.
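A minimal NumPy sketch of one practical consequence (the toy data, the single weight `w`, and the learning rates below are made up for illustration): the SSE gradient is exactly $n$ times the MSE gradient, so hyperparameters such as the learning rate (and, as the comments note, the regularisation strength) have to be rescaled when switching between the two losses.

```python
import numpy as np

# Toy one-parameter model (hypothetical data): fit y ≈ w * x.
rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(scale=0.1, size=n)

def grad_sse(w):
    # d/dw of sum_i (y_i - w x_i)^2
    return -2.0 * np.sum(x * (y - w * x))

def grad_mse(w):
    # The MSE gradient is the SSE gradient divided by n.
    return grad_sse(w) / n

def gradient_descent(grad, lr, steps=200, w=0.0):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# At any fixed w, the SSE gradient is exactly n times the MSE gradient ...
print(grad_sse(0.0) / grad_mse(0.0))           # 1000.0

# ... so the "same" learning rate takes a step that is n times larger.
print(gradient_descent(grad_mse, lr=0.1))      # converges near 3.0
print(gradient_descent(grad_sse, lr=0.1 / n))  # same result once lr is rescaled by 1/n
```

With the learning rate divided by $n$, the SSE run retraces the MSE run; with the unscaled learning rate it would overshoot, which is one reason the choice between sum and mean is not entirely cosmetic in practice.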

Dave
  • 62,186