I'm curious why the Negative Log Likelihood (NLL) loss is used for classification tasks in PyTorch (see here). The negative log likelihood is a much more general notion than a measure of error in a classification problem.
Yes, the negative log likelihood of a Categorical distribution can be minimized (with respect to some parameters) to perform maximum likelihood estimation, but it is not reserved for the Categorical distribution.
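For concreteness, here is a small sketch (with made-up logits and labels) showing that what PyTorch calls the NLL loss is exactly the batch-averaged negative log likelihood of a Categorical distribution:

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical

# Made-up logits for 3 examples over 5 classes, and made-up class labels.
logits = torch.randn(3, 5)
targets = torch.tensor([1, 0, 4])

# What PyTorch calls the NLL loss: it expects log-probabilities as input.
loss_pytorch = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# The same number, written as the negative log likelihood of a
# Categorical distribution, averaged over the batch.
loss_manual = -Categorical(logits=logits).log_prob(targets).mean()

print(torch.allclose(loss_pytorch, loss_manual))  # True
```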
The negative log likelihood is a function we can write down for any distribution. For example, we can also minimize the negative log likelihood of a Gaussian distribution in a simple regression problem, which is again just maximum likelihood estimation.
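As a sketch of that regression case (the toy data, the fixed unit noise scale, and the learning rate are all assumptions for illustration):

```python
import torch
from torch.distributions import Normal

# Toy 1-D regression data (assumed for illustration).
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3.0 * x + 0.5 + 0.1 * torch.randn_like(x)

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([w, b], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    mean = x * w + b
    # Negative log likelihood of the data under a Gaussian with a fixed
    # unit scale; minimizing it is maximum likelihood (and here coincides
    # with ordinary least squares).
    nll = -Normal(mean, 1.0).log_prob(y).mean()
    nll.backward()
    optimizer.step()
```

After training, `w` and `b` should end up near the true values used to generate the data, even though the objective is again a "negative log likelihood".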
Is there any reason PyTorch decided to do this? Is "negative log likelihood" a term that is commonly abused to refer to a classification objective?