
We know that quadratic loss can be derived as maximum likelihood estimation under a Gaussian distribution, and cross-entropy loss as maximum likelihood estimation under a Bernoulli distribution.
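To make the first of those concrete: under a fixed-variance Gaussian model $y \mid x \sim \mathcal{N}(f(x), \sigma^2)$, the negative log-likelihood of one observation is $$ -\log p(y \mid x) = \frac{(y - f(x))^2}{2\sigma^2} + \text{const}, $$ so maximizing the likelihood is exactly minimizing squared error.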

Now my question is: do other frequently used loss functions also have such an interpretation? For example, what are the probabilistic models underlying hinge loss, exponential loss, L1 loss (mean absolute error), etc.? Can these be interpreted as maximum likelihood estimation under some likelihood?

A proof that every loss function corresponds to some kind of maximum likelihood estimation would be appreciated, as would a counterexample: a loss function together with a proof that it cannot correspond to maximum likelihood estimation under any likelihood. If that counterexample uses a fairly common loss function (e.g., a regularization penalty), that is even better.

George

1 Answer


A loss function $L: X \to \mathbb{R}^+$ can be motivated as an MLE provided that $\int_X \exp(-bL(x))\,dx$ converges for some $b > 0$. That will often be the case, but one can construct counterexamples.

Proof: Consider the problem of finding a function $f$ that minimizes the loss $\sum_i L(y_i - f(x_i))$. This is equivalent to maximizing the conditional log-likelihood $\sum_i \mathscr{L}(y_i - f(x_i))$ provided that $\mathscr{L}$ is a decreasing affine transformation of $L$ and that $\mathscr{L}$ is the log of a pdf. In other words, we require that there exist $a \in \mathbb{R}$ and $b \in \mathbb{R}^+$ such that $\exp(a - bL(x))$ is a pdf. A function is a pdf if it is non-negative and integrates to 1. Non-negativity is guaranteed by the $\exp$ function, so we just require that for some $a, b$: $$ \int_X \exp(a - bL(x))\, dx = 1. $$ The constant factors out as $e^a \int_X \exp(-bL(x))\,dx$, and we are free to choose $a$ to normalize the integral, so the requirement becomes simply that $$ \int_X \exp(-bL(x))\, dx < \infty. \tag{1} $$
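As a sanity check, here is a small numerical sketch of requirement (1) for two of the losses the question mentions. This is Python with SciPy; the `normalizer` helper and the choice $b = 1$ are mine, purely for illustration.

```python
# Illustrative check of requirement (1) with b = 1: numerically integrate
# exp(-b * L(x)) over the real line for quadratic and L1 loss.
import numpy as np
from scipy.integrate import quad

def normalizer(loss, b=1.0):
    """Approximate the integral of exp(-b * loss(x)) over all of R."""
    value, _ = quad(lambda x: np.exp(-b * loss(x)), -np.inf, np.inf)
    return value

print(normalizer(lambda x: x ** 2))  # ~1.7725 = sqrt(pi): the Gaussian normalizer
print(normalizer(lambda x: abs(x)))  # ~2.0: the Laplace normalizer
```

Both integrals are finite, recovering the familiar Gaussian (quadratic loss) and Laplace (L1 loss) likelihoods. By the same check done by hand, the margin-based hinge loss $L(m) = \max(0, 1 - m)$ on $X = \mathbb{R}$ fails (1): $\exp(-b\max(0, 1 - m)) = 1$ for every $m \geq 1$, so the integral diverges for all $b > 0$.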

Requirement (1) is failed by the pathological loss function $L(x) = \log(\log(1+|x|))$ on the domain $X = \mathbb{R}$: $$ \int_X \exp(-bL(x))\, dx = \int_{-\infty}^{\infty} \exp\bigl(-b \log(\log(1+|x|))\bigr)\, dx = \int_{-\infty}^{\infty} \frac{dx}{(\log(1+|x|))^{b}}, $$ which diverges for any positive $b$: for large $|x|$ we have $(\log(1+|x|))^{b} \le |x|$, so the integrand eventually dominates $1/|x|$, whose integral diverges. I'm not sufficiently familiar with different loss functions to know whether there's a 'common' loss function that fails requirement (1), but it shouldn't be too hard to check each one.
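For what it's worth, a rough numerical sketch agrees (Python again; $b = 1$, the truncation points, and the grid size are arbitrary choices of mine). It tracks the truncated integral $\int_1^T (\log(1+x))^{-b}\, dx$ for growing $T$; restricting to $[1, T]$ sidesteps the behavior near the origin, and the tail alone already diverges.

```python
# Numerical evidence that requirement (1) fails for L(x) = log(log(1 + |x|)):
# exp(-b * L(x)) simplifies to (log(1 + |x|))**(-b), and the truncated
# integrals over [1, T] keep growing as T increases instead of converging.
import numpy as np
from scipy.integrate import trapezoid

b = 1.0  # any b > 0 shows the same qualitative behavior

def integrand(x):
    return np.log1p(x) ** (-b)

for T in [1e2, 1e4, 1e6, 1e8]:
    x = np.logspace(0.0, np.log10(T), 200_001)  # log-spaced grid on [1, T]
    partial = trapezoid(integrand(x), x)
    print(f"T = {T:.0e}: integral over [1, T] ~ {partial:,.0f}")

# The partial integrals grow roughly like T / log(T) and never settle,
# consistent with the comparison argument above.
```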

Wilbur
  • "A loss function $L: X\rightarrow\mathbb R^+$ can be motivated as an MLE provided that $\int_X \exp(-bL(x))dx$ converges for some $b>0$." Why? Do you have a proof or a reference? – Dave May 27 '23 at 18:29
  • The second part of the answer is a proof of that statement -- will edit to clarify – Wilbur May 28 '23 at 02:34
  • I need to go through this in more detail than I can right now in order to accept this answer, but you took the time to attempt to answer and edited in response to my follow-up. +50 – Dave May 30 '23 at 19:07