As discussed in another question I posted, it is considered legitimate to compare the AIC values of two linear models that use the same features but differ in their likelihoods, such as Gaussian and Laplace. This does not make sense to me.
Let $k$ be the number of estimated parameters and $\hat L$ be the maximized value of the model's likelihood. Then:
$$ AIC=2k-2\log\left(\hat L\right) $$
However, the $\hat L$ for a Gaussian likelihood is a function of the squared-error loss, while the $\hat L$ for a Laplace likelihood is a function of the absolute-error loss.
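To spell out what these two quantities look like, here is a sketch under assumptions not stated above: each model's scale parameter is also estimated by maximum likelihood, there are $n$ observations, and $r_i$ denotes the residuals of the respective fit (least squares for the Gaussian model, least absolute deviations for the Laplace model, so the $r_i$ differ between the two expressions):

$$ \log\hat L_{\text{Gauss}} = -\frac{n}{2}\left[\log\left(\frac{2\pi}{n}\sum_{i=1}^{n} r_i^2\right)+1\right], \qquad \log\hat L_{\text{Laplace}} = -n\left[\log\left(\frac{2}{n}\sum_{i=1}^{n} \left|r_i\right|\right)+1\right] $$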
It then seems that a comparison of AICs is equivalent to a comparison of square and absolute loss, and such a comparison does not make sense to me. Why does it become a reasonable comparison when we apply the AIC transformation to the square or absolute loss?
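To make the comparison concrete, here is a minimal sketch of the two AIC computations for the simplest linear model, an intercept only. The data `y` and the function names are hypothetical, and the sketch assumes the scale parameter of each distribution is estimated by maximum likelihood, so $k=2$ (intercept plus scale) in both cases. It relies on the standard facts that the Gaussian MLE of the location is the mean and the Laplace MLE is the median.

```python
import math
import statistics


def gaussian_aic(y):
    """AIC of an intercept-only model with Gaussian errors (MLE fit)."""
    n = len(y)
    mu = statistics.fmean(y)                  # MLE of the location: the mean
    sigma2 = sum((v - mu) ** 2 for v in y) / n  # MLE of the variance: SSE / n
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    k = 2                                     # intercept + variance
    return 2 * k - 2 * loglik


def laplace_aic(y):
    """AIC of an intercept-only model with Laplace errors (MLE fit)."""
    n = len(y)
    m = statistics.median(y)                  # MLE of the location: the median
    b = sum(abs(v - m) for v in y) / n        # MLE of the scale: mean abs. deviation
    loglik = -n * (math.log(2 * b) + 1)
    k = 2                                     # intercept + scale
    return 2 * k - 2 * loglik


# Hypothetical data with one large outlier, for illustration only.
y = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.1, 7.5]
print("Gaussian AIC:", gaussian_aic(y))
print("Laplace AIC: ", laplace_aic(y))
```

In this sketch, both log-likelihoods are log densities of the same observations `y`: the squared or absolute residuals enter only through the estimated scale, and rescaling `y` by a constant shifts both AIC values by the same amount.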
generalized-linear-model tag relevant here? Also, consider adding the loss-functions tag. – Richard Hardy Oct 27 '22 at 13:44