
I ran regressions and random forests using log loss as the scoring metric, as suggested here and here. Then I read this, which was linked in the second reference, and I started to have doubts:

Is log loss/cross entropy the same, in practice, as the logarithmic scoring rule?

Judging by how each is described, they should be similar:

"The logarithmic rule gives more credit to extreme predictions that are “right”" (about logarithmic score).

"Log loss penalizes both types of errors, but especially those predictions that are confident and wrong" (here, about cross entropy)

amestrian

1 Answer


Yes, these refer to the same equation, possibly up to multiplication by a positive constant. For a sample size of $N$, predictions $\hat p_i\in[0,1]$, and true values $y_i\in\{0,1\}$, the log loss is:

$$ -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i\log(\hat p_i) + (1-y_i)\log(1-\hat p_i) \right] $$

(Some implementations do not multiply by the $\frac{1}{N}$. This does not matter for optimization, since multiplying by a positive constant does not change where the optimum is, but watch out for it when reporting model performance. Whatever software you use should document what it does, though I would assume the $\frac{1}{N}$ is included if the documentation does not say anything.)
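For concreteness, here is a minimal sketch in Python (assuming NumPy and scikit-learn are available; the labels and predictions are made-up toy values) that computes the averaged formula by hand and compares it to scikit-learn's `log_loss`, which averages over samples by default:

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy labels and predicted probabilities, purely for illustration.
y = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.9, 0.2, 0.6, 0.8, 0.4])

# Log loss with the 1/N factor, exactly as in the formula above.
manual = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# scikit-learn averages over samples by default, so this should agree
# with the manual computation (about 0.3147 for these numbers).
sklearn_avg = log_loss(y, p_hat)

print(manual, sklearn_avg)
```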

It is a convention that $0\times\log(0)$ is taken to be $0$, should the model make a probability prediction of $0$ or $1$. However, $1\times\log(0)$ is $-\infty$, so the leading minus sign makes the loss infinite, hence the extremely harsh penalty for confident but incorrect predictions.
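As a quick illustration of both conventions, the sketch below uses `scipy.special.xlogy`, which computes $x\log(y)$ with $0\times\log(0)$ taken as $0$ (the labels and predictions here are just toy values):

```python
from scipy.special import xlogy

# Confident and correct: y = 1, p_hat = 1.
# The (1 - y) * log(1 - p_hat) term is 0 * log(0), taken as 0 by convention.
correct = -(xlogy(1, 1.0) + xlogy(0, 0.0))
print(correct)  # 0.0 -- no penalty at all

# Confident and wrong: y = 1, p_hat = 0.
# The y * log(p_hat) term is 1 * log(0) = -inf, so the loss is infinite.
wrong = -(xlogy(1, 0.0) + xlogy(0, 1.0))
print(wrong)  # inf -- the harshest possible penalty
```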

Dave