1

I know that log-loss penalises models that are confident with the wrong predicted classes. Can this be translated to percentage accuracy? If not, then how do I report the error or compare it to other percentage error metrics?

For example, on training a neural network whose output layer has 128 sigmoid units, I get a loss reduction from 0.30 to 0.04 over 20 epochs. How do I evaluate classifier accuracy from this? It is a multi-label classification problem.

goelakash
  • 111
  • You don't; proper scoring rules don't relate to accuracy at all. There's a hint if you reach the numerical bottom or top of the scale, but that's it. – Firebug May 29 '16 at 21:28
  • I read somewhere that it can be interpreted as e^(-total_log_loss). This gives an accuracy estimate (compared to a random-guessing model with e^(-log(no_of_labels))). Is this correct or a useful interpretation? – goelakash May 29 '16 at 22:06
  • 1
    $e^{-\log{K}}=K^{-1}$ is the uniform random guess probability, but that only translates as an accuracy estimate if you assume a naive threshold for classification. On the other hand we have logloss $L = -\sum{\log{P_{i}}}$ and so $e^{-L}=e^{\sum{\log{P_{i}}}}=\prod{P_{i}}$, it takes a single $P_j=0$ to make it reach zero, even if you have all other $P_{i, i \neq j}=1$, and so that cannot be an accuracy estimate. – Firebug May 30 '16 at 14:39
  • Hmm, that makes sense. But then how is cross-entropy useful at all, if not as a proper metric? If it can't be reported or compared, how does one define the viability of the model? (Let's say in terms of accuracy/error rate of classification) – goelakash May 30 '16 at 16:14
  • It's useful because it penalizes too confident mistakes. It's actually one of the few proper metrics. It also can be reported, if you bound your probability outputs to say $[10^{-15}, 1-10^{-15}]$. Now this part is my opinion: accuracy/error rates are an unnecessary part of the data analysis, dichotomizing your scores/probabilities should be left to a decision-maker. In general, you want the best probabilities estimates possible, and logloss indicates exactly that. – Firebug Jun 13 '16 at 23:18
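A minimal Python sketch of the two points made in the comments above (the helper function and the example values are my own illustration, not from the thread): bounding predictions away from 0 and 1 keeps the loss finite, and a single confident mistake dominates $e^{-L}$, which is why it cannot serve as an accuracy estimate.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood, with predictions clipped to [eps, 1-eps]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # bound the output so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Nine near-perfect predictions and one confident mistake:
y = [1] * 10
p = [0.99] * 9 + [1e-12]
loss = log_loss(y, p)
print(loss)                  # the single mistake dominates the mean loss
print(math.exp(-10 * loss))  # e^{-sum log P_i} = prod P_i, driven to ~0 by one P_j
```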

2 Answers

1

It seems like something related to McFadden's pseudo $R^2$ might do what you want.

In the usual $R^2$ in linear regression, there is a comparison of the square loss incurred by the model of interest $(L_1)$ and the square loss incurred by a model that predicts the overall mean $\bar y$ every time $(L_0)$.

$$ R^2 = \dfrac{L_0 - L_1}{L_0} $$

I think of it this way: I start out with a loss of $L_0$ and wind up with a loss of $L_1$, so how much of the original loss $L_0$ has been accounted for? Writing the calculation as in $R^2$ puts this in terms of a proportion of the original loss $L_0$.

McFadden's pseudo $R^2$ takes the same stance but uses crossentropy loss instead of square loss. In your case, you know the crossentropy loss of your model of interest, $L_1 = 0.04$ after those $20$ training epochs. You can calculate the $L_0$ loss by fitting an intercept-only model. Then you do the same calculation.

$$ R^2_{\text{McFadden}} = \dfrac{L_0 - L_1}{L_0} $$

(If you want to do out-of-sample assessments, the argument for basing $L_0$ on the training data holds for this situation, too.)

Overall, $ R^2_{\text{McFadden}} = \dfrac{L_0 - L_1}{L_0} $ could be seen as measuring the percent decrease in crossentropy loss compared to this kind of baseline "must-beat" model.
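As a sketch of the calculation (the labels and fitted probabilities below are invented purely for illustration): the baseline $L_0$ comes from an intercept-only model, i.e. predicting the overall positive rate for every observation.

```python
import math

def cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean crossentropy (log-loss) for binary labels, with clipped predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y = [1, 0, 1, 1, 0, 1, 0, 1]                          # invented labels
p_model = [0.9, 0.2, 0.8, 0.7, 0.1, 0.95, 0.3, 0.85]  # invented fitted probabilities

p_base = sum(y) / len(y)   # intercept-only model: predict the mean every time
L0 = cross_entropy(y, [p_base] * len(y))
L1 = cross_entropy(y, p_model)

r2_mcfadden = (L0 - L1) / L0
print(r2_mcfadden)  # ~0.70: about 70% of the baseline loss accounted for
```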

None of this is related to the usual classification accuracy, but accuracy turns out to be a surprisingly problematic measure of performance, anyway.

Dave
  • 62,186
-2

Convert a probability to log-loss, then back to a probability:

var probability = 0.5;
var logloss = -Math.Log(probability);          // log-loss is the negative log-probability
Console.WriteLine(logloss);                    // 0.693147180559945
var originalProbability = Math.Exp(-logloss);  // invert to recover the probability
Console.WriteLine(originalProbability);        // 0.5

I think this is all it is

  • 2
    Could you explain how this might result in the "percentage accuracy" requested in the question? – whuber May 08 '19 at 22:05