
This is scikit-learn GradientBoosting's binomial deviance loss function:

    def __call__(self, y, pred, sample_weight=None):
        """Compute the deviance (= 2 * negative log-likelihood). """
        # logaddexp(0, v) == log(1.0 + exp(v))
        pred = pred.ravel()
        if sample_weight is None:
            return -2.0 * np.mean((y * pred) - np.logaddexp(0.0, pred))
        else:
            return (-2.0 / sample_weight.sum() *
                    np.sum(sample_weight * ((y * pred) - np.logaddexp(0.0, pred))))

This loss function is not symmetric between class 0 and class 1. Can anyone explain how this is considered OK?

For example, with no sample weight, the loss function for class 1 is

-2 * (pred - log(1 + exp(pred)))

vs. for class 0

-2 * (-log(1 + exp(pred)))

The plots of these two are not similar in terms of cost. Can anyone help me understand?
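
To make the comparison concrete, here is a small sketch that just evaluates the two expressions above at a few values of pred (the helper names are only for illustration, they are not sklearn code):

    import numpy as np

    # Illustrative helpers: the two per-class expressions above,
    # for a single observation.
    def loss_class1(pred):
        # y = 1: -2 * (pred - log(1 + exp(pred)))
        return -2.0 * (pred - np.logaddexp(0.0, pred))

    def loss_class0(pred):
        # y = 0: -2 * (-log(1 + exp(pred)))
        return 2.0 * np.logaddexp(0.0, pred)

    for pred in [-2.0, -1.0, 0.0, 1.0, 2.0]:
        print(pred, loss_class1(pred), loss_class0(pred))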

Kumaran
1 Answer

There are two observations needed to understand this implementation.

The first is that pred is not a probability; it is a log odds.
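
You can check this directly on a fitted model. The sketch below (using a small synthetic dataset, nothing from the question) shows that pushing the raw score from decision_function through the sigmoid should reproduce predict_proba for the positive class:

    import numpy as np
    from scipy.special import expit
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Small synthetic binary problem, just for illustration.
    X, y = make_classification(n_samples=200, random_state=0)
    clf = GradientBoostingClassifier(random_state=0).fit(X, y)

    raw = clf.decision_function(X)        # the raw score ("pred"): a log odds
    prob = clf.predict_proba(X)[:, 1]     # probability of the positive class

    print(np.allclose(expit(raw), prob))  # True: sigmoid(log odds) == probability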

The second is a standard algebraic manipulation of the binomial deviance that goes like this. Let $P$ be the log odds, what sklearn calls pred. Then the definition of the binomial deviance of an observation is (up to a factor of $-2$)

$$y \log(p) + (1-y) \log(1 - p) = \log(1 - p) + y \log \left( \frac{p}{1-p} \right)$$

Now observe that $p = \frac{e^{P}}{1 + e^{P}}$ and $1-p = \frac{1}{1 + e^{P}}$ (a quick check is to sum them in your head, you'll get $1$). So

$$\log(1-p) = \log \left( \frac{1}{1 + e^{P}} \right) = - \log(1 + e^{P}) $$

and

$$ \log \left( \frac{p}{1-p} \right) = \log ( e^{P} ) = P $$

So altogether, the binomial deviance equals

$$y P - \log( 1 + e^{P} )$$

which is the expression sklearn is using.
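
If you want a numerical sanity check of the algebra (again just a sketch, not sklearn code), the two forms agree for random labels and log odds:

    import numpy as np
    from scipy.special import expit

    rng = np.random.default_rng(0)
    P = rng.normal(size=1000)             # log odds ("pred")
    y = rng.integers(0, 2, size=1000)     # 0/1 labels
    p = expit(P)                          # p = exp(P) / (1 + exp(P))

    lhs = y * np.log(p) + (1 - y) * np.log1p(-p)   # textbook form
    rhs = y * P - np.logaddexp(0.0, P)             # sklearn's form

    print(np.allclose(lhs, rhs))          # True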

Matthew Drury
  • Thank you. If I replace pred with the log odds, the loss function is uniform for both classes. – Kumaran Jun 21 '15 at 06:11
  • This same question came up for me recently. I was looking at https://gradientboostedmodels.googlecode.com/git/gbm/inst/doc/gbm.pdf page 10 where the gradient of the deviance is listed. But it seems like the gradient they show is for the log-lik not the negative log-lik. Is this correct - it seems to match your explanation here? – B_Miner Mar 29 '16 at 17:48
  • @B_Miner the link is broken – Fenil Jun 30 '18 at 11:43
  • Are you sure pred is not the predicted score (instead of the log odds of the predicted score)? Otherwise, sounds confusing that scikit-learn did not name that variable log_odds... – Tanguy Jul 22 '21 at 21:41