I am curious how the likelihood function should be written for labels in $\{-1, 1\}$ in the binomial case. I am asking because of this paper, one of the famous papers on gradient boosting. On the first page, the author says that the negative log-likelihood loss can be written as $\log(1 + \exp(-2yF))$. As far as I remember, if you follow the derivation of the logistic regression loss with labels 1 and -1, you end up with $\log(1 + \exp(-yF))$. Where does the coefficient 2 come from? I have searched a lot and have never seen this form. Does anyone have any ideas?
1 Answer
Let $Y \in\{-1,1\}$ be binomial, and let $F=F(x)$ be the linear predictor in a logistic regression. The idea of writing $Y$ as $\pm 1$ is to treat the two cases symmetrically, so we want the linear predictor to have a symmetric interpretation as well: replacing $F$ by $-F$ should give the predictor for the other outcome. Then we must have $$ \DeclareMathOperator{\P}{\mathbb{P}} \P(Y=1 \mid x)=\frac{e^F}{e^{-F}+e^F}= \frac{e^{2F}}{1+e^{2F}} $$ The equal-probability model is then given by $F(x)=0$.
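To see where the 2 comes from, it may help to compare this with the usual parameterization. Here $G$ is a symbol introduced only for this comparison: it stands for the ordinary log-odds predictor, for which $\P(Y=1 \mid x)=1/(1+e^{-G})$. From the symmetric form above, $$ \frac{\P(Y=1 \mid x)}{\P(Y=-1 \mid x)}=\frac{e^{F}}{e^{-F}}=e^{2F} \quad\Longrightarrow\quad F(x)=\frac{1}{2}\log\frac{\P(Y=1 \mid x)}{\P(Y=-1 \mid x)}=\frac{G(x)}{2} $$ So $F$ is half the log-odds, and the familiar loss $\log\left(1+e^{-YG}\right)$ turns into $\log\left(1+e^{-2YF}\right)$ after substituting $G=2F$.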
Let $Y'=(Y+1)/2 \in \{0,1\}$. Then we can write the binomial likelihood as $$ \P(Y=1 \mid x)^{Y'} \cdot \P(Y=-1 \mid x)^{1-Y'} $$ The log-likelihood becomes $$ -Y' \log\left( 1+e^{-2F(x)}\right) -(1-Y')\log\left( 1+e^{2F(x)}\right) $$ Since $Y$ can only be $\pm 1$, checking the two cases separately shows that both reduce to the common expression $-\log\left( 1+e^{-2YF(x)} \right)$. Multiplying by $-2$ gives the residual deviance $2 \log\left( 1+e^{-2YF(x)} \right)$. Apart from the extra factor 2 in front, which is irrelevant for optimization, that is your answer.
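As a quick numerical sanity check of the case-by-case argument, here is a minimal sketch assuming NumPy is available. The function names `prob_pos`, `nll_direct`, `nll_compact` and `nll_standard` are made up for illustration and do not come from the paper.

```python
import numpy as np

def prob_pos(F):
    """P(Y = 1 | x) under the symmetric form e^F / (e^-F + e^F)."""
    return np.exp(F) / (np.exp(-F) + np.exp(F))

def nll_direct(y, F):
    """Negative binomial log-likelihood for one observation, y in {-1, +1}."""
    p = prob_pos(F)
    y01 = (y + 1) / 2  # Y' in {0, 1}
    return -(y01 * np.log(p) + (1 - y01) * np.log(1 - p))

def nll_compact(y, F):
    """Compact form log(1 + exp(-2 y F))."""
    return np.log1p(np.exp(-2 * y * F))

def nll_standard(y, G):
    """Usual logistic loss log(1 + exp(-y G)), with G the full log-odds."""
    return np.log1p(np.exp(-y * G))

F = np.linspace(-3.0, 3.0, 13)
for y in (-1, 1):
    # The direct likelihood and the compact form agree for both labels.
    assert np.allclose(nll_direct(y, F), nll_compact(y, F))
    # The compact form is the usual logistic loss evaluated at G = 2F.
    assert np.allclose(nll_compact(y, F), nll_standard(y, 2 * F))
```

Both assertions pass for $y=-1$ and $y=1$, which is consistent with the two forms differing only in how the predictor is scaled.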
Sorry I missed your answer for quite a long time. Very well explained. Thx! – Li haonan Mar 13 '20 at 01:43