13

This question discusses two equivalent ways to express the canonical loss function for a logistic regression, depending on whether you code the categories as $\{0,1\}$ or $\{-1,+1\}$. In the following, let $x_i$ be the $i$th feature vector, $w$ be the parameter vector for the logistic regression, $N$ be the sample size, and $p(y_i)$ be the predicted probability of membership in category $1$.

$$ \text{Logistic Loss}\\ \dfrac{1}{N}\overset{N}{\underset{i=1}{\sum}} \log\left(1 + \exp(-y_i w^Tx_i)\right)\\ y_i\in\{-1,+1\} $$

$$ \text{Log Loss}\\ -\dfrac{1}{N}\overset{N}{\underset{i=1}{\sum}}\left[ y_i \log(p(y_i)) + (1 - y_i)\log(1 - p(y_i)) \right]\\ y_i\in\{0, 1\} $$

What is the algebra showing these two formulations to be equivalent? Not even the proposed duplicate of the first link really shows why the two must give the same loss value, and while both this and this are close, neither quite explicitly shows that $\text{Logistic Loss} = \text{Log Loss}$. I would like to see a chain of equal expressions like $\text{Logistic Loss} =\dots = \text{Log Loss}$.
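For concreteness, here is a quick numerical sanity check (in Python, with arbitrary made-up weights, features, and labels) showing that the two formulas do return the same value; what I am after is the algebra that explains why.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative data: 5 observations, 3 features
X = rng.normal(size=(5, 3))       # feature vectors x_i as rows
w = rng.normal(size=3)            # parameter vector
y01 = np.array([0, 1, 1, 0, 1])   # labels coded {0, 1}
ypm = 2 * y01 - 1                 # the same labels coded {-1, +1}

# Predicted probability of category 1: p(y_i) = 1 / (1 + exp(-w^T x_i))
p = 1.0 / (1.0 + np.exp(-X @ w))

# Logistic loss with y_i in {-1, +1}
logistic_loss = np.mean(np.log(1.0 + np.exp(-ypm * (X @ w))))

# Log loss with y_i in {0, 1}
log_loss = -np.mean(y01 * np.log(p) + (1 - y01) * np.log(1 - p))

print(logistic_loss, log_loss)    # agree up to floating-point error
assert np.isclose(logistic_loss, log_loss)
```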

Dave
  • 62,186

3 Answers

16

Consider the case when $y_i = -1$ in the logistic loss and $y_i = 0$ in the log loss. The summand in the logistic loss becomes $$\log\left(1 + \exp(w^Tx_i)\right)$$ and the summand in the log loss becomes $$-\log(1 - p(y_i = 0))$$

Using the following equivalence given in your answer here $$ p(y_i) = \dfrac{1}{ 1 + \exp(-w^Tx_i) }\\ \Big\Updownarrow\\ w^Tx_i = \log\left( \dfrac{ p(y_i) }{ 1 - p(y_i) } \right) $$ We can re-write the summand in the logistic loss as \begin{align} \log\left(1 + \exp(w^Tx_i)\right) &= \log\left(1 + \exp\left(\log\left( \dfrac{ p(y_i=-1) }{ 1 - p(y_i=-1) } \right)\right)\right) \\ &= \log\left(1+ \frac{p(y_i=-1)}{1-p(y_i=-1)}\right) \\ &= \log\left(\frac{1-p(y_i=-1)}{1-p(y_i=-1)} + \frac{p(y_i=-1)}{1-p(y_i=-1)}\right) \\ &= \log\left(\frac{1}{1-p(y_i=-1)}\right) \\ &= -\log\left(1-p(y_i=-1)\right) \\ \end{align} Assuming that $p(y_i = -1)$ for the logistic loss is equivalent to $p(y_i = 0)$ for the log loss, the summand for the logistic loss (when $y_i = -1$) is equivalent to the summand for the log loss (when $y_i = 0$). The case when $y_i = 1$ in the logistic loss and $y_i = 1$ in the log loss can be shown in a similar way.
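If it helps, here is a minimal numerical check of this chain (the values standing in for $w^Tx_i$ below are arbitrary):

```python
import numpy as np

# Arbitrary stand-in values for w^T x_i
for wx in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    p = 1.0 / (1.0 + np.exp(-wx))                 # predicted probability of category 1

    logistic_summand = np.log(1.0 + np.exp(wx))   # logistic-loss summand at y_i = -1
    log_summand = -np.log(1.0 - p)                # log-loss summand at y_i = 0

    assert np.isclose(logistic_summand, log_summand)
```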

mhdadk
  • 4,940
7

Given $y_i\in\{-1,+1\}$, define $z_i\in\{0,1\}$ via $z_i = (y_i+1)/2 \iff y_i = 2z_i-1$.

Also, $$p(y_i\equiv1) = p(z_i\equiv1)= \left(1+\exp(-w^Tx_i)\right)^{-1}\\ p(y_i\equiv-1) = p(z_i\equiv0)= \left(1 + \exp(w^Tx_i)\right)^{-1}$$

Then $$ \begin{align} \color{red}{\log\left(1 + \exp(-y_i w^Tx_i)\right)}&=\\ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i-1}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)&=\\ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+(1-2)}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)&=\\ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+1}{2}-1\right)\log\left(1 + \exp(w^Tx_i)\right)&=\\ z_i\log\left(1/p(z_i\equiv1)\right)- (z_i-1)\log\left(1/p(z_i\equiv0)\right)&=\\ -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(p(z_i\equiv 0)\right)&=\\ -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(1-p(z_i\equiv1)\right)&=\\ \color{blue}{-\left(z_i\log\left(p(z_i\equiv1)\right)+ (1-z_i)\log\left(1-p(z_i\equiv1)\right)\right)}& \end{align} $$


A similar proof can be obtained by reverting to the Bernoulli likelihood.
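For anyone who wants to verify the endpoints of the chain numerically, here is a short sketch (the values standing in for $w^Tx_i$ are arbitrary):

```python
import numpy as np

for wx in [-2.0, 0.3, 1.7]:                # arbitrary stand-ins for w^T x_i
    p1 = 1.0 / (1.0 + np.exp(-wx))         # p(z_i = 1)
    for y in (-1, 1):
        z = (y + 1) / 2                    # map {-1, +1} -> {0, 1}

        red = np.log(1.0 + np.exp(-y * wx))                       # first (red) line
        split = (((y + 1) / 2) * np.log(1.0 + np.exp(-wx))
                 - ((y - 1) / 2) * np.log(1.0 + np.exp(wx)))       # second line
        blue = -(z * np.log(p1) + (1 - z) * np.log(1.0 - p1))      # last (blue) line

        assert np.isclose(red, split) and np.isclose(red, blue)
```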

Firebug
  • 19,076
    My edit was incorrect? Then why does the minus sign disappear? – Dave Mar 17 '23 at 16:40
    Thanks for the edit @Dave but I had a correct minus sign. The minus disappears because the second term is only activated when $y_i \equiv -1$ – Firebug Mar 17 '23 at 16:43
  • What do you mean by "A similar proof can be obtained by reverting to the Bernoulli likelihood."? Isn't the expression in blue the Bernoulli log-likelihood? – mhdadk Mar 17 '23 at 18:12
  • I meant incorrect @Dave, I'm sorry – Firebug Mar 17 '23 at 20:20
  • @mhdadk Basically exponentiating the log-likelihood, and identifying again the terms that cancel out depending on which value $y_i$ assumes – Firebug Mar 17 '23 at 20:43
  • I’m going to look at this in more detail to convince myself it is correct. (I know it is, but I want to understand why and maybe work through it without looking at your work.) – Dave Mar 19 '23 at 00:14
  • Please let me know which step is unclear @Dave – Firebug Mar 19 '23 at 00:18
  • The big one is the one I tried to edit. I get your explanation but want to work through it. – Dave Mar 19 '23 at 00:26
    I added a line and slightly modified another in your derivation for clarity. Hope that's OK. – mhdadk Mar 19 '23 at 14:07
    Sure @mhdadk, if that improves clarity for others :) – Firebug Mar 19 '23 at 14:08
    If it is okay with you, I would like to award a bounty to this answer and then expand on this with a self-answer that I will accept. – Dave Apr 08 '23 at 23:22
    Sure @Dave, it's your question :) – Firebug Apr 11 '23 at 07:09
1

Let's start by defining some notation.

$$ y_i\in\{-1,+1\}\\ z_i\in\{0,1\} $$

Then $z_i = (y_i+1)/2 \iff 2z_i = y_i + 1 \iff y_i = 2z_i-1$.

Also, $p(y_i\equiv1) = p(z_i\equiv1)$ and $p(y_i\equiv-1) = p(z_i\equiv0)$.

Also: $$p(y_i\equiv1) = p(z_i\equiv1)= \left(1+\exp(-w^Tx_i)\right)^{-1}\\ p(y_i\equiv-1) = p(z_i\equiv0)= \left(1 + \exp(w^Tx_i)\right)^{-1}$$

When $y_i = -1$, then $\dfrac{y_i + 1}{2} = 0$ and $\dfrac{y_i - 1}{2} = -1$. When $y_i = +1$, then $\dfrac{y_i + 1}{2} = 1$ and $\dfrac{y_i - 1}{2} = 0$. Consequently:

$$ \color{red}{\log\left(1 + \exp(-y_i w^Tx_i)\right)}\\ =\left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i-1}{2}\right)\log\left(1 + \exp(w^Tx_i)\right) $$

Next, because $-1 = 1 - 2$, we can rewrite $y_i - 1$ as $y_i + (1-2)$:

$$ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i-1}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)\\= \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+(1-2)}{2}\right)\log\left(1 + \exp(w^Tx_i)\right) $$

For the fraction on the right, $\dfrac{y_i+(1-2)}{2} = \dfrac{y_i + 1}{2} - 1$, so:

$$ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+(1-2)}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)\\= \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+1}{2}-1\right)\log\left(1 + \exp(w^Tx_i)\right) $$

Since $z_i = (y_i+1)/2$, $p(y_i\equiv1) = \left(1+\exp(-w^Tx_i)\right)^{-1}$, and $p(y_i\equiv-1) = \left(1 + \exp(w^Tx_i)\right)^{-1}$:

$$ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+1}{2}-1\right)\log\left(1 + \exp(w^Tx_i)\right)\\= z_i\log\left(1/p(z_i\equiv1)\right)- (z_i-1)\log\left(1/p(z_i\equiv0)\right) $$

Next, a logarithm rule is that $\log(1/x) = -\log(x)$ for $x>0$.

$$ z_i\log\left(1/p(z_i\equiv1)\right)- (z_i-1)\log\left(1/p(z_i\equiv0)\right)\\= -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(p(z_i\equiv 0)\right) $$

Next, $p(z_i \equiv 0) = 1 - p(z_i \equiv 1)$, so:

$$ -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(p(z_i\equiv 0)\right)\\= -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(1-p(z_i\equiv1)\right) $$

Next, factor out the minus sign.

$$ -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(1-p(z_i\equiv1)\right)\\= {-\left(z_i\log\left(p(z_i\equiv1)\right)- (z_i-1)\log\left(1-p(z_i\equiv1)\right)\right)} $$

Finally, distribute the minus sign across the $z_i - 1$ on the right.

$$ {-\left(z_i\log\left(p(z_i\equiv1)\right)- (z_i-1)\log\left(1-p(z_i\equiv1)\right)\right)}\\= \color{blue}{-\left(z_i\log\left(p(z_i\equiv1)\right)+ (1-z_i)\log\left(1-p(z_i\equiv1)\right)\right)} $$

Since each summand of the logistic loss equals the corresponding summand of the log loss defined in the question, the averages over all $N$ observations coincide, so the two loss functions are equal.
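As a final sanity check (not part of the algebra above), here is a short numerical walk through the chain for one arbitrary value of $w^Tx_i$ and both labels; every intermediate expression evaluates to the same number.

```python
import numpy as np

wx = 0.8                                 # arbitrary stand-in for w^T x_i
p1 = 1.0 / (1.0 + np.exp(-wx))           # p(z_i = 1)
p0 = 1.0 / (1.0 + np.exp(wx))            # p(z_i = 0) = 1 - p(z_i = 1)

for y in (-1, 1):
    z = (y + 1) / 2
    steps = [
        np.log(1 + np.exp(-y * wx)),                          # red expression
        ((y + 1) / 2) * np.log(1 + np.exp(-wx))
        - ((y - 1) / 2) * np.log(1 + np.exp(wx)),             # split by the sign of y_i
        ((y + 1) / 2) * np.log(1 + np.exp(-wx))
        - ((y + 1) / 2 - 1) * np.log(1 + np.exp(wx)),         # rewrite (y_i - 1)/2
        z * np.log(1 / p1) - (z - 1) * np.log(1 / p0),        # substitute z_i and the probabilities
        -z * np.log(p1) + (z - 1) * np.log(p0),               # log(1/x) = -log(x)
        -z * np.log(p1) + (z - 1) * np.log(1 - p1),           # p(z_i = 0) = 1 - p(z_i = 1)
        -(z * np.log(p1) + (1 - z) * np.log(1 - p1)),         # blue expression
    ]
    assert np.allclose(steps, steps[0])
```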

Dave
  • 62,186
  • The point of this answer is to fill in the details of the steps taken in Firebug's answer (which will be getting the bounty once the grace period starts), so if there are details missing to justify each step, please let me know! – Dave Apr 11 '23 at 12:19