
Binary cross entropy is written as follows:

\begin{equation} \mathcal{L} = -y\log\left(\hat{y}\right)-(1-y)\log\left(1-\hat{y}\right) \end{equation}
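For concreteness, here is a minimal NumPy sketch of that loss as I understand it (the function and variable names are my own, for illustration only):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross entropy, elementwise.

    y     : true label(s) in {0, 1}
    y_hat : predicted probability/probabilities in (0, 1), e.g. sigmoid outputs
    eps   : small constant to avoid log(0)
    """
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Example: labels 0/1 with sigmoid-style outputs in (0, 1)
y = np.array([1, 0, 1])
y_hat = np.array([0.9, 0.2, 0.6])
print(binary_cross_entropy(y, y_hat))
```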

Every reference I have read uses labels 0 and 1 for binary cross entropy, with a sigmoid activation in the output layer. Is it possible to use cross entropy with labels -1 and 1 and a tanh activation in the output layer instead?


1 Answer


No, you can’t. What would $\log\left(\hat{y}\right)$ be when $\hat{y}$ is (close to) -1?

There are simple workarounds: you can rescale your outputs to $[0, 1]$ (for a tanh output $t$, use $\hat{y} = (t + 1)/2$, as sketched below), or you can use the Brier score instead of cross entropy, but why would you?
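As an illustration of the rescaling workaround, here is a minimal NumPy sketch (the helper names are my own). It maps a tanh output $t \in (-1, 1)$ to $\hat{y} = (t+1)/2 \in (0, 1)$ before applying the cross entropy above, and also shows the Brier score alternative:

```python
import numpy as np

def rescale_tanh(t):
    """Map a tanh output t in (-1, 1) to a pseudo-probability in (0, 1)."""
    return (t + 1.0) / 2.0

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross entropy, elementwise, for labels in {0, 1}."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def brier_score(y, y_hat):
    """Brier score: squared error between labels and predicted probabilities."""
    return (y_hat - y) ** 2

y = np.array([1, 0, 1])          # labels in {0, 1}
t = np.array([0.8, -0.6, 0.2])   # raw tanh outputs in (-1, 1)
y_hat = rescale_tanh(t)

print(binary_cross_entropy(y, y_hat))
print(brier_score(y, y_hat))
```

Note that rescaling a tanh output this way is equivalent to applying a sigmoid to twice the pre-activation, since $(\tanh(z)+1)/2 = \sigma(2z)$, which is part of why the rescaling buys you nothing over simply using a sigmoid output.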

Using $\tanh$ activation functions in hidden layers is a natural thing to do, but in the output layer sigmoid has the advantage of a natural probabilistic interpretation: its output lies in $(0, 1)$ and can be read as a probability. As a consequence, the outputs are compatible with the cross entropy loss you defined above.
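To spell out that interpretation: a sigmoid output $\hat{y} = \sigma(z) = 1/(1+e^{-z})$ can be read as an estimate of $P(y=1)$, so the cross entropy above is simply the negative log-likelihood of a Bernoulli distribution:

\begin{equation} \mathcal{L} = -\log\left(\hat{y}^{\,y}\left(1-\hat{y}\right)^{1-y}\right) = -y\log\left(\hat{y}\right)-(1-y)\log\left(1-\hat{y}\right) \end{equation}

A tanh output in $(-1, 1)$ has no such reading as a probability, which is why it does not pair with this loss.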