
Binary cross entropy is written as follows:

\begin{equation} \mathcal{L} = -y\log\left(\hat{y}\right)-(1-y)\log\left(1-\hat{y}\right) \end{equation}
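For concreteness, here is a minimal NumPy sketch of that loss as I understand it (the function and variable names are my own, for illustration only):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross entropy, elementwise.

    y     : true label(s) in {0, 1}
    y_hat : predicted probability/probabilities in (0, 1), e.g. sigmoid outputs
    eps   : small constant to avoid log(0)
    """
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Example: labels 0/1 with sigmoid-style outputs in (0, 1)
y = np.array([1, 0, 1])
y_hat = np.array([0.9, 0.2, 0.6])
print(binary_cross_entropy(y, y_hat))
```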

Every reference I have read uses labels 0 and 1 for binary cross entropy, with a sigmoid activation in the output layer. Is it possible to use cross entropy with labels -1 and 1 and a tanh activation in the output layer instead?


1 Answer


No, you can’t. What would $\log\left(\hat{y}\right)$ be when $\hat{y}$ is (close to) -1?

There are simple workarounds: you can rescale your outputs to $[0, 1]$ (for a tanh output $t$, use $\hat{y} = (t + 1)/2$, as sketched below), or you can use the Brier score instead of cross entropy, but why would you?
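As an illustration of the rescaling workaround, here is a minimal NumPy sketch (the helper names are my own). It maps a tanh output $t \in (-1, 1)$ to $\hat{y} = (t+1)/2 \in (0, 1)$ before applying the cross entropy above, and also shows the Brier score alternative:

```python
import numpy as np

def rescale_tanh(t):
    """Map a tanh output t in (-1, 1) to a pseudo-probability in (0, 1)."""
    return (t + 1.0) / 2.0

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross entropy, elementwise, for labels in {0, 1}."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def brier_score(y, y_hat):
    """Brier score: squared error between labels and predicted probabilities."""
    return (y_hat - y) ** 2

y = np.array([1, 0, 1])          # labels in {0, 1}
t = np.array([0.8, -0.6, 0.2])   # raw tanh outputs in (-1, 1)
y_hat = rescale_tanh(t)

print(binary_cross_entropy(y, y_hat))
print(brier_score(y, y_hat))
```

Note that rescaling a tanh output this way is equivalent to applying a sigmoid to twice the pre-activation, since $(\tanh(z)+1)/2 = \sigma(2z)$, which is part of why the rescaling buys you nothing over simply using a sigmoid output.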

Using $\tanh$ activation functions in hidden layers is a natural thing to do, but in the output layer sigmoid has the advantage of a natural probabilistic interpretation: its output lies in $(0, 1)$ and can be read as a probability. As a consequence, the outputs are compatible with the cross entropy loss you defined above.
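To spell out that interpretation: a sigmoid output $\hat{y} = \sigma(z) = 1/(1+e^{-z})$ can be read as an estimate of $P(y=1)$, so the cross entropy above is simply the negative log-likelihood of a Bernoulli distribution:

\begin{equation} \mathcal{L} = -\log\left(\hat{y}^{\,y}\left(1-\hat{y}\right)^{1-y}\right) = -y\log\left(\hat{y}\right)-(1-y)\log\left(1-\hat{y}\right) \end{equation}

A tanh output in $(-1, 1)$ has no such reading as a probability, which is why it does not pair with this loss.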