
Cross entropy for a random variable $x \sim p$ and a distribution $q$ is defined as:

$$H(p,q) = -\sum_{x\in\mathcal{X}} p(x)\log q(x) = -\,\mathbb{E}_{x\sim p}\big[\log q(x)\big]$$

Here $\mathcal{X}$ is the set of all possible values that the random variable $x$ can take.
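
To make the definition concrete, here is a small worked example with made-up numbers (using natural logs):

$$p = (0.5,\ 0.5),\quad q = (0.9,\ 0.1)\ \Longrightarrow\ H(p,q) = -0.5\log 0.9 - 0.5\log 0.1 \approx 1.204\ \text{nats}.$$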

However, in the neural-network world, the cross-entropy loss is defined as

$$loss = -\sum_{i=1}^N\sum_{j=1}^c p^i_j\,\log\big(q_j(x^i)\big), \qquad p^i,\ q(x^i)\in\mathbb{R}^c,\quad x^i\in \mathbb{R}^{d_1 \times d_2},$$

where $p^i$ is the one-hot vector for instance $i$, $q(x^i)$ is the output of the softmax layer for input $x^i$, and $N$ is the batch size. I do not understand how to relate the formal definition to this cross-entropy loss. Although they are mathematically similar, conceptually they seem to measure two different things. I think of the second one as follows: we have $N$ distributions, and for each instance $i$ we have the true distribution $p^i$ and the network's output distribution $q(x^i)$. So the loss would be $\sum_{i=1}^N H\big(p^i, q(x^i)\big)$. Is that correct?
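
As a quick numerical sanity check of that reading, here is a minimal NumPy sketch (the array names, sizes, and logits are made up); it compares the double-sum form of the loss with the sum of per-instance cross-entropies $H\big(p^i, q(x^i)\big)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, c = 4, 3                       # batch size and number of classes (made up)
logits = rng.normal(size=(N, c))  # stand-in for the network's pre-softmax outputs
targets = rng.integers(0, c, N)   # true class index for each instance

# softmax rows give q(x^i); one-hot rows give p^i
q = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
p = np.eye(c)[targets]

# double-sum form of the loss
loss_double_sum = -(p * np.log(q)).sum()

# sum of per-instance cross-entropies H(p^i, q(x^i))
loss_per_instance = sum(-(p[i] * np.log(q[i])).sum() for i in range(N))

print(np.isclose(loss_double_sum, loss_per_instance))  # True
```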

What I am looking for is a way to write $loss$ as an expectation, as in the first equation.
