The KL divergence between two univariate Gaussian distributions is:
$KL(N(\mu_1,\sigma_1) || N(\mu_2, \sigma_2)) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}$
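For reference, I sanity-checked this closed form numerically against direct integration of $p \log(p/q)$ (a rough sketch using SciPy; the function names and test values are my own):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def kl_closed_form(mu1, sigma1, mu2, sigma2):
    # Closed-form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)); sigmas are standard deviations.
    return np.log(sigma2 / sigma1) + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2) - 0.5

def kl_numeric(mu1, sigma1, mu2, sigma2):
    # KL via numerical integration of p(x) * log(p(x) / q(x)).
    p, q = norm(mu1, sigma1), norm(mu2, sigma2)
    val, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)), -np.inf, np.inf)
    return val

print(kl_closed_form(0.3, 0.8, 0.0, 1.0))  # ~0.088
print(kl_numeric(0.3, 0.8, 0.0, 1.0))      # agrees with the closed form
```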
I am reading about VAE neural networks, which use the KL divergence against $N(0, 1)$. However, in the articles the equation appears in the form:
$0.5 \cdot \sum \left(1 + \log(\sigma^2) - \sigma^2 - \mu^2\right)$
How is this second equation derived from the "original" one?
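As I understand it, the sum runs over the latent dimensions, and implementations usually work with $\log(\sigma^2)$ directly; here is how I evaluate the articles' expression in code (a minimal NumPy sketch, variable names and toy values are my own):

```python
import numpy as np

# Encoder outputs for one sample, one entry per latent dimension (toy values).
mu = np.array([0.3, -0.1])
log_var = np.array([-0.2, 0.1])   # log(sigma^2)

# The articles' expression: 0.5 * sum(1 + log(sigma^2) - sigma^2 - mu^2)
article_term = 0.5 * np.sum(1 + log_var - np.exp(log_var) - mu**2)
print(article_term)
```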
I came up with this:
$\mu_2 = 0$
$\sigma_2 = 1$
which gives
$\log \frac{1}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - 0)^2}{2} - \frac{1}{2}$ =
$\log (1) - \log (\sigma_1) + \frac{\sigma_1^2 + \mu_1^2}{2} - \frac{1}{2}$ =
$-2\log (\sigma_1) + \sigma_1^2 + \mu_1^2 - 1$ =
$-\log (\sigma_1^2) + \sigma_1^2 + \mu_1^2 - 1$ =
$1 + \log (\sigma_1^2) - \sigma_1^2 - \mu_1^2$
but the 0.5 before the $\sum$ is a mystery to me.
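For concreteness, here is a quick numeric comparison (a NumPy sketch for a single dimension, with arbitrary values of my own) of the closed form with $\mu_2 = 0$, $\sigma_2 = 1$, the articles' expression, and my final line:

```python
import numpy as np

mu1, sigma1 = 0.3, 0.8              # arbitrary test values
log_var = np.log(sigma1**2)         # log(sigma_1^2)

closed_form = np.log(1 / sigma1) + (sigma1**2 + mu1**2) / 2 - 0.5
article     = 0.5 * (1 + log_var - sigma1**2 - mu1**2)
mine        = 1 + log_var - sigma1**2 - mu1**2

print(closed_form, article, mine)   # the three values do not coincide
```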