
Given two distributions, one a parameterized Gaussian and the other a standard normal Gaussian:

$q(x) \sim \mathcal{N}(\mu,\sigma)$

$p(x) \sim \mathcal{N}(0,I)$

We want to compute the KL divergence $D_{KL}(q(x)\,||\,p(x))$. It is well known that this has a closed-form solution, with the total KL divergence given by:

$D_{KL}(q(x)\,||\,p(x)) = -\frac{1}{2}\sum_{i=1}^{D}\left(1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2\right)$

For a random vector with dimension $D$.
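
As a quick sanity check of that closed form, here is a minimal NumPy sketch (the parameter values below are made up purely for illustration):

```python
import numpy as np

# Illustrative parameters for a 3-dimensional diagonal Gaussian q = N(mu, diag(sigma^2))
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([0.8, 1.5, 0.3])

# Closed-form KL(q || p) against p = N(0, I):
# D_KL = -1/2 * sum_i (1 + log(sigma_i^2) - mu_i^2 - sigma_i^2)
kl = -0.5 * np.sum(1.0 + np.log(sigma**2) - mu**2 - sigma**2)
print(kl)
```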

However, I tried to derive this from a different perspective and don't understand what I'm getting wrong... would really appreciate if someone could help me out here!

For a random variable $x \sim \mathcal{N}(\mu,\sigma)$, we can reparameterize it by drawing from a noise variable $\epsilon \sim \mathcal{N}(0,1)$ and setting $x = \mu + \sigma\epsilon$.
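
A minimal sketch of this reparameterization in NumPy (again with illustrative values); the empirical mean and standard deviation of the reparameterized samples should come out close to $\mu$ and $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.5, 0.8                 # illustrative scalar parameters
eps = rng.standard_normal(10_000)    # epsilon ~ N(0, 1)
x = mu + sigma * eps                 # reparameterized samples, x ~ N(mu, sigma^2)

# Sanity check: empirical moments vs. (mu, sigma)
print(x.mean(), x.std())
```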

Next, the KL Divergence is given as:

$D_{KL}(q(x)\,||\,p(x)) = \int q(x)\, \log\frac{q(x)}{p(x)}\, dx = \mathbb{E}_{q(x)}[\log q(x) - \log p(x)]$

The log-density of a Gaussian with parameters $\mu,\sigma$ is:

$\log \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}{\left(\frac{x-\mu}{\sigma}\right)}^{2}} = -\log{\sigma} - \frac{1}{2}\log(2\pi) - \frac{1}{2} {\left(\frac{x-\mu}{\sigma}\right)}^{2}$

And the log-density of a standard normal Gaussian evaluated at $\epsilon$ is:

$\log \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}{\epsilon}^{2}} = - \frac{1}{2}\log(2\pi) -\frac{1}{2}{\epsilon}^{2}$
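
Both log-density expressions can be checked numerically against `scipy.stats.norm.logpdf`, e.g. with this small sketch (values chosen arbitrarily for illustration):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.5, 0.8   # illustrative parameters
x = 1.3
eps = (x - mu) / sigma

# log q(x) for N(mu, sigma^2), written out as above
log_q = -np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * ((x - mu) / sigma) ** 2
# log of the standard normal density evaluated at eps
log_p_eps = -0.5 * np.log(2 * np.pi) - 0.5 * eps ** 2

print(np.isclose(log_q, norm.logpdf(x, loc=mu, scale=sigma)))   # True
print(np.isclose(log_p_eps, norm.logpdf(eps)))                  # True
```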

So why can't we simply do:

$$
\begin{eqnarray}
\mathbb{E}_{q(x)}[\log q(x) - \log p(x)] &=& \mathbb{E}_{q(x)}\left[-\log\sigma - \frac{1}{2}\log(2\pi) - \frac{1}{2} {\left(\frac{x-\mu}{\sigma}\right)}^{2} - \left(- \frac{1}{2}\log(2\pi) -\frac{1}{2}{\epsilon}^{2}\right)\right] \\
&=& \mathbb{E}_{q(x)}\left[-\log\sigma - \frac{1}{2} {\left(\frac{x-\mu}{\sigma}\right)}^{2} +\frac{1}{2}{\epsilon}^{2}\right] \\
&=& \mathbb{E}_{p(\epsilon)}\left[-\log\sigma - \frac{1}{2} {\left(\frac{\mu + \sigma\epsilon -\mu}{\sigma}\right)}^{2} +\frac{1}{2}{\epsilon}^{2}\right] \\
&=& \mathbb{E}_{p(\epsilon)}\left[-\log\sigma - \frac{1}{2}{\epsilon}^{2} +\frac{1}{2}{\epsilon}^{2}\right] \\
&=& \mathbb{E}_{p(\epsilon)}[-\log\sigma] \\
&=& -\log\sigma
\end{eqnarray}
$$

Something is really missing here :( I thought that we were allowed to plug in the reparameterization $x = \mu + \sigma\epsilon$ and thereby change the distribution under the expectation from $q(x)$ to $p(\epsilon)$.

1 Answer


Let's focus on the one-dimensional case. As you have shown, by definition the KL-divergence $D_{\rm KL}(q(x) \,\|\, p(x))$ is given by

$$ \begin{aligned} D_{\rm KL}(q(x) \,\|\, p(x)) &= \int dx \ q(x) \log\left(\frac{q(x)}{p(x)}\right)\\ &= {\rm E}_{q(x)}\left[\log q(x) - \log p(x)\right]. \end{aligned} $$

Following your steps, the KL-divergence for the two Gaussians is

$$ \begin{aligned} D_{\rm KL}(q(x) \,\|\, p(x)) &= {\rm E}_{q(x)}\left[\log q(x) - \log p(x)\right]\\ &={\rm E}_{q(x)}\left[-\log\sigma - \frac{1}{2}\log 2\pi - \frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2 - \left(- \frac{1}{2}\log 2\pi - \frac{1}{2}x^2\right) \right]\\ &={\rm E}_{q(x)}\left[-\log\sigma - \frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2 + \frac{1}{2}x^2\right]. \end{aligned} $$

Note that $\log p$ is evaluated at $x$ itself, so the last term is $\frac{1}{2}x^2$, not $\frac{1}{2}\epsilon^2$; this is where your derivation goes wrong. You would now like to do the calculation not in $x$ but in $\epsilon$, where

$$ \epsilon = \frac{x - \mu}{\sigma}, \qquad x = \mu + \sigma\epsilon. $$

The result is

$$ \begin{aligned} D_{\rm KL}(q(x) \,\|\, p(x)) &= {\rm E}_{q(x)}\left[-\log\sigma - \frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2 + \frac{1}{2}x^2\right]\\ &={\rm E}_{p(\epsilon)}\left[-\log\sigma - \frac{1}{2}\epsilon^2 + \frac{1}{2}(\mu + \sigma\epsilon)^2\right]\\ &=-\log\sigma - \frac{1}{2} + \frac{1}{2}\mu^2 + \frac{1}{2} \sigma^2, \end{aligned} $$

using ${\rm E}[\epsilon] = 0$ and ${\rm E}[\epsilon^2] = 1$, which is consistent with the results found here and here.
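
As a numerical sanity check of this result, here is a small NumPy sketch (parameter values made up for illustration). The Monte Carlo estimate of ${\rm E}_{q(x)}[\log q(x) - \log p(x)]$, with $\log p$ evaluated at $x = \mu + \sigma\epsilon$, agrees with the closed form above, whereas plugging $\epsilon$ into $\log p$ would reproduce the incorrect $-\log\sigma$ from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 0.8                   # illustrative parameters

eps = rng.standard_normal(1_000_000)   # epsilon ~ N(0, 1)
x = mu + sigma * eps                   # samples from q(x) via reparameterization

# log q(x) and log p(x), both evaluated at x = mu + sigma * eps
log_q = -np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * ((x - mu) / sigma) ** 2
log_p = -0.5 * np.log(2 * np.pi) - 0.5 * x ** 2

mc_kl = np.mean(log_q - log_p)                                    # Monte Carlo estimate
closed_form = -np.log(sigma) - 0.5 + 0.5 * mu**2 + 0.5 * sigma**2 # answer above
flawed = -np.log(sigma)                                           # what the question's substitution yields

print(mc_kl, closed_form, flawed)
```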

Peter Pang