I am trying to understand the example described in the WGAN paper (Arjovsky et al., 2017) about learning parallel lines under various divergences. More specifically, the setup is as follows:
Let $Z \sim U[0, 1]$ be uniform on the unit interval. Let $\mathbb{P}_0$ be the distribution of $(0, Z) \in \mathbb{R}^2$, i.e. uniform on a straight vertical line passing through the origin (so the generator is $g_{0}(z) = (0, z)$). Now let $g_{\theta}(z) = (\theta, z)$ with $\theta$ a single real parameter, and let $\mathbb{P}_{\theta}$ be the distribution of $g_{\theta}(Z) = (\theta, Z)$.
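To make the setup concrete for myself, here is a minimal sketch of how I picture sampling from $\mathbb{P}_0$ and $\mathbb{P}_{\theta}$ by pushing $Z$ through $g_{\theta}$ (my own NumPy code and names, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_p_theta(theta, n):
    """Push Z ~ U[0, 1] through g_theta(z) = (theta, z)."""
    z = rng.uniform(0.0, 1.0, size=n)
    return np.column_stack([np.full(n, theta), z])

samples_p0 = sample_p_theta(0.0, 5)      # points on the vertical line x = 0
samples_ptheta = sample_p_theta(0.7, 5)  # points on the parallel line x = 0.7
```

Every sample from $\mathbb{P}_{\theta}$ has first coordinate exactly $\theta$, which is where my confusion below comes from.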
The authors claim that:
- $KL(\mathbb{P}_0||\mathbb{P}_{\theta}) = KL(\mathbb{P}_{\theta}||\mathbb{P}_0) = \infty$ if $\theta \neq 0$ and $0$ otherwise
- $JS(\mathbb{P}_0||\mathbb{P}_{\theta}) = \log 2$ if $\theta \neq 0$ and $0$ otherwise (I include a quick numerical sanity check of both claims right below)
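To convince myself of the claimed values (not of the continuous calculation I am asking about), I did a crude sanity check: since both distributions share the same uniform $z$-marginal, it seemed reasonable to compare only the $x$-coordinates, treating them as point masses at $0$ and $\theta$ sitting in two separate bins. The code and names below are my own:

```python
import numpy as np

def kl(p, q):
    """Discrete KL(p || q); infinite if q is zero somewhere p is not."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((p > 0) & (q == 0)):
        return np.inf
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Discrete JS as the average of the two KLs to the mixture."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# theta != 0: all x-mass of P_0 is at 0, all x-mass of P_theta is at theta.
p0, ptheta = [1.0, 0.0], [0.0, 1.0]
print(kl(p0, ptheta), kl(ptheta, p0))   # inf inf
print(js(p0, ptheta), np.log(2))        # 0.6931...  0.6931...

# theta == 0: the two distributions coincide.
print(kl(p0, p0), js(p0, p0))           # 0.0 0.0
```

This reproduces $\infty$, $\log 2$ and $0$, but it obviously sidesteps the actual continuous integrals.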
Intuitively, the results make sense, but when I try to reproduce the calculations properly I am confused about how to apply the KL/JS divergences in this setting. Writing $p_0$ and $p_{\theta}$ for the densities of $\mathbb{P}_0$ and $\mathbb{P}_{\theta}$ (assuming for the moment that they exist), the definitions I know are:
$KL(\mathbb{P}_0||\mathbb{P}_{\theta}) = \int_{\mathbb{R}^2} p_0(x) \log\frac{p_0(x)}{p_{\theta}(x)} \,\mathrm{d}x$
$KL(\mathbb{P}_{\theta}||\mathbb{P}_{0}) = \int_{\mathbb{R}^2} p_{\theta}(x) \log\frac{p_{\theta}(x)}{p_{0}(x)} \,\mathrm{d}x$
$JS(\mathbb{P}_0||\mathbb{P}_{\theta}) = \frac{1}{2}\int_{\mathbb{R}^2} p_{\theta}(x) \log\frac{p_{\theta}(x)}{\frac{p_{0}(x)+p_{\theta}(x)}{2}} \,\mathrm{d}x + \frac{1}{2}\int_{\mathbb{R}^2} p_{0}(x) \log\frac{p_{0}(x)}{\frac{p_{0}(x)+p_{\theta}(x)}{2}} \,\mathrm{d}x$
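For reference, this is how I would normally evaluate the density-based KL formula when both distributions actually do have densities, e.g. two unit-variance Gaussians, checked numerically against the closed form $\mu^2/2$ (again just my own sanity check, unrelated to the paper):

```python
import numpy as np
from scipy.stats import norm

# KL(N(0,1) || N(mu,1)) by a plain Riemann sum; closed form is mu**2 / 2.
mu = 1.5
xs = np.linspace(-12.0, 12.0 + mu, 200_001)
dx = xs[1] - xs[0]
p = norm.pdf(xs, loc=0.0)
q = norm.pdf(xs, loc=mu)
kl_numeric = float(np.sum(p * np.log(p / q)) * dx)
print(kl_numeric, mu**2 / 2)  # both approximately 1.125
```

In the parallel-lines example, however, I do not see what to plug in for $p_0$ and $p_{\theta}$, since all of the $x$-mass sits at a single point.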
Can anyone help me see how to incorporate the single-point information (the $x$-coordinate is always $0$ or $\theta$) into these integrals and carry out the calculations a bit more analytically? Do we have to treat that single point separately (e.g. as a point mass with its own term) because of the definition of $g_{0}(z)$ and $g_{\theta}(z)$?