
In one of the comments to this post, concerning the application of the Kullback-Leibler divergence to measures that do not satisfy the required absolute continuity (e.g. a point mass vs. a continuous distribution), it is proposed, as an extension, to define the KL divergence with respect to the sum of the measures.

That is, given measures $P,Q$, define $D_{KL}(P||Q) := \int \frac{dP}{d(P+Q)} \ln \frac{\frac{dP}{d(P+Q)}}{\frac{dQ}{d(P+Q)}} \, d(P+Q)$ (possibly taking the value $\infty$).
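As an illustration of this definition, here is a minimal sketch for finitely supported measures only, assuming each measure is a Python dict mapping atoms to masses; the helper name `kl_wrt_sum` is hypothetical. For such measures $P+Q$ is a weighted counting measure, so the densities $\frac{dP}{d(P+Q)}$ and $\frac{dQ}{d(P+Q)}$ are just ratios of masses.

```python
import math


def kl_wrt_sum(p, q):
    """D_KL(P||Q) computed via densities with respect to P+Q.

    p, q: dicts mapping atoms to nonnegative masses (finitely supported
    measures). Returns math.inf when P charges an atom that Q does not.
    """
    total = 0.0
    for x in set(p) | set(q):
        mu = p.get(x, 0.0) + q.get(x, 0.0)  # mass of the base measure P+Q at x
        if mu == 0.0:
            continue  # P+Q-null atom: contributes nothing
        f = p.get(x, 0.0) / mu  # dP/d(P+Q) at x
        g = q.get(x, 0.0) / mu  # dQ/d(P+Q) at x
        if f == 0.0:
            continue  # convention 0 * log(0 / g) = 0
        if g == 0.0:
            return math.inf  # f > 0 where g = 0, on an atom of positive mass
        total += f * math.log(f / g) * mu  # integrand times d(P+Q)
    return total
```

For example, `kl_wrt_sum({0: 1.0}, {1: 1.0})` (two point masses at different locations) returns `inf`, while `kl_wrt_sum({0: 1.0}, {0: 0.5, 1: 0.5})` returns $\ln 2$ up to rounding, matching the usual KL.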

What I am interested in now: is this continuous in the measures in some sense?

As an example, take the specific situation where we have a fixed $Q$ and a sequence $P_n$ (all absolutely continuous w.r.t. Lebesgue measure) converging to some limit $P$ that is, e.g., a point mass. Do we get

$ \underset{n \rightarrow \infty} {\lim} D_{KL} (P_n||Q) = D_{KL} (P||Q) = \infty $

Take, for example, $P_n := \mathcal{N}(0,\frac{1}{n}) \overset{d}{\rightarrow} \delta_0$ and arbitrary $Q$.

a_student

1 Answer


First, note that as long as the base measure $\mu$ dominates both $P$ and $Q$, it doesn't matter which $\mu$ you choose (this holds for any $f$-divergence). So if both measures are absolutely continuous w.r.t. Lebesgue measure, this definition is exactly the standard KL divergence.
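The base-measure claim can be checked numerically in the finitely supported case. The sketch below (hypothetical helper `kl_wrt_base`; measures as dicts atom -> mass; the distributions and weights are arbitrary illustrative choices) evaluates $\int f \log(f/g)\,d\mu$ with $f = dP/d\mu$, $g = dQ/d\mu$ for three different dominating $\mu$. The factor $1/\mu(x)$ cancels inside the logarithm and against the outer $\mu(x)$, so all three should agree with $\sum_x P(x)\log\frac{P(x)}{Q(x)}$ up to rounding.

```python
import math


def kl_wrt_base(p, q, mu):
    """sum_x f(x) log(f(x)/g(x)) mu(x) with f = dP/dmu, g = dQ/dmu.

    p, q, mu: dicts atom -> mass; mu must dominate both P and Q.
    """
    support = {x for x in set(p) | set(q) if p.get(x, 0.0) + q.get(x, 0.0) > 0}
    if any(mu.get(x, 0.0) <= 0 for x in support):
        raise ValueError("mu must dominate both P and Q")
    total = 0.0
    for x, m in mu.items():
        if m == 0.0:
            continue
        f = p.get(x, 0.0) / m  # density of P w.r.t. mu
        g = q.get(x, 0.0) / m  # density of Q w.r.t. mu
        if f == 0.0:
            continue  # convention 0 * log(0 / g) = 0
        if g == 0.0:
            return math.inf
        total += f * math.log(f / g) * m
    return total


p = {"a": 0.2, "b": 0.5, "c": 0.3}
q = {"a": 0.4, "b": 0.4, "c": 0.2}

bases = {
    "P+Q": {x: p[x] + q[x] for x in p},
    "counting": {x: 1.0 for x in p},
    "arbitrary weights": {"a": 7.0, "b": 0.01, "c": 3.0},
}
values = {name: kl_wrt_base(p, q, mu) for name, mu in bases.items()}
for name, v in values.items():
    print(f"{name}: {v:.12f}")
```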

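A special case of your specific example is easy to check: taking $P_n = \mathcal N(0, 1/n)$ (with $1/n$ the variance) and $Q = \mathcal N(0, 1)$, plugging into the KL formula for normals gives $D_{KL}(P_n \| Q) = \frac12 \log n + \frac{1}{2n} - \frac12 \to \infty$. (If $1/n$ is read as the standard deviation instead, one gets $\log n + \frac{1}{2 n^2} - \frac12$, which also diverges.)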
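To double-check the closed form, here is a small stdlib-only sketch comparing the standard formula $D_{KL}(\mathcal N(m_1, v_1)\,\|\,\mathcal N(m_2, v_2)) = \frac12\log\frac{v_2}{v_1} + \frac{v_1 + (m_1-m_2)^2}{2v_2} - \frac12$ (with $v_i$ the variances) against a crude midpoint-rule integral of $p \log(p/q)$. The function names, integration window and step count are illustrative choices, not anything from the answer.

```python
import math


def kl_gauss(m1, v1, m2, v2):
    """Closed-form KL(N(m1, v1) || N(m2, v2)); v1, v2 are variances."""
    return 0.5 * math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / (2 * v2) - 0.5


def kl_numeric(v1, v2, steps=100_000):
    """Midpoint-rule approximation of int p log(p/q) dx for centered normals."""
    s = math.sqrt(max(v1, v2))
    a, b = -12 * s, 12 * s  # mass beyond 12 standard deviations is negligible
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        log_p = -0.5 * math.log(2 * math.pi * v1) - x * x / (2 * v1)
        log_q = -0.5 * math.log(2 * math.pi * v2) - x * x / (2 * v2)
        total += math.exp(log_p) * (log_p - log_q) * h
    return total


for n in [1, 10, 100, 1000]:
    closed = kl_gauss(0.0, 1.0 / n, 0.0, 1.0)  # P_n with variance 1/n
    formula = 0.5 * math.log(n) + 1 / (2 * n) - 0.5
    print(n, closed, formula, kl_numeric(1.0 / n, 1.0))
```

The three columns should agree and grow like $\frac12 \log n$; calling `kl_gauss(0.0, 1.0 / n**2, 0.0, 1.0)` instead gives the standard-deviation reading $\log n + \frac{1}{2n^2} - \frac12$.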

KL, however, is not continuous in the most common sense, that of the weak topology (convergence in distribution). For an example, take $P_n = \delta_{1/n} \xrightarrow{d} \delta_0 = Q$: the limit gives $D_{KL}(\delta_0 \| Q) = 0$, but $D_{KL}(\delta_{1/n} \| \delta_0) = \infty$ for every $n$. (For any two distributions with disjoint supports, $\frac{\mathrm d P}{\mathrm d (P+Q)}(x) = \begin{cases} 1 & x \in \operatorname{supp}(P) \\ 0 & \text{o.w.}\end{cases}$ and $\frac{\mathrm d Q}{\mathrm d (P+Q)} = 0$ on $\operatorname{supp}(P)$, so that $D_{KL}(P \| Q) = \int_{\operatorname{supp}(P)} 1 \cdot \log\frac{1}{0} \, \mathrm d (P+Q) = \infty$.)

The topology induced by KL is stronger than the total variation topology (by Pinsker's inequality), which in turn is stronger than the weak topology, but I don't know much more about it; there are a handful of notes here.
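Both the counterexample and the Pinsker comparison can be checked on finitely supported measures with a short sketch. Assumptions: measures are Python dicts atom -> mass, `kl` and `tv` are hypothetical helper names, and the random test distributions are arbitrary. It shows $D_{KL}(\delta_{1/n}\|\delta_0) = \infty$ for each $n$ while the weak limit gives $0$, and checks $\operatorname{TV}(P,Q) \le \sqrt{D_{KL}(P\|Q)/2}$ (natural log, $\operatorname{TV}(P,Q) = \sup_A |P(A)-Q(A)|$) on random pairs.

```python
import math
import random


def kl(p, q):
    """D_KL(P||Q) for finitely supported measures (dicts atom -> mass),
    computed via densities w.r.t. P+Q; inf if P charges a Q-null atom."""
    total = 0.0
    for x in set(p) | set(q):
        mu = p.get(x, 0.0) + q.get(x, 0.0)
        if mu == 0.0:
            continue
        f, g = p.get(x, 0.0) / mu, q.get(x, 0.0) / mu
        if f == 0.0:
            continue
        if g == 0.0:
            return math.inf
        total += f * math.log(f / g) * mu
    return total


def tv(p, q):
    """Total variation sup_A |P(A) - Q(A)|, i.e. half the L1 distance."""
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))


delta0 = {0.0: 1.0}
# delta_{1/n} converges weakly to delta_0, yet the KL stays infinite for every n
for n in [1, 10, 1000]:
    print(n, kl({1.0 / n: 1.0}, delta0))
print("limit:", kl(delta0, delta0))

# Pinsker: TV(P, Q) <= sqrt(KL(P||Q) / 2); track the largest observed gap
rng = random.Random(0)
worst_gap = -math.inf  # max of TV - sqrt(KL/2); should stay <= 0
for _ in range(1000):
    w1 = [rng.random() for _ in range(5)]
    w2 = [rng.random() for _ in range(5)]
    p = {i: w / sum(w1) for i, w in enumerate(w1)}
    q = {i: w / sum(w2) for i, w in enumerate(w2)}
    worst_gap = max(worst_gap, tv(p, q) - math.sqrt(kl(p, q) / 2))
print("worst TV - sqrt(KL/2):", worst_gap)
```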

Danica