19

I am going to use the KL divergence in my Python code, and I found this tutorial.

In that tutorial, implementing the KL divergence is quite simple:

import numpy as np
kl = (model * np.log(model/actual)).sum()

As I understand it, model and actual are probability distributions, so all of their values should be <= 1.

My question is: what is the maximum bound/maximum possible value of kl? I need to know the maximum possible value of the KL distance so that I can use it as the maximum bound in my code.
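For concreteness, here is a self-contained version of that snippet (the bin values below are made up purely for illustration):

import numpy as np

# Two example binned distributions; each sums to 1 (values are illustrative only).
model = np.array([0.36, 0.48, 0.16])
actual = np.array([0.30, 0.50, 0.20])

# KL divergence of model from actual, in nats (np.log is the natural log).
kl = (model * np.log(model / actual)).sum()
print(kl)  # ≈ 0.01 for these made-up values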

Mateen Ulhaq
user46543

4 Answers

25

Or, even with the same support, the KL divergence can be infinite when one distribution has a much fatter tail than the other. Take
$$KL(P\vert\vert Q) = \int p(x)\log\left(\frac{p(x)}{q(x)}\right) \,\text{d}x$$
when
$$p(x)=\overbrace{\frac{1}{\pi}\,\frac{1}{1+x^2}}^\text{Cauchy density}\qquad q(x)=\overbrace{\frac{1}{\sqrt{2\pi}}\,\exp\{-x^2/2\}}^\text{Normal density}$$
then
$$KL(P\vert\vert Q) = \int \frac{1}{\pi}\,\frac{1}{1+x^2} \log p(x) \,\text{d}x + \int \frac{1}{\pi}\,\frac{1}{1+x^2} \left[\log(2\pi)/2+x^2/2\right]\,\text{d}x$$
and
$$\int \frac{1}{\pi}\,\frac{1}{1+x^2}\, \frac{x^2}{2}\,\text{d}x=+\infty.$$
There exist other distances that remain bounded, such as

  • the $L^1$ distance, equivalent to the total variation distance,
  • the Wasserstein distances,
  • the Hellinger distance.
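A rough numerical sketch of the point above (my own illustration, not part of the answer): discretising the Cauchy and Normal densities into bins on $[-W, W]$ shows the KL sum growing without bound as $W$ increases, while the Hellinger distance stays below 1.

import numpy as np

# Compare a discretised KL divergence with a bounded alternative (Hellinger)
# for the Cauchy-vs-Normal example, over wider and wider truncation ranges.
for W in (5, 10, 20, 35):
    x = np.linspace(-W, W, 400001)
    dx = x[1] - x[0]
    p = dx / (np.pi * (1.0 + x**2))                    # Cauchy bins
    q = dx * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)    # Normal bins
    p, q = p / p.sum(), q / q.sum()                    # renormalise after truncation
    kl = np.sum(p * np.log(p / q))                     # keeps growing with W
    hel = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))  # always <= 1
    print(f"W={W:>3}  KL={kl:8.3f}  Hellinger={hel:.3f}")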
Xi'an
  • Very good remark @Xi'an – Carlos Campos Jun 18 '18 at 21:07
  • Thanks @Xi'an. Does that mean that, even when the sum of all bins equals 1 for both distributions, the KL divergence has no maximum bound?

    Do you have any other options for a distance function between two probability distributions that has a defined maximum bound / static bound?

    – user46543 Jun 19 '18 at 01:10
  • Is P absolutely continuous with respect to Q in this case? – Sangwoong Yoon Apr 05 '19 at 12:58
  • As the question is about bins, the questioner may be interested in the discrete case. In that case the maximum is attained when p puts all of its probability mass into the bin where q is smallest, so the maximum KL is $-\log q_m$, where $q_m$ is the smallest bin probability of the distribution q (see the sketch just below). This also goes to infinity as $q_m$ goes to 0, but in concrete situations you may have a value for $q_m$ that allows calculation of a bound. – shaunc Apr 13 '21 at 13:59
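A quick numerical check of the last comment (a hedged sketch; the distribution q below is made up): putting all of p's mass into the bin where q is smallest attains the value $-\log q_m$.

import numpy as np

q = np.array([0.5, 0.3, 0.15, 0.05])   # made-up distribution; smallest bin q_m = 0.05
bound = -np.log(q.min())               # -log(0.05) ≈ 3.0

p = np.zeros_like(q)
p[np.argmin(q)] = 1.0                  # point mass on the bin where q is smallest

mask = p > 0                           # treat 0 * log(0/q) as 0
kl = np.sum(p[mask] * np.log(p[mask] / q[mask]))
print(kl, bound)                       # both ≈ 2.996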
17

For distributions that do not have the same support, the KL divergence is not bounded. Look at the definition:

$$KL(P\vert\vert Q) = \int_{-\infty}^{\infty} p(x)\ln\left(\frac{p(x)}{q(x)}\right) dx$$

If P and Q do not have the same support, there exists some point $x'$ where $p(x') \neq 0$ and $q(x') = 0$, making the KL divergence go to infinity. This also applies to discrete distributions, which is your case.
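As a hedged illustration of the discrete case (the bin values are made up): if actual has an empty bin where model still has mass, the corresponding summand is model * log(model/0) = +inf, and so is the whole sum.

import numpy as np

model = np.array([0.4, 0.4, 0.2])
actual = np.array([0.5, 0.5, 0.0])   # third bin is empty where model has mass

with np.errstate(divide="ignore"):   # silence the divide-by-zero warning from model/actual
    kl = (model * np.log(model / actual)).sum()
print(kl)  # inf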

Edit: Maybe a better choice for measuring the divergence between probability distributions would be the so-called Wasserstein distance, which is a metric and has better properties than the KL divergence. It has become quite popular due to its applications in deep learning (see WGAN networks).
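A hedged usage sketch (assuming SciPy is available; the bin positions and weights below are made up): SciPy's wasserstein_distance accepts bin positions together with the two weight vectors. Note that for distributions on a bounded range the 1-D Wasserstein distance can never exceed the width of that range, although on an unbounded support it can be infinite, as noted in the comments below.

import numpy as np
from scipy.stats import wasserstein_distance

bins = np.array([0.0, 1.0, 2.0])      # bin positions (illustrative)
model = np.array([0.4, 0.4, 0.2])
actual = np.array([0.5, 0.5, 0.0])    # an empty bin is no problem here, unlike for KL

d = wasserstein_distance(bins, bins, u_weights=model, v_weights=actual)
print(d)  # 0.3, necessarily <= 2.0 (the width of the bin range) in this example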

  • Thanks @carlos-campos. Both of my distributions, the actual and the model, satisfy the same condition: the sum of all bins equals 1. Does that mean my KL divergence still does not have a maximum bound?

    I will look at the Wasserstein distance.

    – user46543 Jun 19 '18 at 00:58
  • Does the Wasserstein or earth mover's distance have an explicit maximum bound? Because I need one. – user46543 Jun 19 '18 at 01:47
  • @user46543 Wasserstein distance can be as high as $\infty$ – Mark L. Stone Jun 19 '18 at 11:31
  • Hi @MarkL.Stone, so there is no distance function between two probability distributions that has a static maximum bound? E.g., when two probability distributions each sum to 1, the maximum bound of the distance would be 1. Am I correct? – user46543 Jun 19 '18 at 11:54
7

To add to the excellent answers by Carlos and Xi'an, it is also interesting to note that a sufficient condition for the KL divergence to be finite is for both random variables to have the same compact support, with the density $p$ bounded above and the reference density $q$ bounded away from zero on that support. This result also establishes an implicit bound for the maximum of the KL divergence (see theorem and proof below).


Theorem: If the densities $p$ and $q$ have the same compact support $\mathscr{X}$, the density $p$ is bounded on that support (i.e., it has a finite upper bound), and $q$ is bounded away from zero there, then $KL(P||Q) < \infty$.

Proof: Since $q$ is bounded away from zero on the compact support $\mathscr{X}$, it has some positive infimum value:

$$\underline{q} \equiv \inf_{x \in \mathscr{X}} q(x) > 0.$$

Similarly, since $p$ is a density that is bounded on $\mathscr{X}$, it has some finite, positive supremum value:

$$\bar{p} \equiv \sup_{x \in \mathscr{X}} p(x) > 0.$$

Moreover, since these are both densities on the same support, and the latter is bounded, we have $0 < \underline{q} \leqslant \bar{p} < \infty$. This means that:

$$\sup_{x \in \mathscr{X}} \ln \Bigg( \frac{p(x)}{q(x)} \Bigg) \leqslant \ln ( \bar{p}) - \ln(\underline{q}).$$

Now, letting $\underline{L} \equiv \ln ( \bar{p}) - \ln(\underline{q})$ be the latter upper bound, we clearly have $0 \leqslant \underline{L} < \infty$ so that:

$$\begin{equation} \begin{aligned} KL(P||Q) &= \int \limits_{\mathscr{X}} \ln \Bigg( \frac{p(x)}{q(x)} \Bigg) p(x) dx \\[6pt] &\leqslant \sup_{x \in \mathscr{X}} \ln \Bigg( \frac{p(x)}{q(x)} \Bigg) \int \limits_{\mathscr{X}} p(x) dx \\[6pt] &\leqslant (\ln ( \bar{p}) - \ln(\underline{q})) \int \limits_{\mathscr{X}} p(x) dx \\[6pt] &= \underline{L} < \infty. \\[6pt] \end{aligned} \end{equation}$$

This establishes the required upper bound, which proves the theorem. $\blacksquare$
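As a numerical check of this bound (my own example, not part of the answer): take $\mathscr{X}=[0,1]$, let $p$ be the Beta(2,2) density (bounded above by 1.5) and $q$ the Uniform(0,1) density (bounded below by 1), so the bound is $\ln(1.5) - \ln(1) \approx 0.405$.

import numpy as np

n = 200001
x = np.linspace(0.0, 1.0, n)[1:-1]         # drop the endpoints, where p = 0
dx = 1.0 / (n - 1)
p = 6.0 * x * (1.0 - x)                    # Beta(2, 2) density, sup = 1.5
q = np.ones_like(x)                        # Uniform(0, 1) density, inf = 1

kl = np.sum(p * np.log(p / q)) * dx        # ≈ 0.125
bound = np.log(p.max()) - np.log(q.min())  # ≈ 0.405
print(kl, bound)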

Ben
  • The result is correct but the constraint is heavy: a Beta ${\cal B}(\alpha,\beta)$ density does not enjoy a compact support when $\max(\alpha,\beta)>1$. – Xi'an Jun 19 '18 at 06:28
  • That's true: it is only a sufficient condition after all. Weaker sufficient conditions are welcome! – Ben Jun 19 '18 at 06:32
0

An answer is given here: https://arxiv.org/abs/2008.05932

You must define an L-shaped distribution by transforming the probability distribution into a multiplicity distribution, calculating the quantum of the distribution, and going back to a probability distribution. The maximum of the KL for your distribution P is KL(P||L).

vbonnici