Questions tagged [kullback-leibler]

An asymmetric measure of distance (or dissimilarity) between probability distributions. It might be interpreted as the expected value of the log likelihood ratio under the alternative hypothesis.

Kullback–Leibler divergence is an asymmetric measure of distance (or dissimilarity) between probability distributions. If $F(\cdot)$ and $G(\cdot)$ are the two distribution functions, with $F(\cdot)$ absolutely continuous with respect to $G(\cdot)$ (i.e., the support of $F(\cdot)$ is a subset of the support of $G(\cdot)$, so that the Radon–Nikodym derivative ${\rm d}F/{\rm d}G$ exists), then the KL divergence is

$$ D(F,G) = \int \ln\left( \frac{ {\rm d} F}{{\rm d}G}\right) {\rm d} F $$

For continuous distributions, interpret ${\rm d} F$ as the density, $f(x)\,{\rm d}x$, so the integral reads $\int f \ln(f/g)\,{\rm d}x$; for discrete distributions, interpret it as the point mass, giving $\sum_i p_i \ln(p_i/q_i)$.

It is not a metric: in general $D(F,G) \neq D(G,F)$, and the triangle inequality fails. Nevertheless, it is an important measure of how dissimilar two distributions are.
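As a concrete reading of the discrete form, here is a minimal sketch in Python (the helper name kl_divergence and the example vectors are illustrative assumptions, not part of the tag wiki); scipy.stats.entropy(p, q) returns the same quantity when a second argument is supplied.

    import numpy as np
    from scipy.stats import entropy

    def kl_divergence(p, q):
        # D(P || Q) = sum_i p_i * ln(p_i / q_i) for discrete distributions.
        # Assumes p and q are normalized and q_i > 0 wherever p_i > 0;
        # terms with p_i = 0 contribute 0 by convention.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    p = [0.4, 0.4, 0.2]
    q = [0.3, 0.3, 0.4]
    print(kl_divergence(p, q), entropy(p, q))  # identical values, about 0.091
    print(kl_divergence(q, p))                 # different value: the measure is asymmetric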

References

https://en.wikipedia.org/wiki/Kullback–Leibler_divergence

533 questions
31
votes
4 answers

An adaptation of the Kullback-Leibler distance?

Look at this picture: If we draw a sample from the red density then some values are expected to be less than 0.25 whereas it is impossible to generate such a sample from the blue distribution. As a consequence, the Kullback-Leibler distance from…
ocram
  • 21,851
19
votes
2 answers

Kullback-Leibler divergence - interpretation

I have a question about the Kullback-Leibler divergence. Can someone explain why the "distance" between the blue density and the "red" density is smaller than the distance between the "green" curve and the "red" one?
user3016
15
votes
1 answer

Kullback-Leibler divergence: negative values?

Wikipedia - KL properties says that KL can never be negative. But for texts where the probabilities are very small, I somehow get negative values. E.g. Collection A: - word count: 321 doc count: 65888 probA: 0,004871904 Collection B: - word…
Andreas
  • 629
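A frequent cause of apparently negative values, as in the question above, is plugging unnormalized weights into the discrete formula. The sketch below uses made-up numbers (not the poster's word counts) to show that $\sum_i p_i \ln(p_i/q_i)$ can be negative when the vectors do not sum to one, and is non-negative once both are normalized, as Gibbs' inequality requires.

    import numpy as np

    def kl_sum(p, q):
        # naive sum of p_i * ln(p_i / q_i), with no normalization check
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sum(p * np.log(p / q))

    p_raw = np.array([0.10, 0.20])   # "probabilities" summing to 0.3
    q_raw = np.array([0.20, 0.40])   # "probabilities" summing to 0.6
    print(kl_sum(p_raw, q_raw))                              # negative, about -0.21
    print(kl_sum(p_raw / p_raw.sum(), q_raw / q_raw.sum()))  # 0.0 after normalizing

Gibbs' inequality, which guarantees $D \ge 0$, only applies when both arguments are proper probability distributions over the same event space.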
15
votes
4 answers

Estimate the Kullback–Leibler (KL) divergence with Monte Carlo

I want to estimate the KL divergence between two continuous distributions $f$ and $g$. However, I can't write down the density for either $f$ or $g$. I can sample from both $f$ and $g$ via some method (for example, Markov chain Monte Carlo). The KL…
frelk
  • 1,337
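For the sample-only setting in the question above, one common route is the nearest-neighbour estimator of Wang, Kulkarni and Verdú (2009), which replaces the unavailable densities with 1-NN distance estimates. The sketch below is a rough, illustrative implementation of that idea (the function name and the Gaussian test case are my own), not a polished answer:

    import numpy as np
    from scipy.spatial import cKDTree

    def knn_kl_estimate(x, y):
        # 1-nearest-neighbour estimate of D(f || g) from x ~ f and y ~ g,
        # following Wang, Kulkarni & Verdu (2009); x is (n, d), y is (m, d).
        x, y = np.atleast_2d(x), np.atleast_2d(y)
        n, d = x.shape
        m = y.shape[0]
        # rho_i: distance from x_i to its nearest neighbour among the other x's
        rho = cKDTree(x).query(x, k=2)[0][:, 1]
        # nu_i: distance from x_i to its nearest neighbour among the y's
        nu = cKDTree(y).query(x, k=1)[0]
        return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(5000, 1))   # draws from f = N(0, 1)
    y = rng.normal(1.0, 1.0, size=(5000, 1))   # draws from g = N(1, 1)
    print(knn_kl_estimate(x, y))               # true D(f || g) is 0.5 here

Note that MCMC draws are correlated, which adds bias and variance beyond what this i.i.d. sketch accounts for.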
11
votes
2 answers

How to calculate Kullback-Leibler divergence/distance?

I have three data sets X, Y and Z. Each data set defines the frequency of an event occurring. For example: Data Set X: E1:4, E2:0, E3:10, E4:5, E5:0, E6:0 and so on.. Data Set Y: E1:2, E2:3, E3:7, E4:6, E5:0, E6:0 and so on.. Data Set Z: E1:0,…
PS1
  • 215
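For count data like the three sets in the question above, the usual recipe is to turn each set of event counts into a probability vector and apply the discrete formula; zero counts (such as E2:0 in set X) have to be smoothed, otherwise the ratio is undefined. A minimal sketch, with an additive smoothing constant that is my own arbitrary choice:

    import numpy as np

    def counts_to_probs(counts, alpha=0.5):
        # additive (Laplace-style) smoothing so zero counts do not produce
        # zero probabilities, then normalize to sum to 1
        c = np.asarray(counts, dtype=float) + alpha
        return c / c.sum()

    def kl(p, q):
        return np.sum(p * np.log(p / q))

    x = counts_to_probs([4, 0, 10, 5, 0, 0])  # data set X
    y = counts_to_probs([2, 3, 7, 6, 0, 0])   # data set Y
    print(kl(x, y), kl(y, x))                 # the two directions differ

Which direction to report, and whether smoothing is appropriate at all, depends on what the data sets are meant to represent.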
7
votes
2 answers

textbook example of KL Divergence

I have read what KL divergence is about: assessing the difference between two probability distributions. I have also read, and digested, that it is emphatically not a true metric because of asymmetry. Now I have been wanting to ask: I cannot…
cgo
  • 9,107
6
votes
1 answer

Estimate the Kullback-Leibler divergence

I would like to be sure I am able to compute the KL divergence based on a sample. Assume the data come from a Gamma distribution with shape=1/.85 and scale=.85. set.seed(937) theta <- .85 x <- rgamma(1000, shape=1/theta, scale=theta) Based on that…
ocram
  • 21,851
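When both densities can be evaluated, as in the Gamma example above, the Monte Carlo estimate is simply the average log-density ratio over draws from the first distribution. Here is the same setup sketched in Python; the excerpt is truncated, so the choice of an Exponential(1) as the second distribution is purely my assumption:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(937)
    theta = 0.85
    f = stats.gamma(a=1 / theta, scale=theta)   # shape = 1/theta, scale = theta
    g = stats.expon(scale=1.0)                  # assumed reference distribution

    x = f.rvs(size=100000, random_state=rng)
    # D(f || g) ~ (1/N) * sum_i [ log f(x_i) - log g(x_i) ],  with x_i ~ f
    kl_hat = np.mean(f.logpdf(x) - g.logpdf(x))
    print(kl_hat)

The estimate can be checked against the closed-form divergence between Gamma distributions, which exists because the exponential is itself a Gamma with shape 1.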
6
votes
1 answer

Kullback–Leibler divergence between two Wishart distributions

The result is shown in: [1] W.D. Penny, KL-Divergences of Normal, Gamma, Dirichlet, and Wishart densities, Available at: www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps But could anyone help me out to understand the lines on top of page…
4
votes
2 answers

KL-Divergence and the chain rule

I was trying to understand the mathematical proof of KL-Divergence when using the chain rule: $D(p(x,y)||q(x,y)) = D(p(x)||q(x)) + D(p(y|x)||q(y|x))$ And I'm a bit lost in the last step…
kuonb
  • 143
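For the question above, the missing step is just splitting the logarithm of the factorized densities; a compact version of the derivation in the discrete case, with the conditional divergence defined as an expectation over $p(x)$, is

$$
\begin{aligned}
D\big(p(x,y)\,\|\,q(x,y)\big)
&= \sum_{x,y} p(x,y)\ln\frac{p(x)\,p(y\mid x)}{q(x)\,q(y\mid x)} \\
&= \sum_{x,y} p(x,y)\ln\frac{p(x)}{q(x)} + \sum_{x,y} p(x,y)\ln\frac{p(y\mid x)}{q(y\mid x)} \\
&= \sum_{x} p(x)\ln\frac{p(x)}{q(x)} + \sum_{x} p(x)\sum_{y} p(y\mid x)\ln\frac{p(y\mid x)}{q(y\mid x)} \\
&= D\big(p(x)\,\|\,q(x)\big) + D\big(p(y\mid x)\,\|\,q(y\mid x)\big),
\end{aligned}
$$

where $D\big(p(y\mid x)\,\|\,q(y\mid x)\big) := \mathbb{E}_{p(x)}\!\left[\,\sum_{y} p(y\mid x)\ln\frac{p(y\mid x)}{q(y\mid x)}\right]$ is the conditional divergence appearing in the chain rule.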
4
votes
1 answer

KL Divergence between parallel lines

I am trying to understand the example described in the WGAN paper about learning parallel lines with various divergences. More specifically the setup is as follows: Let $Z \sim U[0, 1]$ be the uniform distribution on the unit interval. Let…
3
votes
1 answer

KL Divergence with different domains

I want to calculate the KL divergence between a normal and an exponential r.v., i.e. $$D(P||Q) = ?\\ \;\; P=N(\mu,\sigma), \;\; Q=\mathrm{Exp}(\lambda)$$ My problem is that in this case the domains of the distributions are different - the domain of $P$ is $x\in…
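For the question above, the support mismatch settles one direction immediately: the normal density $p$ is positive on $x<0$ while the exponential density $q$ is zero there, so the integrand $p\ln(p/q)$ is infinite on a set of positive $P$-probability and

$$ D(P\,\|\,Q) = \int_{-\infty}^{\infty} p(x)\ln\frac{p(x)}{q(x)}\,{\rm d}x = +\infty. $$

The reverse direction $D(Q\,\|\,P)$ is finite, since the support of the exponential is contained in that of the normal, and it can be written in closed form using $\mathbb{E}_Q[X]=1/\lambda$ and $\mathbb{E}_Q[X^2]=2/\lambda^2$.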
3
votes
1 answer

Kullback–Leibler divergence when one measure is a sum of diracs

In the book "Deep Learning" of Goodfellow, Bengio and Courville, section 5.5 of maximum likelihood estimation they explain a relation between the maximization of likelihood and minimization of the K-L divergence. My question is on the formal…
2
votes
1 answer

Lower bound KL distance of non linear transform of a Gaussian with a family of mean zero Gaussian

Let $X \sim \mathcal{N}(0, \sigma_x^2)$ and let $f :\mathbb{R} \to \mathbb{R}$ be a smooth nonlinear transformation such that $\mathbb{E}[f(X)]=0$. I am wondering what kind of restrictions one can put on the function $f$ such that I can find a…
Abm
  • 362
2
votes
2 answers

KL-divergence between two products

Given factorizations of two joint densities $p(x_1,...,x_n)=\prod_{i=1}^n p(x_i\mid \textrm{cond}(x_i))$ and $q(x_1,...,x_n)=\prod_{i=1}^n q(x_i\mid \textrm{cond}(x_i))$, where $\textrm{cond}(\bullet)$ denotes the set of conditioning variables, does…
ASML
  • 148
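If both joint densities factorize in the same topological order, so that $\textrm{cond}(x_i)$ plays the role of $x_{<i}$ (or of a common parent set) in both $p$ and $q$, then iterating the chain rule above gives a decomposition into expected conditional divergences. This is a hedged sketch of the standard result rather than a complete answer to the question:

$$
D\big(p(x_1,\dots,x_n)\,\|\,q(x_1,\dots,x_n)\big)
= \sum_{i=1}^{n} \mathbb{E}_{p(\textrm{cond}(x_i))}\Big[\, D\big(p(x_i\mid \textrm{cond}(x_i))\,\|\,q(x_i\mid \textrm{cond}(x_i))\big)\Big],
$$

so the joint divergence is a sum of conditional divergences, each averaged under $p$; without a common factorization structure, no such clean decomposition holds in general.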
1
vote
0 answers

Expectation of conditional KLs

I was reading the "how not to train your generative model" paper. I don't quite understand how the simplification from equation 4 to equation 5 can be right, and based on my calculations it should be wrong. Would anyone mind giving me a hint? Specifically,…