
I have been trying to understand and implement KL-divergence for two normal distributions. However, one thing that I seem to be missing is how can KL-divergence always be a non-negative value, if the log likelihood ratio can be negative?

Let's say we have two pdf's p(x) and q(x) similarly as in https://medium.com/@cotra.marko/making-sense-of-the-kullback-leibler-kl-divergence-b0d57ee10e0a

The formula for KLD is: $D_{KL}(p(x)\,||\,q(x)) = \int_x p(x) \log\left(\frac{p(x)}{q(x)}\right)dx$

So if $p(x) < q(x)$, the ratio is $\frac{p(x)}{q(x)} < 1$, and thus $\log\left(\frac{p(x)}{q(x)}\right) < 0$. If this holds for most of the support, then it seems $D_{KL}$ should also end up being $< 0$.
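For example (a minimal numerical sketch; the densities $p = \mathcal{N}(0,1)$, $q = \mathcal{N}(1,4)$ and the use of SciPy are just illustrative assumptions, not part of the question), the integrand $p(x)\log\frac{p(x)}{q(x)}$ really is negative wherever $p(x) < q(x)$, yet the full integral still comes out non-negative:

```python
# Sketch: the integrand p(x) * log(p(x)/q(x)) is negative where p(x) < q(x),
# but integrating over the whole support still gives a non-negative value.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p = norm(loc=0.0, scale=1.0)  # p(x) = N(0, 1)  (illustrative choice)
q = norm(loc=1.0, scale=2.0)  # q(x) = N(1, 4)  (illustrative choice)

def integrand(x):
    return p.pdf(x) * np.log(p.pdf(x) / q.pdf(x))

kl_numeric, _ = quad(integrand, -10, 10)                       # D_KL(p || q) by quadrature
neg_part, _ = quad(lambda x: min(integrand(x), 0.0), -10, 10)  # the negative region alone

# Closed form for two univariate normals:
# log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 1/2
kl_closed = np.log(2.0 / 1.0) + (1.0**2 + (0.0 - 1.0) ** 2) / (2 * 2.0**2) - 0.5

print(f"numerical D_KL(p||q):  {kl_numeric:.4f}")  # ~0.4431, non-negative
print(f"closed-form D_KL:      {kl_closed:.4f}")   # matches the quadrature result
print(f"negative contribution: {neg_part:.4f}")    # < 0, but outweighed by the positive part
```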

Is there some fundamental part I am missing?

El Rakone

1 Answer


I think the missing part is:

"It’s a measure of how much “predictive power” or “evidence” each sample will on average bring when you’re trying to distinguish p(x) from q(x), if you’re sampling from p(x). "

Since $x$ is sampled from the distribution $p(x)$, the following holds on average/in expectation:

$p(x) \geq q(x)$.
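
A short Monte Carlo sketch of that "on average" point (the choice $p = \mathcal{N}(0,1)$, $q = \mathcal{N}(1,4)$ and the use of SciPy are assumptions for illustration): many individual samples $x \sim p(x)$ give a negative log-ratio $\log\frac{p(x)}{q(x)}$, but the sample average converges to $D_{KL}(p||q) \geq 0$.

```python
# Sketch: sample from p, compute log(p(x)/q(x)) per sample, and compare the
# share of negative log-ratios with the (non-negative) sample mean.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = norm(loc=0.0, scale=1.0)  # illustrative choice
q = norm(loc=1.0, scale=2.0)  # illustrative choice

x = p.rvs(size=200_000, random_state=rng)  # samples drawn from p
log_ratio = p.logpdf(x) - q.logpdf(x)      # log(p(x)/q(x)) for each sample

print(f"share of samples with a negative log-ratio: {np.mean(log_ratio < 0):.3f}")
print(f"sample mean of the log-ratio (approx. D_KL): {np.mean(log_ratio):.4f}")
```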

Also, the resource at https://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture7.pdf explains how to derive the non-negativity of $D_{KL}(p(x)||q(x))$ by using Jensen's Inequality and the identity

$D_{KL}(p(x)||q(x)) = -E_{p}\left[\log\left(\frac{q(x)}{p(x)}\right)\right]$
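
Spelling out that Jensen's inequality step (the only ingredient is that $-\log$ is convex, so $E_{p}[-\log Y] \geq -\log E_{p}[Y]$):

$$D_{KL}(p(x)||q(x)) = -E_{p}\left[\log\frac{q(x)}{p(x)}\right] \;\geq\; -\log E_{p}\left[\frac{q(x)}{p(x)}\right] = -\log \int_x p(x)\,\frac{q(x)}{p(x)}\,dx = -\log \int_x q(x)\,dx = -\log 1 = 0.$$

So even though the integrand can be negative on part of the support, the expectation as a whole can never drop below zero.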

Deno