
What's the entropy of the following generalized probability distributions?

$P_1(x) = \delta(x)$

$P_2(x,y) = \delta(x+y)$, for $0\le x\le 1$, and $P_2(x,y)=0$ otherwise.

Integrals of the type $-\int \delta(x) \ln\delta(x) \mathrm{d}x$ seem to diverge to $-\infty$ (see here). But entropy is supposed to be positive. What's going on here? How can I compute the entropy of these distributions? Is there a way to define entropy for these distributions?

a06e
  • What is your basis for assuming entropy is positive? For instance, the entropy of a Gaussian (Normal) distribution with standard deviation $\sigma$ is $\frac{1}{2}\log(2\pi e) + \log(\sigma)$, which will be extremely negative for sufficiently small $\sigma$. Are you perhaps basing your question on a different definition of entropy? – whuber Mar 27 '14 at 15:29
  • Strictly speaking, you're not actually calculating the entropy here - you're calculating the differential entropy, which is importantly different (for example, it can be negative). And no, I know of no useful definition of differential entropy which can deal with delta functions without everything going infinite. – Pat Mar 27 '14 at 16:11
  • @whuber For discrete distributions Gibbs' inequality implies that entropy is non-negative. I thought the same applied for continuous distributions? – a06e Mar 27 '14 at 18:13
  • Entropy of continuous distributions behaves quite differently than that of discrete distributions, because it is defined in terms of probability densities rather than probabilities themselves. @Pat We can still make sense of (differential) entropy of delta functions; as intimated in the link in the question, it can be understood as the limiting entropy of a sequence of functions whose (compact) supports shrink to a point. Regardless of what functions are used, the entropy indeed drops to $-\infty$. This actually makes sense as "a value smaller than all real numbers." – whuber Mar 27 '14 at 18:38
  • @whuber $-\infty$ entropy can then be interpreted as infinite certainty. – a06e Mar 27 '14 at 19:10
  • That would be a good way of interpreting an atom at a point, wouldn't it? There's no uncertainty about what value it will have. – whuber Mar 28 '14 at 15:34
  • The "differential entropy" lacks some good properties including positivity when compared to its discrete cousin. Limiting density of discrete points could be a better way to describe uncertainty for continuous r.v. – Ziyuan Jun 29 '18 at 09:16

1 Answer


The usual Shannon entropy, defined on a discrete set of probabilities, is indeed non-negative, as it is an average of non-negative numbers, i.e.

$$\sum_i p_i \log\left(\tfrac{1}{p_i}\right).$$

Differential entropy, by contrast, need not be positive. It is

$$\int p(x) \log\left(\tfrac{1}{p(x)}\right) dx,$$

Here $p(x)$ is a probability density, so it can exceed $1$, making $\log\left(\tfrac{1}{p(x)}\right)$ negative. In fact, differential entropy can be viewed as the limit of Shannon entropy computed on boxes of width $\epsilon$, with $\log(1/\epsilon)$ subtracted ($\epsilon$ being the box size); otherwise the limit diverges as $\epsilon \to 0$:

$$ \lim_{\epsilon\to 0} \sum_i p_{[i\epsilon, (i+1)\epsilon]} \log\left(\tfrac{1}{p_{[i\epsilon, (i+1)\epsilon]}}\right) $$
$$ \approx \lim_{\epsilon\to 0} \sum_{i} p(i \epsilon)\,\epsilon \log\left(\tfrac{1}{p(i \epsilon)\,\epsilon}\right) $$
$$ = \lim_{\epsilon\to 0} \left(\sum_{i} p(i \epsilon)\,\epsilon \log\left(\tfrac{1}{p(i \epsilon)}\right) + \log(1/\epsilon) \right) $$
$$ = \int p(x) \log\left(\tfrac{1}{p(x)}\right) dx + \lim_{\epsilon\to 0}\log(1/\epsilon). $$
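As a rough numerical illustration of this limiting argument (a sketch of my own, using a standard normal as the example; not part of the original answer), the discrete Shannon entropy over boxes of width $\epsilon$ tracks the differential entropy plus $\log(1/\epsilon)$:

```python
# Discretize a standard normal into boxes of width eps and compare the
# discrete Shannon entropy with (differential entropy + log(1/eps)).
import numpy as np
from scipy.stats import norm

diff_entropy = float(norm().entropy())  # 0.5 * log(2*pi*e) ~ 1.4189 nats

for eps in [0.5, 0.1, 0.01]:
    edges = np.arange(-10.0, 10.0 + eps, eps)   # grid covering essentially all the mass
    box_probs = np.diff(norm.cdf(edges))        # p_[i*eps, (i+1)*eps]
    box_probs = box_probs[box_probs > 0]        # drop empty boxes to avoid log(0)
    discrete_H = -np.sum(box_probs * np.log(box_probs))
    print(f"eps={eps:5.2f}  discrete H={discrete_H:7.4f}  "
          f"h + log(1/eps)={diff_entropy + np.log(1 / eps):7.4f}")
```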

For the Dirac delta the differential entropy is $-\infty$, so you are right.
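To see this divergence numerically, one can approximate $\delta(x)$ by an ever narrower Gaussian (one of many possible approximating sequences; the sketch below is mine, not the answerer's) and watch the differential entropy drop without bound:

```python
# Approximate the Dirac delta by N(0, sigma^2); the differential entropy
# diverges to -infinity as sigma -> 0.
from scipy.stats import norm

for sigma in [1.0, 1e-2, 1e-4, 1e-8]:
    h = float(norm(scale=sigma).entropy())  # equals 0.5*log(2*pi*e) + log(sigma), in nats
    print(f"sigma={sigma:.0e}  differential entropy = {h:9.4f} nats")
```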

Piotr Migdal
  • (+1) Doesn't $\log(\epsilon)$ need to be subtracted, as you said, rather than added? Also, because this limiting argument lacks rigor, it does not apply directly to the case of delta functions--but it still gives a nice intuition concerning why entropy and differential entropy might have different properties. – whuber May 07 '14 at 18:07
  • I meant $\log(1/\epsilon)$ (rather than $\log(\epsilon)$) needs to be subtracted (thanks for spotting that, fixed). The limiting argument is hand-waved, but can be made rigorous (for continuous functions) with the intermediate value theorem. If you know how to make it simple yet more rigorous, I would appreciate an edit. – Piotr Migdal May 07 '14 at 18:20
  • Assuming Riemann integration, yes the integral can be obtained in the limit, but the separation of one limit into two also requires some justification. (In fact, it is the very failure of this operation that explains some of the problems with differential entropy.) I have no complaint about the lack of rigor for the purpose of explaining why differential entropy can be negative. However, a bit of a paradox remains: $\delta(x)$, as a discrete distribution, has entropy $0$, whereas as the limit of continuous distributions with supports shrinking to $x$, its entropy diverges to $-\infty$. – whuber May 07 '14 at 18:40
  • @whuber What's the resolution? Which answer, $0$ or $-\infty$, does this general measure-theoretic definition yield? – user76284 Nov 17 '22 at 17:14
  • @user76284 Perhaps the wisest solution is to recognize that when you're trying to compute the differential entropy of a discrete distribution, you probably should be doing something else! – whuber Nov 17 '22 at 17:17
  • @whuber Actually, the definition I linked to does not seem to subsume differential entropy. It seems to me like the correct, most general definition of "entropy" is actually relative entropy $\int \mu \log \frac{\mu}{\nu}$. Notably, unlike plain "entropy", the argument to log is truly dimensionless. Thus the differential entropy is actually relative entropy in disguise, with respect to the Lebesgue measure. – user76284 Nov 17 '22 at 18:49
  • @user76284 Yes, that's a good perspective. – whuber Nov 17 '22 at 20:44
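To illustrate user76284's point about relative entropy (a sketch under the convention of natural logarithms, taking Lebesgue measure as the reference $\nu \equiv 1$; again my own code, not from the thread), the relative entropy $\int p \log\frac{p}{1}\,\mathrm{d}x$ is exactly the negative of the differential entropy:

```python
# Differential entropy as relative entropy in disguise:
# the integral of p*log(p/1) dx against Lebesgue measure equals -h(p).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 0.5
rel_entropy, _ = quad(
    lambda x: norm.pdf(x, scale=sigma) * np.log(norm.pdf(x, scale=sigma)),
    -10 * sigma, 10 * sigma,
)
h = float(norm(scale=sigma).entropy())
print(f"relative entropy wrt Lebesgue = {rel_entropy:.4f}, -h(p) = {-h:.4f}")
```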