I'm studying the probability theory behind entropy and feature selection, and I've noticed that both of the formulas below multiply a probability by the log of a probability. I'm not sure why that is. My guess is that it scales the formula so that the output falls between 0 and 1, but I'm not sure if that's right. These are the two formulas:
Entropy:
$ H(X) = -\sum_i P(x_i) \log_2 P(x_i) $
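To make sure I'm reading the discrete formula correctly, here's a quick numeric check I wrote (my own sketch in Python, not from any library):

```python
import numpy as np

# Sanity check of the discrete entropy formula. Assumes p is a
# full probability distribution that sums to 1.
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # treat 0 * log2(0) as 0
    return -np.sum(p * np.log2(p))  # -sum of P(x_i) * log2 P(x_i)

print(entropy([0.5, 0.5]))  # 1.0  (fair coin)
print(entropy([0.9, 0.1]))  # ~0.47 (skewed coin, less uncertainty)
```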
And this formula, which is used to aid in model feature selection:
$ I(i) = \iint P(x_i, y) \log\!\left(\frac{P(x_i, y)}{P(x_i)\, P(y)}\right) dx\, dy $
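Here's how I'd approximate that second formula by replacing the double integral with a double sum over a discretized joint distribution (the joint table below is hypothetical, and I picked log base 2 since the formula doesn't specify a base):

```python
import numpy as np

# Discretized version of the second formula: the double integral
# becomes a double sum over the cells of a joint probability table.
def mutual_information(joint):
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal P(x_i), column vector
    py = joint.sum(axis=0, keepdims=True)  # marginal P(y), row vector
    indep = px * py                        # P(x_i) * P(y) for every cell
    mask = joint > 0                       # skip empty cells
    return np.sum(joint[mask] * np.log2(joint[mask] / indep[mask]))

# Hypothetical joint distribution of a binary feature x_i and a binary label y.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))  # ~0.28, so the feature tells us something about y
```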