3

I'm studying the probability theory behind entropy and feature selection, and I've noticed that both of the formulas I'm working with multiply a probability by the log of a probability. I'm not sure why that is. I think it might be to scale the formula so that the output falls between 0 and 1, but I'm not sure if that's right. These are the two formulas I'm studying.

Entropy:

$ H(X) = -\sum_i P(x_i) \log_2 P(x_i) $

And this formula, used to aid in model feature selection:

$ I(i) = \iint P(x_i, y) \log\frac{P(x_i, y)}{P(x_i)\, P(y)} \, dx \, dy $
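For concreteness, here is how I compute both quantities on a small discrete example in Python (the distributions below are toy numbers I made up, and I replace the double integral with a sum since my variables are discrete):

```python
import numpy as np

# Toy marginal distribution over four outcomes (made-up numbers).
p_x = np.array([0.5, 0.25, 0.125, 0.125])

# Entropy: H(X) = -sum_i P(x_i) * log2(P(x_i))
H = -np.sum(p_x * np.log2(p_x))
print(H)  # 1.75 bits

# Toy joint distribution P(X, Y) as a 2x2 table (also made up).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x_marg = p_xy.sum(axis=1, keepdims=True)  # P(x), column vector
p_y_marg = p_xy.sum(axis=0, keepdims=True)  # P(y), row vector

# Mutual information, the discrete analogue of the double integral:
# I = sum_{x,y} P(x,y) * log( P(x,y) / (P(x) * P(y)) )
I = np.sum(p_xy * np.log2(p_xy / (p_x_marg * p_y_marg)))
print(I)  # roughly 0.278 bits (using log base 2)
```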

tuomastik
  • 680
Jarom
  • 251

1 Answer

5

The basic reason why logs appear in the entropy definition is to make entropy additive. For example, if you throw a fair die with six faces, the entropy of the outcome is log(6).
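To see that from the definition above: each of the six faces has probability 1/6, so

$ H = -\sum_{i=1}^{6} \frac{1}{6} \log\frac{1}{6} = \log 6 $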

If you throw two dice, you have 36 equally likely outcomes and the entropy is

log(36) = 2 * log(6)

So the entropy grows linearly with the number of dice you throw, i.e. it is additive over independent components, thanks to the log.
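A quick numeric check of that additivity (a small NumPy sketch; I'm assuming fair, independent dice and using log base 2, but any base behaves the same way):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H = -sum_i p_i * log2(p_i)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

# One fair six-sided die: 6 equally likely outcomes.
one_die = np.full(6, 1 / 6)

# Two independent fair dice: 36 equally likely (ordered) outcomes.
two_dice = np.full(36, 1 / 36)

print(entropy(one_die))   # log2(6)  ≈ 2.585
print(entropy(two_dice))  # log2(36) ≈ 5.170, exactly twice the single-die entropy
```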

Anyway, there is more here: What is the role of the logarithm in Shannon's entropy?

Kodiologist
  • 20,116
AndreaL
  • 521
  • 2
    The fact that entropy is additive is only one of the four requirements for an information function: https://en.wikipedia.org/wiki/Entropy_(information_theory)#Rationale – Alex R. Jul 28 '17 at 18:12
  • 1
    Yes, that's right. I kept my answer short since there are much more detailed answers to the question I linked above. – AndreaL Jul 29 '17 at 13:00