I'm studying the probability theory behind entropy and feature selection, and I've noticed that both of the formulas below multiply a probability by the log of a probability. I'm not sure why that is. My guess is that it scales the formula so that the output falls between 0 and 1, but I'm not sure if that's right. These are the two formulas:
Entropy:
$ H(X) = -\sum_i P(x_i) \log_2 P(x_i) $
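To make sure I'm reading the discrete formula correctly, here's a quick numeric check I wrote (my own sketch in Python, not from any library):

```python
import numpy as np

# Sanity check of the discrete entropy formula. Assumes p is a
# full probability distribution that sums to 1.
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # treat 0 * log2(0) as 0
    return -np.sum(p * np.log2(p))  # -sum of P(x_i) * log2 P(x_i)

print(entropy([0.5, 0.5]))  # 1.0  (fair coin)
print(entropy([0.9, 0.1]))  # ~0.47 (skewed coin, less uncertainty)
```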
And this formula, which is used to aid in model feature selection:
$ I(i) = \iint P(x_i, y) \log\!\left(\frac{P(x_i, y)}{P(x_i)\, P(y)}\right) dx\, dy $
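Here's how I'd approximate that second formula by replacing the double integral with a double sum over a discretized joint distribution (the joint table below is hypothetical, and I picked log base 2 since the formula doesn't specify a base):

```python
import numpy as np

# Discretized version of the second formula: the double integral
# becomes a double sum over the cells of a joint probability table.
def mutual_information(joint):
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal P(x_i), column vector
    py = joint.sum(axis=0, keepdims=True)  # marginal P(y), row vector
    indep = px * py                        # P(x_i) * P(y) for every cell
    mask = joint > 0                       # skip empty cells
    return np.sum(joint[mask] * np.log2(joint[mask] / indep[mask]))

# Hypothetical joint distribution of a binary feature x_i and a binary label y.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))  # ~0.28, so the feature tells us something about y
```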