In information theory, specifically regarding information content, I am struggling to conceptualise what the unit of measurement actually is. I have read quite a few similar questions and worked through the derivation in *Probability Theory: The Logic of Science* by E. T. Jaynes, but I am still struggling to piece things together.
The part I am struggling with is why the information content takes the form
$$ I_X(x) = \log\left(\frac{1}{p_X(x)}\right) $$
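To make the unit concrete for myself, here is a minimal Python sketch (the function name is mine, and I assume base-2 logarithms so the unit is bits; a natural log would give nats instead):

```python
import math

def self_information(p: float) -> float:
    """Self-information I(x) = log2(1/p), measured in bits."""
    return math.log2(1.0 / p)

print(self_information(0.5))    # 1.0 bit: a fair coin flip
print(self_information(0.125))  # 3.0 bits: a rarer event is more "surprising"
print(self_information(1.0))    # 0.0 bits: a certain event tells us nothing
```

This at least shows the behaviour I would expect from "surprise": rarer events yield larger values, and a certain event yields zero.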
From what I have read so far, the use of the logarithm function makes sense, but the $\frac{1}{p_X(x)}$ inside it is where I am stuck. In plain English, I understand the whole expression to mean the quantity of information transmitted, or the level of surprise, when some event occurs. But I don't follow why it takes this form. One idea I have been trying to confirm (unsuccessfully so far) is that if some event $E$ occurs with probability $P$, then the amount of information received when this event occurs could be considered to be $1 - P$.
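As a quick numerical check of this idea, I compared my guess $1 - P$ against the textbook $\log\bigl(\frac{1}{P}\bigr)$ (natural log here, just for the comparison):

```python
import math

# Comparing my guess (1 - p) against the textbook definition log(1/p).
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p={p}: 1-p={1 - p:.3f}, log(1/p)={math.log(1 / p):.3f}")
```

They only roughly agree when $P$ is close to $1$ (where $\log\frac{1}{P} \approx 1 - P$), so I suspect my idea is off, but I can't see where.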
If this is true, and it also satisfies the requirements outlined by Shannon, is the information content defined as above because of the following?
$$ I_X(x) = \log(1) - \log\bigl(p_X(x)\bigr) $$ $$ I_X(x) = 0 - \log\bigl(p_X(x)\bigr) $$ or $$ I_X(x) = \log\left(\frac{1}{p_X(x)}\right) $$
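The algebra itself is easy to verify numerically (a trivial check, but it reassures me that the three forms above really are the same quantity):

```python
import math

p = 0.3  # arbitrary probability for the check
lhs = math.log(1) - math.log(p)  # log(1) - log(p) = 0 - log(p)
rhs = math.log(1 / p)            # log(1/p)
print(math.isclose(lhs, rhs))    # True: both forms give the same value
```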
I'm not sure whether this is just assumed to be obvious, or is a coincidence, but I haven't yet found an explanation that formally describes information content in this way (i.e. why the argument of the logarithm is $\frac{1}{p_X(x)}$).