In information theory, specifically regarding information content, I am struggling to conceptualise what the unit of measurement actually is. I have read quite a few similar questions and worked through the derivation in *Probability Theory: The Logic of Science* by E. T. Jaynes, but I am still struggling to piece things together.
The part I am struggling with is why the information content takes the form
$$ I_X(x) = \log\left(\frac{1}{p_X(x)}\right) $$
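To make the unit concrete for myself, here is a minimal Python sketch (the function name is mine, and I assume base-2 logarithms so the unit is bits; a natural log would give nats instead):

```python
import math

def self_information(p: float) -> float:
    """Self-information I(x) = log2(1/p), measured in bits."""
    return math.log2(1.0 / p)

print(self_information(0.5))    # 1.0 bit: a fair coin flip
print(self_information(0.125))  # 3.0 bits: a rarer event is more "surprising"
print(self_information(1.0))    # 0.0 bits: a certain event tells us nothing
```

This at least shows the behaviour I would expect from "surprise": rarer events yield larger values, and a certain event yields zero.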
From what I have read so far, the use of the logarithm function makes sense, but the $\frac{1}{p_X(x)}$ inside it is where I am stuck. In plain English, I understand the whole expression to mean the quantity of information transmitted, or the level of surprise, when some event occurs. But I don't follow why it takes this form. One idea I have been trying to confirm (unsuccessfully so far) is that if some event $E$ occurs with probability $P$, then the amount of information received when this event occurs could be considered to be $1 - P$.
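As a quick numerical check of this idea, I compared my guess $1 - P$ against the textbook $\log\bigl(\frac{1}{P}\bigr)$ (natural log here, just for the comparison):

```python
import math

# Comparing my guess (1 - p) against the textbook definition log(1/p).
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p={p}: 1-p={1 - p:.3f}, log(1/p)={math.log(1 / p):.3f}")
```

They only roughly agree when $P$ is close to $1$ (where $\log\frac{1}{P} \approx 1 - P$), so I suspect my idea is off, but I can't see where.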
If this is true, and it also satisfies the requirements outlined by Shannon, is the information content defined as above because of the following?
$$ I_X(x) = \log(1) - \log\bigl(p_X(x)\bigr) $$ $$ I_X(x) = 0 - \log\bigl(p_X(x)\bigr) $$ or $$ I_X(x) = \log\left(\frac{1}{p_X(x)}\right) $$
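The algebra itself is easy to verify numerically (a trivial check, but it reassures me that the three forms above really are the same quantity):

```python
import math

p = 0.3  # arbitrary probability for the check
lhs = math.log(1) - math.log(p)  # log(1) - log(p) = 0 - log(p)
rhs = math.log(1 / p)            # log(1/p)
print(math.isclose(lhs, rhs))    # True: both forms give the same value
```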
I'm not sure whether this is just assumed to be obvious, or is a coincidence, but I haven't yet found an explanation that formally describes information content in this way (i.e. why the argument of the logarithm is $\frac{1}{p_X(x)}$).