3

Suppose for a random variable $ X\colon \Omega \to E $, I have an invertible mapping $ Z = f(X) $.

Is the Shannon Entropy for each variable equivalent?

$$ H(X) = H(Z) $$

If not, can any relationship between the entropies be stated in general (e.g. proportionality, additive constant, etc.)?

User1865345
  • 8,202

3 Answers

7

Is the Shannon Entropy for each variable equivalent?

Yes. An intuitive way of seeing why this is so is that (deterministic) transformations cannot increase entropy.

Specifically, note that $$H(X) \geq H(f(X)) = H(Z),$$ because, by the chain rule of entropy:

$$ H(X, f(X)) = H(X) + H(f(X) | X) = H(X), $$

but also

$$ H(X, f(X)) = H(f(X)) + H(X | f(X)) \geq H(f(X)) . $$

However, letting $g = f^{-1}$ (which exists, as $f$ is invertible), the same argument gives $$H(Z) \geq H(g(Z)) = H(X).$$
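As a quick sanity check of this argument, here is a minimal sketch of my own (not part of the original answer; the distribution and the map $f(x) = 2x + 7$ are arbitrary choices): for a discrete $X$, an invertible $f$ only relabels the support, so the probability masses, and hence the entropy, do not change.

```python
import numpy as np

# Distribution of X: map from support values to probability masses.
p_X = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}

def entropy(dist):
    """Shannon entropy (in bits) of a dict {outcome: probability}."""
    probs = np.array([p for p in dist.values() if p > 0])
    return -np.sum(probs * np.log2(probs))

# An invertible map merely relabels the support; the masses are untouched.
f = lambda x: 2 * x + 7
p_Z = {f(x): p for x, p in p_X.items()}

print(entropy(p_X))  # 1.75
print(entropy(p_Z))  # 1.75 -- H(Z) = H(X), as the chain-rule argument shows
```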

Ami Tavory
  • 4,590
  • It appears that this answer is flawed, as pointed out in a new answer. For a linear transformation $f(x) = \mu + \sigma x$, it is easy to show that $H(f(X)) = \log\sigma + H(X)$, which contradicts this answer. I do not immediately see a flaw in your logic though -- do you have any thoughts on reconciling these contradictory claims? – knrumsey Feb 02 '24 at 19:39
  • @knrumsey - the flaw is that entropy is calculated based on the probability distribution, not the values of the random variable, so the fact that the transform of the random variable is invertible is irrelevant. – jbowman Feb 02 '24 at 19:51
  • @jbowman, I agree with you that the values of the random variable are irrelevant. I do not readily see how that relates to a flaw in the accepted answer, however. – knrumsey Feb 02 '24 at 20:02
  • @knrumsey - the accepted answer relies on the invertibility of a function on those values, but entropy isn't calculated using those values, it's calculated using the probabilities of those values. – jbowman Feb 02 '24 at 20:20
2

Actually I would disagree with Nick and Ami. Although I am not familiar with the chain rule of entropy, we can use a simple counterexample. Suppose $X\sim N(0,\sigma^2)$ on $\mathbb{R}$. Then its entropy is given by (derivation given here) $$ H=\frac{1}{2}\log(2\pi)+\log(\sigma)+\frac{1}{2}. $$ The entropy depends on $\sigma$, which can be changed by scaling: e.g., $Y=2X$ implies $\sigma_Y=2\sigma_X$, which yields $H(Y)-H(X)=\log 2\neq 0$.

Edit: I realised that the answer can differ depending on whether $X$ is discrete or continuous. The first link by R Carnell illustrates this difference.
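A quick numerical check of the scaling effect described above (my own sketch; the use of `scipy.stats.norm.entropy` and the specific values of $\sigma$ are my choices, not part of the answer):

```python
import numpy as np
from scipy.stats import norm

sigma_X = 1.0
sigma_Y = 2.0  # Y = 2X doubles the standard deviation

def normal_diff_entropy(sigma):
    """Differential entropy (nats) of N(0, sigma^2): (1/2)log(2*pi) + log(sigma) + 1/2."""
    return 0.5 * np.log(2 * np.pi) + np.log(sigma) + 0.5

gap = normal_diff_entropy(sigma_Y) - normal_diff_entropy(sigma_X)
print(gap)  # 0.6931... = log(2), as claimed

# scipy's norm.entropy() also returns the differential entropy in nats.
print(norm(scale=sigma_Y).entropy() - norm(scale=sigma_X).entropy())  # 0.6931...
```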

  • +1, a good catch! You might consider adding that, in general, if $Y = \mu + \sigma X$ then $H(Y) = \log\sigma + H(X)$. – knrumsey Feb 02 '24 at 19:41
  • This example arises only because you have used differential entropy instead of entropy itself: they are not the same thing! The general derivation of $H$ for a Normal distribution at https://stats.stackexchange.com/a/415436/919 capitalizes on this and locates the issue in the Jacobian of the transformation, which doesn't apply to entropy itself. – whuber Feb 02 '24 at 20:10
  • Thanks whuber for bringing my attention to this difference; I had not heard of differential entropy before, and have been using 'entropy' to refer to both the continuous and discrete case. – Ruobin Han Feb 02 '24 at 20:21
1

The answer given by Ami is correct; I will provide an intuitive explanation of why $H(X|Z) = H(Z|X) = 0$:

  1. $f(\cdot)$ is a deterministic mapping from $X$ to $Z$, so once you observe $X$ there is no remaining uncertainty about $Z$; thus $H(Z|X) = 0$.
  2. $Z=f(X)$ is invertible, so once you observe $Z$ there is likewise no remaining uncertainty about $X$; that is, $H(X|Z)=0$.

Combining 1 and 2, it is clear that $H(X)=H(X,Z)-H(Z|X)=H(X,Z)$ and $H(Z)=H(X,Z)-H(X|Z)=H(X,Z)$, so $H(X)$ and $H(Z)$ are exactly the same.
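To make points 1 and 2 concrete, here is a small sketch of my own (the distribution and the map $f(x)=x^3+1$ are arbitrary, hypothetical choices): build the joint distribution of $(X, Z)$ for a discrete $X$ and check that both conditional entropies vanish, so $H(X) = H(X,Z) = H(Z)$.

```python
import numpy as np

# Distribution of a discrete X and an injective map f on its support.
p_X = {0: 0.5, 1: 0.3, 2: 0.2}
f = lambda x: x ** 3 + 1

def entropy(dist):
    """Shannon entropy (in bits) of a dict {outcome: probability}."""
    probs = np.array([p for p in dist.values() if p > 0])
    return -np.sum(probs * np.log2(probs))

# The joint distribution of (X, Z) puts all its mass on the pairs (x, f(x)).
p_XZ = {(x, f(x)): p for x, p in p_X.items()}

# Marginal distribution of Z = f(X).
p_Z = {}
for (x, z), p in p_XZ.items():
    p_Z[z] = p_Z.get(z, 0.0) + p

print(entropy(p_XZ) - entropy(p_X))  # H(Z|X) = H(X,Z) - H(X) = 0
print(entropy(p_XZ) - entropy(p_Z))  # H(X|Z) = H(X,Z) - H(Z) = 0
print(entropy(p_X), entropy(p_Z))    # equal, so H(X) = H(Z)
```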

Nick Cox
  • 56,404
lynnjohn
  • 181