
I am trying to understand Shannon entropy better. By definition, Shannon entropy is calculated as H = -sum(pk * log(pk)).

I am using the scipy.stats.entropy function and running the calculation against a constant signal and a high-variance signal.

However, the constant signal gives me an entropy of 2.7, which I don't understand at all. For the high-variance signal, I get 1.76, which is lower. Shouldn't it be higher, since you need more bits of information to quantify that signal? Where am I going wrong?

Here is the code I am using:

from scipy.stats import entropy
import numpy as np

# high-variance signal
a = [5, 5, 5, 6, 5, 5, 1, 6, 5, 5, 5, 9, 5, 99, 5]
entropy(a)
# 1.7693542254395482

# constant signal
b = np.ones(15)
b = b * 5
entropy(b)
# 2.70805020110221
  • Entropy is a function of probabilities, not a direct function of observed values. scipy.stats.entropy takes probabilities for each of the possible values of $x$ as inputs, not the observed values. – Marjolein Fokkema Nov 03 '23 at 11:34
  • Then I have two questions. What does the 2.7 and 1.7 do actually mean? and what is the kind of function I am looking for? – GGChe Nov 03 '23 at 11:36

1 Answer


Entropy is a function of probabilities, not a direct function of observed values. scipy.stats.entropy takes probabilities for each of the possible values of x as inputs, not the observed values.

Entropy (Wikipedia): $\mathrm{H}(X) := -\sum_{x \in \mathcal{X}} p(x) \log p(x)$

As per the documentation of scipy.stats.entropy: "This routine will normalize pk and qk if they don’t sum to 1."

You supplied two vectors of observed values, which are interpreted as vectors of probabilities: each is normalized to sum to one, and entropy is then computed as per the formula above.
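
In Python, that normalize-then-sum step can be reproduced directly. This is a minimal sketch of what happens to your two inputs (assuming the default natural-log base of scipy.stats.entropy):

import numpy as np
from scipy.stats import entropy

a = np.array([5, 5, 5, 6, 5, 5, 1, 6, 5, 5, 5, 9, 5, 99, 5])
b = np.ones(15) * 5

# Normalize each input to sum to 1 (as scipy does), then apply -sum(p * log(p)).
p_a = a / a.sum()
p_b = b / b.sum()

print(-np.sum(p_a * np.log(p_a)))  # ~1.769354
print(-np.sum(p_b * np.log(p_b)))  # ~2.708050

# The same values you got by calling entropy() on the raw vectors:
print(entropy(a), entropy(b))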

Computations and results in R:

> a <- c(5,5,5,6,5,5,1,6,5,5,5,9,5,99,5)
> b <- rep(1, 15)
> 
> normalized_a <- sapply(a, function(x) x/(sum(a)))
> normalized_b <- sapply(b, function(x) x/(sum(b)))
> round(normalized_a, digits = 3)
 [1] 0.029 0.029 0.029 0.035 0.029 0.029 0.006 0.035 0.029
[10] 0.029 0.029 0.053 0.029 0.579 0.029
> round(normalized_b, digits = 3)
 [1] 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067
[11] 0.067 0.067 0.067 0.067 0.067
> 
> -sum(sapply(normalized_a, \(x) x*log(x)))
[1] 1.769354
> -sum(sapply(normalized_b, \(x) x*log(x)))
[1] 2.70805

Your vector b has all values identical, so after normalization every possible value is equally likely. That means maximum surprise, and therefore maximum entropy. But note this is only because you supplied observed values instead of the relative frequency with which each of these values was observed.

Your vector a has mostly identical values plus one very large value (99), which is interpreted as a very high probability. Thus, its entropy is lower. Again, this is only because you supplied observed values instead of the relative frequency with which each of these values was observed.
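
If what you actually want is the entropy of the empirical distribution of the observed values (an assumption about the intent, not something stated in the question), one option is to count how often each distinct value occurs and pass the counts to entropy(), which then normalizes them into relative frequencies:

import numpy as np
from scipy.stats import entropy

a = np.array([5, 5, 5, 6, 5, 5, 1, 6, 5, 5, 5, 9, 5, 99, 5])
b = np.ones(15) * 5

# Frequencies of the distinct values, not the values themselves.
_, counts_a = np.unique(a, return_counts=True)
_, counts_b = np.unique(b, return_counts=True)

print(entropy(counts_a))  # ~1.08: several distinct values occur
print(entropy(counts_b))  # 0.0: only one value ever occurs, no surprise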

  • Thanks for the answer. I have a question: what do these values actually mean? It makes no sense to me that the entropy of the vector with only 1s is higher than that of the other vector. – GGChe Nov 03 '23 at 12:05
  • Conceptually, entropy is the "amount of missing information." Missing with respect to what? The amount of information needed to reach "certainty," which has a probability distribution of, for example, (0, 0, 1, 0, ...): all possibilities have probability zero except the one that is certain. A vector with only 1s is normalized by your software to (1/n, 1/n, 1/n, ...), i.e. the uniform distribution, and the uniform distribution has the maximum amount of missing information (see the sketch after these comments). – Romke Bontekoe Nov 03 '23 at 12:17
  • Oookay! Now I get it, great. Then, what is the analysis method I am looking for? I want to get the dispersion of a signal, or the number of bits required for each of the samples to be quantized. If I am not wrong, they are related to each other. – GGChe Nov 03 '23 at 16:17
  • For "What is entropy" see https://stats.stackexchange.com/questions/66186/statistical-interpretation-of-maximum-entropy-distribution/245198#245198 – kjetil b halvorsen Nov 07 '23 at 02:37