
I am numerically estimating the joint entropy $H(X_1,X_2, \ldots, X_n)$, where the $X_i$ are discrete random variables.

Using the definition of Shannon entropy, I know that

$$ H(X_1,X_2, \ldots, X_n) = -\sum_{x_1,x_2,\ldots,x_n} p(x_1,x_2,\ldots,x_n)\log p(x_1,x_2,\ldots,x_n). $$

My approach is to estimate $p(x_1,x_2,\ldots,x_n)$ from the observed frequencies (i.e., the plug-in, or maximum-likelihood, estimator) and substitute it into the sum above.
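
For concreteness, here is a minimal sketch of my estimator (NumPy; the function name is my own):

```python
import numpy as np

def plugin_entropy(samples):
    """Plug-in (empirical-frequency) estimate of H(X_1,...,X_n) in nats.

    samples: (m, n) integer array, one row per joint observation.
    """
    # Count how many times each distinct joint outcome occurs.
    _, counts = np.unique(samples, axis=0, return_counts=True)
    p = counts / samples.shape[0]      # empirical joint probabilities
    return -np.sum(p * np.log(p))      # -sum p log p over the observed cells
```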

My problem is that this approach systematically underestimates $H$.

For example, suppose the $X_i$ $(i = 1,2,\ldots,8)$ are independent and uniformly distributed over an alphabet $\mathcal{X}$ with $\vert \mathcal{X} \vert = 13$. Theoretically, $H(X_1, X_2, \ldots, X_8) = 8\log(13) \approx 20.5196$, but from $m = 10^7$ randomly generated observations my estimate gives $H \approx 16.1097$. I understand that this is a dimensionality problem (the joint alphabet has $13^8 \approx 8.2 \times 10^8$ cells, far more than $m$), and that the bias becomes smaller as $m$ increases. With real data, however, I need a better estimator, ideally one that takes into account the number of observations $m$ and the dimensionality $n$ of the problem (with my data I cannot use regularization methods such as Laplace smoothing).

Do you know of a better estimator? I would appreciate any advice on how to overcome this problem.
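
For reference, the experiment above can be reproduced along these lines (a sketch; the seed is arbitrary, and the base-13 encoding is just a fast way to count joint outcomes):

```python
import numpy as np

rng = np.random.default_rng(0)             # seed chosen arbitrarily
m, n, K = 10**7, 8, 13                     # observations, variables, alphabet size
samples = rng.integers(0, K, size=(m, n))  # independent uniform draws of (X_1,...,X_8)

# Encode each row as a single base-13 integer so counting reduces to a
# 1-D np.unique (13**8 fits comfortably in int64).
codes = samples @ (K ** np.arange(n))
_, counts = np.unique(codes, return_counts=True)
p = counts / m

print(-np.sum(p * np.log(p)))  # plug-in estimate: for me this comes out near 16.11 nats
print(n * np.log(K))           # true value: 8*log(13) ≈ 20.5196 nats
```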
