
When reading papers on machine learning, I have often found that authors reference the "Shannon entropy". Curiously, the equation given is often:

$$H(p) = -\sum\limits_{i = 1}^n p_i \ln(p_i)$$

For instance, see:

https://arxiv.org/pdf/1502.00326.pdf

https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2014-121.pdf

There are many more examples.

The problem is that, for anyone who has ever taken a course on information theory, the logarithm in the definition of entropy is base $2$, not base $e$. So these authors are really referring to something more like the Gibbs entropy rather than the Shannon entropy.

The definition in this paper, on the other hand, looks correct to me: http://www.fizyka.umk.pl/publications/kmk/08-Entropie.pdf

Has anyone else noticed this phenomenon? Would there be a problem if one used Gibbs entropy in place of Shannon's entropy?

Olórin

1 Answer


It's not a problem. In fact, Shannon himself suggested that other units could be used; see the very first equation in his paper "A Mathematical Theory of Communication" (bottom of page 1). Here's a quote from the paper:

In analytical work where integration and differentiation are involved the base e is sometimes useful. The resulting units of information will be called natural units. Change from the base a to base b merely requires multiplication by $\log_b a$.
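In other words, switching from $\log_2$ to $\ln$ only rescales the entropy by a constant: $-\sum_i p_i \ln p_i = \ln(2)\left(-\sum_i p_i \log_2 p_i\right)$, so anything that depends on comparing or optimizing entropies is unchanged. Here is a minimal numerical sketch of that fact (my own illustration, not part of Shannon's paper; the helper `entropy` is just for this example):

```python
import numpy as np

def entropy(p, base):
    """Entropy of a discrete distribution p, in the given log base."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # convention: 0 * log(0) = 0
    return -np.sum(p * np.log(p)) / np.log(base)

p = [0.5, 0.25, 0.125, 0.125]

h_bits = entropy(p, base=2)               # Shannon entropy in bits
h_nats = entropy(p, base=np.e)            # same quantity in nats

print(h_bits)                             # 1.75
print(h_nats)                             # ~1.213
print(np.isclose(h_nats, np.log(2) * h_bits))   # True: H_e = ln(2) * H_2
```

The two numbers describe the same quantity expressed in different units.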

Aksakal