5

I am looking at the Shannon index formula for diversity, and there is a part of the formula I am having trouble following. For example, say there are 50 foxes at site 1, 60 foxes at site 2, and 100 foxes at site 3. Across all sites there are 210 foxes.

50/210 = 0.23809, then take log(0.23809).

60/210 = 0.28571, then take log(0.28571).

100/210 = 0.47619, then take log(0.47619).

But the formula goes on to multiply them together: $p\cdot \log(p)$, viz., $0.23809\times \log(0.23809)$, and so on for the others, and then it adds up these terms across all the sites. Using the formula as context, I want to know what $p\cdot \log(p)$ does in statistics. That is, why multiply a number, e.g., 0.23809, by the log of that same number? It's not the formula that's the problem; it's multiplying the number by its own log. Is that a usual thing to do with logs? What is the aim or reason for it? If I multiplied the number by 100 I would get the proportion as a percentage, but why multiply a number by the log of the same number?
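For concreteness, here is the calculation described above written out in full. This is a sketch assuming the natural logarithm (the Shannon index is commonly reported with $\ln$; a different base only rescales the result) and including the minus sign that the full index carries:

$$
\begin{aligned}
H' &= -\sum_{i=1}^{3} p_i \ln p_i \\
   &= -\bigl(0.23809 \ln 0.23809 \;+\; 0.28571 \ln 0.28571 \;+\; 0.47619 \ln 0.47619\bigr) \\
   &= -\bigl(-0.3417 \;-\; 0.3579 \;-\; 0.3533\bigr) \\
   &\approx 1.053 .
\end{aligned}
$$

If base-10 logs are used instead, the same steps give roughly $0.457$; the choice of base only changes the units of the result, not the shape of the calculation.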

cara
  • 51

1 Answer

4

Your question describes Shannon entropy. It originates in C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July and October 1948.

Don't be put off by the date: the paper was written at a time when clear communication, rather than technical obscurantism, was valued in publications, and it is quite readable.
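To connect this to the notation in the question: assuming $p_i$ denotes the proportion of foxes at site $i$, the quantity being computed is, up to the sign and the choice of logarithm base, Shannon's entropy,

$$
H = -\sum_i p_i \log p_i ,
$$

which can be read as a weighted average of the terms $-\log p_i$, with each term weighted by the proportion $p_i$ itself. That weighting is where the multiplication of each number by its own logarithm comes from.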

Sycorax
  • 90,934