I tend to look at it this way:
As you said, the entropy of a random variable $X$ is defined as
$$H(X) = -\sum_{x\in X} p(x)\log p(x).$$
So it is maximal when $X$ is uniformly distributed, i.e. $p(x) = \frac{1}{\vert X \vert}$ for all $x \in X$. For a policy $\pi$ this becomes
$$H(\pi) = -\sum_{s \in S}\sum_{a\in A} \pi(a|s)\log \pi(a|s).$$
Note that during learning you can approximate this with Monte Carlo sampling, i.e. by averaging over a batch of state-action pairs $(s,a)$ gathered during training.
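For example, a minimal NumPy sketch of such a Monte Carlo estimate might look as follows (the function name `entropy_from_batch` and the batch layout, one row of action probabilities $\pi(\cdot|s)$ per sampled state, are assumptions for illustration):

```python
import numpy as np

def entropy_from_batch(action_probs, eps=1e-12):
    """Monte Carlo estimate of the policy entropy.

    action_probs: array of shape (batch_size, num_actions), where each row
    holds pi(a|s) for one state s sampled during learning.
    """
    # Per-state entropy: -sum_a pi(a|s) * log pi(a|s)
    per_state_entropy = -np.sum(action_probs * np.log(action_probs + eps), axis=1)
    # Average over the sampled states to approximate H(pi)
    return per_state_entropy.mean()

# A uniform policy over 4 actions attains the maximum entropy log(4) ~ 1.386
uniform = np.full((1, 4), 0.25)
print(entropy_from_batch(uniform))

# A near-deterministic policy has entropy close to 0 (~0.17 here)
peaked = np.array([[0.97, 0.01, 0.01, 0.01]])
print(entropy_from_batch(peaked))
```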
The intuition behind the entropy of a policy in reinforcement learning is that it measures how uncertain the policy is about which action $a$ to choose. In other words, it tells us how much uncertainty about $a$ remains once the state $s$ is taken into account.
This also ties into the exploration/exploitation dilemma in RL: on one hand we want to exploit actions that already perform well, and on the other hand we want to keep exploring the action space, since there could still be an action that leads to a higher reward. Entropy is a fitting measure for exploration, as a maximum-entropy policy explores maximally but exploits nothing, and vice versa.
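To make that concrete, here is a minimal sketch of how an entropy bonus is commonly subtracted from a policy-gradient loss so that the optimizer is rewarded for keeping the policy stochastic. The helper name `policy_gradient_loss` and the coefficient `entropy_coef` are illustrative choices, not something from the question:

```python
import numpy as np

def policy_gradient_loss(log_probs, advantages, action_probs, entropy_coef=0.01):
    """Policy-gradient loss with an entropy bonus (illustrative sketch).

    log_probs:    log pi(a_t|s_t) of the actions actually taken, shape (batch,)
    advantages:   advantage estimates for those actions, shape (batch,)
    action_probs: full distributions pi(.|s_t), shape (batch, num_actions)
    """
    # Standard policy-gradient term (to be minimized)
    pg_loss = -np.mean(log_probs * advantages)
    # Monte Carlo estimate of the policy entropy on the sampled states
    entropy = -np.mean(np.sum(action_probs * np.log(action_probs + 1e-12), axis=1))
    # Subtracting the entropy term favors more uniform (exploratory) policies
    return pg_loss - entropy_coef * entropy
```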