I tend to look at it this way:
As you said, the entropy of a random variable $X$ is defined as
$$H(X) = -\sum_{x\in X} p(x)\log p(x).$$
So it is maximal when $X$ is uniformly distributed, i.e. $p(x) = \frac{1}{\vert X \vert}$ for all $x \in X$. For a policy $\pi$ this becomes
$$H(\pi) = -\sum_{s \in S}\sum_{a\in A} \pi(a|s)\log \pi(a|s).$$
Note that during learning you can approximate this with Monte Carlo sampling, i.e. by averaging over a batch of state-action pairs $(s,a)$ gathered during training.
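For example, a minimal NumPy sketch of such a Monte Carlo estimate might look as follows (the function name `entropy_from_batch` and the batch layout, one row of action probabilities $\pi(\cdot|s)$ per sampled state, are assumptions for illustration):

```python
import numpy as np

def entropy_from_batch(action_probs, eps=1e-12):
    """Monte Carlo estimate of the policy entropy.

    action_probs: array of shape (batch_size, num_actions), where each row
    holds pi(a|s) for one state s sampled during learning.
    """
    # Per-state entropy: -sum_a pi(a|s) * log pi(a|s)
    per_state_entropy = -np.sum(action_probs * np.log(action_probs + eps), axis=1)
    # Average over the sampled states to approximate H(pi)
    return per_state_entropy.mean()

# A uniform policy over 4 actions attains the maximum entropy log(4) ~ 1.386
uniform = np.full((1, 4), 0.25)
print(entropy_from_batch(uniform))

# A near-deterministic policy has entropy close to 0 (~0.17 here)
peaked = np.array([[0.97, 0.01, 0.01, 0.01]])
print(entropy_from_batch(peaked))
```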
The intuition behind the entropy of a policy in reinforcement learning is that it measures how uncertain the policy is about which action $a$ to choose. In other words, it tells us how much uncertainty about $a$ remains once the state $s$ is taken into account.
This also ties into the exploration/exploitation dilemma in RL: on one hand we want to exploit actions that already perform well, and on the other hand we want to keep exploring the action space, since there could still be an action that leads to a higher reward. Entropy is a fitting measure for exploration, as a maximum-entropy policy explores maximally but exploits nothing, and vice versa.
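To make that concrete, here is a minimal sketch of how an entropy bonus is commonly subtracted from a policy-gradient loss so that the optimizer is rewarded for keeping the policy stochastic. The helper name `policy_gradient_loss` and the coefficient `entropy_coef` are illustrative choices, not something from the question:

```python
import numpy as np

def policy_gradient_loss(log_probs, advantages, action_probs, entropy_coef=0.01):
    """Policy-gradient loss with an entropy bonus (illustrative sketch).

    log_probs:    log pi(a_t|s_t) of the actions actually taken, shape (batch,)
    advantages:   advantage estimates for those actions, shape (batch,)
    action_probs: full distributions pi(.|s_t), shape (batch, num_actions)
    """
    # Standard policy-gradient term (to be minimized)
    pg_loss = -np.mean(log_probs * advantages)
    # Monte Carlo estimate of the policy entropy on the sampled states
    entropy = -np.mean(np.sum(action_probs * np.log(action_probs + 1e-12), axis=1))
    # Subtracting the entropy term favors more uniform (exploratory) policies
    return pg_loss - entropy_coef * entropy
```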