Agree with the top answer that cross-entropy, as a measure of distribution dissimilarity, may be more narrowly applicable: it is used when we compare an estimated probability distribution (q) against the true probability distribution (p).
If we only consider the true probability distribution p, its entropy is the expected value of the negative log-probability (which can also be viewed as the amount of "self-information", i.e. the number of bits required to encode a sample from p):
$$H(p) = -E_{p}[\log_2(p)] = -\sum_{i=1}^{n}p_{i}\log_{2}(p_{i})$$
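As a quick illustration, a fair coin has exactly 1 bit of entropy, while a biased coin has less. A minimal NumPy sketch (the function name `entropy_bits` is mine):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log2(0) = 0
    return -np.sum(p * np.log2(p))

print(entropy_bits([0.5, 0.5]))       # 1.0 (fair coin)
print(entropy_bits([0.9, 0.1]))       # ~0.47 (a biased coin is more predictable)
```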
When computing the cross-entropy between the estimated distribution q and the true distribution p, we replace the log-probability $\log_2(p)$ with its estimated counterpart $\log_2(q)$. After re-arranging the equation we obtain the following:
\begin{equation}
\begin{aligned}
H(p, q) = -E_{p}[\log_2(q)] &= -\sum_{i=1}^{n}p_{i}\log_{2}(q_{i})\\
&= -\sum_{i=1}^{n}p_{i}\log_{2}\left(\frac{q_{i}}{p_{i}}p_{i}\right)\\
&= -\sum_{i=1}^{n}p_{i}\log_{2}(p_{i}) - \sum_{i=1}^{n}p_{i}\log_{2}\left(\frac{q_{i}}{p_{i}}\right)\\
&= H(p) + D_{\mathrm{KL}}(p\,\|\,q)
\end{aligned}
\end{equation}
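We can check this identity numerically; a small sketch (the two example distributions are arbitrary):

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])   # "true" distribution
q = np.array([0.5, 0.4, 0.1])   # estimated distribution

cross_entropy = -np.sum(p * np.log2(q))
entropy_p     = -np.sum(p * np.log2(p))
kl_pq         =  np.sum(p * np.log2(p / q))   # KL(p || q)

print(cross_entropy)                                   # H(p, q)
print(entropy_p + kl_pq)                               # same value: H(p) + KL(p || q)
print(np.isclose(cross_entropy, entropy_p + kl_pq))    # True
```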
In this way, the extra KL divergence term is a relative entropy measure that describes how many extra bits we need, on average, when encoding samples from the true distribution p with a code optimized for the estimated distribution q. The asymmetry is also visible (as mentioned in other answers above):
$$-\sum_{i=1}^{n}p_{i}\log_{2}\left(\frac{q_{i}}{p_{i}}\right) \neq -\sum_{i=1}^{n}q_{i}\log_{2}\left(\frac{p_{i}}{q_{i}}\right)$$
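A quick numerical illustration of this asymmetry (again with arbitrary example distributions):

```python
import numpy as np

p = np.array([0.9, 0.1])   # e.g. a biased coin
q = np.array([0.5, 0.5])   # e.g. a fair coin

kl_pq = np.sum(p * np.log2(p / q))   # KL(p || q) ~ 0.53 bits
kl_qp = np.sum(q * np.log2(q / p))   # KL(q || p) ~ 0.74 bits
print(kl_pq, kl_qp)                  # the two directions disagree
```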
In the context of machine learning, cross-entropy is a commonly used loss function: we learn the model parameters by minimizing it. Since the entropy term $H(p)$ does not depend on the model, minimizing the cross-entropy is equivalent to minimizing the KL divergence $D_{\mathrm{KL}}(p\,\|\,q)$.
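For instance, in classification the targets are typically one-hot distributions, so $H(p) = 0$ and the cross-entropy loss coincides with the KL divergence. A minimal sketch (the function name `cross_entropy_loss` is mine; ML losses conventionally use the natural log rather than $\log_2$):

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy (in nats) over a batch of one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0, 0], [0, 1, 0]])              # one-hot labels
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # e.g. softmax outputs
print(cross_entropy_loss(y_true, y_pred))              # ~0.29
```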
When it comes to comparing two distributions in a broader sense, you might be looking for measures such as the following (see the usage sketch after the list):
- The asymmetric Kullback-Leibler divergence: scipy.special.kl_div
- The symmetric Jensen-Shannon divergence: scipy.spatial.distance.jensenshannon (note that this function returns the Jensen-Shannon distance, i.e. the square root of the divergence)
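A quick usage sketch of both (the example distributions are arbitrary):

```python
import numpy as np
from scipy.special import kl_div
from scipy.spatial.distance import jensenshannon

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

# kl_div is elementwise; summing gives KL(p || q) in nats, and it is asymmetric.
print(kl_div(p, q).sum(), kl_div(q, p).sum())   # two different values

# jensenshannon returns the Jensen-Shannon distance (square root of the
# divergence) and is symmetric in its arguments.
print(jensenshannon(p, q, base=2), jensenshannon(q, p, base=2))   # equal
```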