I am looking for methods for estimating the information entropy of a distribution when the only practical way to sample from it is a Monte Carlo method.
My problem is not unlike the standard Ising model that typically serves as the introductory example for Metropolis–Hastings sampling. I have a probability distribution over a set $A$, i.e. I have $p(a)$ for each $a \in A$. The elements $a \in A$ are combinatorial in nature, like Ising states, and there are very many of them, so in practice I never get the same sample twice when sampling from this distribution on a computer. $p(a)$ cannot be computed directly, because the normalization factor is unknown, but the ratio $p(a_1)/p(a_2)$ of any two probabilities is easy to calculate.
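For concreteness, here is a minimal sketch of the kind of sampler I have in mind, using a toy 1D Ising chain as a stand-in; the energy function, the size $N$, and the inverse temperature $\beta$ are placeholders for my actual model. The point is that the acceptance step only ever uses the ratio $p(a')/p(a)$, never $p(a)$ itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for my actual model: a 1D Ising chain with periodic
# boundaries. N and beta are placeholders, not my real parameters.
N = 100
beta = 0.5

def delta_energy(state, i):
    """Energy change from flipping spin i (nearest-neighbour coupling J = 1)."""
    left, right = state[(i - 1) % N], state[(i + 1) % N]
    return 2.0 * state[i] * (left + right)

def metropolis_step(state):
    """One Metropolis-Hastings update.

    The acceptance probability min(1, p(a')/p(a)) = min(1, exp(-beta * dE))
    uses only the ratio of probabilities; the normalization never appears.
    """
    i = rng.integers(N)
    dE = delta_energy(state, i)
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        state[i] *= -1
    return state

# Collect samples: one recorded configuration per sweep of N updates.
state = rng.choice([-1, 1], size=N)
samples = []
for sweep in range(1000):
    for _ in range(N):
        state = metropolis_step(state)
    samples.append(state.copy())
```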
I want to estimate the information entropy of this distribution, $$ S = -\sum_{a \in A} p(a) \ln p(a). $$
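To spell out where the difficulty lies: assuming, as in my case, that the ratios come from an explicitly computable unnormalized weight $w(a) \propto p(a)$ with unknown normalization $Z = \sum_{a \in A} w(a)$, the entropy splits as $$ S = -\langle \ln p(a) \rangle_p = \ln Z - \langle \ln w(a) \rangle_p. $$ The second term is an ordinary average that Metropolis–Hastings samples estimate directly; the obstruction is entirely in the unknown $\ln Z$.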
Alternatively, I want to estimate the entropy difference between this distribution and the one obtained by restricting it to a subset $A_1 \subset A$ (and, of course, re-normalizing).
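To make the restricted version precise (this is just the definition spelled out): writing $P_1 = \sum_{a \in A_1} p(a)$ for the total probability of the subset, the restricted distribution is $p_1(a) = p(a)/P_1$ for $a \in A_1$, and the quantity I am after is $$ \Delta S = S - S_1, \qquad S_1 = -\sum_{a \in A_1} p_1(a) \ln p_1(a) = \ln P_1 - \frac{1}{P_1} \sum_{a \in A_1} p(a) \ln p(a). $$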