11

I'm making some self-study notes for Markov chain Monte Carlo (MCMC) and want to check my understanding before proceeding. After reading a few papers and tutorials this is what I've synthesised:

What

  • We can't directly evaluate the posterior as the normalising constant is too hard to calculate for interesting problems

  • Instead we sample from it

  • We do this by engineering a Markov chain that has the same stationary distribution as the target distribution (the posterior in our case)

  • When we have reached this stationary state we continue to run the Markov chain and sample from it to build up our empirical distribution of the posterior

How

  • All Markov chains are completely described by their transition probabilities

  • We therefore control/engineer the Markov chain by controlling the transition probabilities

  • All MCMC algorithms work from this principle but the exact method for generating these transition probabilities differs from algorithm to algorithm

  • If we have a particular algorithm for generating these transition probabilities, we can verify that it converges to the stationary distribution by using the detailed balance equation on the proposed transition probabilities

  • Thus the remaining challenge is to come up with a method to generate these transition probabilities
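For concreteness, here is a minimal sketch (my own, not from any particular paper) of one such method, random-walk Metropolis-Hastings, targeting a standard normal whose log-density is only known up to a constant:

```python
import math
import random

def metropolis_hastings(log_target, init, n_steps, step_size=1.0):
    """Random-walk Metropolis-Hastings: one concrete recipe for the
    transition probabilities, with the target as stationary distribution."""
    theta = init
    samples = []
    for _ in range(n_steps):
        proposal = theta + random.gauss(0.0, step_size)
        # Accept with probability min(1, target(proposal) / target(theta)):
        # only a *ratio* of target densities is needed.
        log_alpha = log_target(proposal) - log_target(theta)
        if random.random() < math.exp(min(0.0, log_alpha)):
            theta = proposal
        samples.append(theta)
    return samples

# Toy target: standard normal, log-density up to an additive constant.
random.seed(0)
draws = metropolis_hastings(lambda t: -0.5 * t * t, init=0.0, n_steps=50_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The empirical mean and variance of the draws should be close to the target's 0 and 1.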

RNs_Ghost
  • 864
  • (+1) I’m also interested in this. I’m guessing that the sample mean of the samples obtained from the MCMC process can be used to estimate the posterior mean, which is the minimum mean squared-error estimate of a parameter. – mhdadk Mar 05 '23 at 15:41
  • Or, more generally, we can construct an estimator that is a function of the samples obtained for different moments of the posterior distribution. – mhdadk Mar 05 '23 at 15:42

1 Answer

15

My comments on these assertions:

We can't directly evaluate the posterior as the normalising constant is too hard to calculate for interesting problems. Instead we sample from it.

No, the normalising constant $$\int_\Theta \pi(\theta)f(x|\theta)\,\text d\theta$$ being unknown is not the issue preventing inference from the posterior distribution. The complexity of the posterior density is the primary reason for running simulations. (The normalising constant is mostly useful for computing the evidence in Bayesian hypothesis testing.)
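Indeed, as the comments below note, the constant need not be known to run MCMC: a standard Metropolis-Hastings move from $\theta$ to a proposal $\theta'\sim q(\theta'|\theta)$ is accepted with probability$$\alpha(\theta,\theta')=\min\left(1,\ \frac{\pi(\theta'|x)\,q(\theta|\theta')}{\pi(\theta|x)\,q(\theta'|\theta)}\right)$$in which any normalising constant of the posterior cancels from the ratio.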

We do this by engineering a Markov chain that has the same stationary distribution as the target distribution (the posterior in our case)

This is correct (if one possibility). Note that MCMC is a general simulation method that is not restricted to Bayesian computation.

When we have reached this stationary state we continue to run the Markov chain and sample from it to build up our empirical distribution of the posterior

Not exactly, as "reaching stationarity" is most often impossible to detect or certify in practice. Some techniques exist, but they are not exact and mileage varies. Exact (or perfect) sampling is restricted to some ordered settings and is very costly. However, the ergodic theorem validates the use of Monte Carlo averages in this setting without "waiting" for stationarity.
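A toy illustration of this ergodic guarantee (my own example): a two-state chain started from a fixed state, rather than from stationarity, still has time averages along the whole path converging to the stationary probabilities, here $\pi=(2/3,1/3)$.

```python
import random

# Two-state chain with transition matrix P; its stationary distribution,
# solving pi P = pi, is pi = (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]

random.seed(1)
state, visits, n = 0, [0, 0], 200_000
for _ in range(n):
    state = 0 if random.random() < P[state][0] else 1
    visits[state] += 1

# Ergodic average over the entire path, with no burn-in discarded.
freq0 = visits[0] / n
```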

All Markov chains are completely described by their transition probabilities.

The generic term is transition kernel, as the target distribution is often absolutely continuous. Some MCMC methods use continuous-time processes, in which case there is no transition kernel stricto sensu.

We therefore control/engineer the Markov chain by controlling the transition probabilities. All MCMC algorithms work from this principle but the exact method for generating these transition probabilities differs between algorithms.

Markov chain Monte Carlo algorithms are indeed validated by the fact that their transition kernel ensures stationarity of the target distribution:$$\pi(\theta'|x) = \int_\Theta \pi(\theta|x)K(\theta,\theta')\,\text d\theta\tag{1}$$
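In a finite state space, (1) reduces to the matrix identity $\pi = \pi K$, which can be checked directly; a small sketch with a toy kernel of my own:

```python
# Toy 3-state transition kernel K and candidate stationary distribution pi.
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

# Discrete analogue of (1): sum_i pi(i) K(i, j) should recover pi(j).
pi_next = [sum(pi[i] * K[i][j] for i in range(3)) for j in range(3)]
```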

If we have a particular algorithm for generating these transition probabilities, we can verify that it converges to the stationary distribution by using the detailed balance equation on the proposed transition probabilities

No, detailed balance is a sufficient but not a necessary condition for stationarity wrt the correct target. Take for instance the (systematic-scan) Gibbs sampler, which is usually not reversible and hence does not satisfy detailed balance. It is nonetheless valid and satisfies global balance (1).
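A toy illustration of this point (my own): a cyclic kernel that only moves "clockwise" keeps the uniform distribution stationary (global balance holds) while violating detailed balance everywhere it moves.

```python
# 3-state kernel that only allows the moves i -> i and i -> i+1 (mod 3).
K = [[0.3, 0.7, 0.0],
     [0.0, 0.3, 0.7],
     [0.7, 0.0, 0.3]]
pi = [1 / 3, 1 / 3, 1 / 3]

# Global balance: sum_i pi(i) K(i, j) == pi(j) for every j.
stationary = all(
    abs(sum(pi[i] * K[i][j] for i in range(3)) - pi[j]) < 1e-9
    for j in range(3)
)
# Detailed balance: pi(i) K(i, j) == pi(j) K(j, i) for every pair (i, j).
detailed_balance = all(
    abs(pi[i] * K[i][j] - pi[j] * K[j][i]) < 1e-9
    for i in range(3)
    for j in range(3)
)
# stationary is True, detailed_balance is False
```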

Thus the remaining challenge is to come up with a method to generate these transition probabilities

Not really, since there exist families of generic MCMC algorithms, such as random-walk Metropolis-Hastings or Hamiltonian Monte Carlo. The challenge lies more in calibrating a given algorithm or in choosing between algorithms.
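To illustrate what "calibrating" means in practice (a sketch of my own, on a standard normal target): the acceptance rate of a random-walk Metropolis sampler depends heavily on the proposal step size, and tuning that scale is part of the practical work.

```python
import math
import random

def acceptance_rate(step_size, n_steps=20_000, seed=0):
    """Acceptance rate of random-walk Metropolis on a standard normal target."""
    rng = random.Random(seed)
    theta, accepted = 0.0, 0
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_size)
        log_alpha = 0.5 * (theta * theta - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta, accepted = proposal, accepted + 1
    return accepted / n_steps

# Tiny steps are almost always accepted (but explore slowly); huge steps
# are almost always rejected; calibration targets a middle ground.
rates = {s: acceptance_rate(s) for s in (0.1, 2.4, 25.0)}
```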

Xi'an
  • 105,342
  • Thanks for this. I was under the impression that detailed balance was a sufficient but not necessary condition? Also on the last point, I guess I'm talking about if we were starting from scratch, not that we're in 2023 with millions of algorithms to choose from haha – RNs_Ghost Mar 05 '23 at 20:29
  • 1
    This is what I wrote: detailed balance is sufficient but not necessary. – Xi'an Mar 06 '23 at 07:34
  • 2
    While the normalising constant is not the issue, it is true that the normalising constant does not need to be known to perform MCMC. So possibly that might be where the idea came from. – Sextus Empiricus Mar 06 '23 at 09:48