How do MCMC methods allow the estimation of the posterior distribution in this example?

Question

I am reading a book example (diagram from p10) in which a person scores 9/10 on a test, with a uniform prior assumed for their ability. The posterior distribution could easily be worked out analytically, but the book gives an example of estimating the posterior distribution by MCMC methods. I can see from the bottom part of the diagram that about 95% of samples had a theta between 0.59 and 0.98, where theta represents 'ability' in the sense of accuracy rate on these tests. However, I don't understand how a theta value is obtained for each sample in the chain, or what the relationship is between a sample in a chain and the preceding sample. The book also does not explain why three chains are chosen or where the initial values for these chains come from.


How MCMC works in general is asked/answered in http://stats.stackexchange.com/questions/73629/ — Juho Kokkala, Jun 23 '14 at 16:43
Furthermore, note that based on box 2.2 on page 25, the book in question does not attempt to explain how/why MCMC works, so you need to read other sources for that. If after consulting other sources you have some specific questions about this particular example, please edit the question to address those specific concerns. In my opinion, the question as it now stands is essentially 'how MCMC works', which is pretty broad and/or asked already. — Juho Kokkala, Jun 23 '14 at 17:20
I'd read the Kruschke reference they suggested as the best one to start with, and hadn't been able to follow it sufficiently to understand the diagram, and so I posted my question here instead of moving onto the references they describe as "more technical". I'll edit the question to make it clearer that I'm not simply asking how MCMC works. — user1205901 - Слава Україні, Jun 24 '14 at 00:30

shadowtalker · Accepted Answer · 2014-06-24T03:02:03.907

MCMC works like this:

You need to draw independent samples from a distribution, but it's computationally intractable to do so. It might not even be easy to draw dependent samples.

Your next best approach is to cleverly construct a Markov chain with the property that, if you run it for a very long time, you can expect it to take values that look like draws from the distribution you care about. That is MCMC. In some cases, you can make those draws approximately independent.

So in this case the $\theta$ draws are computed by initializing the Markov chain, "burning it in" for several thousand iterations so that it moves towards its limiting behavior, then just watching it bounce around. $\theta$ is the value of the chain at each sample. The relationship between samples is the transition rule of the Markov chain: each sample is generated from the one before it, and depends on earlier samples only through that immediate predecessor.
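To make the steps above concrete, here is a minimal random-walk Metropolis sketch for this example (9 correct out of 10, uniform prior). It is only an illustration: the book may well use a different sampler, and the function names, the proposal step of 0.2, the burn-in length, and the seed are all assumptions of mine. With a uniform prior the analytic posterior is Beta(10, 2), with mean 10/12, so the sampled summaries can be checked against that.

```python
import math
import random


def log_posterior(theta, successes=9, trials=10):
    """Unnormalized log posterior: binomial likelihood times a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis_chain(n_samples, init, step=0.2, burn_in=1000, seed=0):
    """Run one random-walk Metropolis chain and return the post-burn-in draws.

    step, burn_in and seed are illustrative choices, not values from the book.
    """
    if not 0.0 < init < 1.0:
        raise ValueError("init must lie strictly between 0 and 1")
    rng = random.Random(seed)
    theta = init
    current_lp = log_posterior(theta)
    draws = []
    for i in range(burn_in + n_samples):
        # Transition rule: propose a value near the current one ...
        proposal = theta + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # ... and accept it with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            theta, current_lp = proposal, proposal_lp
        # Otherwise the chain stays put, so the current value is recorded again.
        if i >= burn_in:
            draws.append(theta)
    return draws


draws = metropolis_chain(n_samples=20000, init=0.5)
ordered = sorted(draws)
posterior_mean = sum(draws) / len(draws)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered))]
print(f"mean={posterior_mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

Each recorded value is one "sample" of $\theta$, and the interval printed at the end should land close to the 0.59 to 0.98 range read off the diagram. Because every proposal is centred on the current value, consecutive draws are correlated, which is why they are not independent samples.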

So initial values don't matter, provided the chain runs long enough to forget them; that is what the burn-in is for. The choice of three chains (as opposed to 5 or 8) is completely arbitrary. More is better, since chains are guaranteed to be independent of each other, whereas the draws within a chain are not. There are post-hoc "effective sample size" calculations you can do, but there isn't a good way I'm aware of to decide the number of chains and iterations ex ante. That said, many simpler MCMC problems are well-behaved enough that you don't necessarily need to run 10,000 iterations on each of 8 chains.
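One common way to check that the initial values have washed out is to start several chains from dispersed points and compare them. The sketch below runs three random-walk Metropolis chains for this example (9 out of 10, uniform prior) and computes a basic, non-split version of the Gelman-Rubin statistic, which compares between-chain and within-chain variance. The starting points 0.1, 0.5 and 0.9, the step size, the chain lengths, and the function names are all hypothetical choices for illustration, not taken from the book.

```python
import math
import random
from statistics import mean, variance


def run_chain(init, n_keep=5000, burn_in=1000, step=0.2, seed=0):
    """Random-walk Metropolis for theta given 9 successes in 10 trials, uniform prior."""
    rng = random.Random(seed)

    def log_post(t):
        # Unnormalized log posterior; -inf outside the unit interval.
        return 9 * math.log(t) + math.log(1 - t) if 0 < t < 1 else -math.inf

    theta, lp = init, log_post(init)
    kept = []
    for i in range(burn_in + n_keep):
        prop = theta + rng.gauss(0.0, step)
        prop_lp = log_post(prop)
        if rng.random() < math.exp(min(0.0, prop_lp - lp)):
            theta, lp = prop, prop_lp
        if i >= burn_in:
            kept.append(theta)
    return kept


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])      # W: average within-chain variance
    between = n * variance([mean(c) for c in chains])  # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Three chains with deliberately different starting values and seeds.
starts = [0.1, 0.5, 0.9]
chains = [run_chain(init, seed=k) for k, init in enumerate(starts, start=1)]
chain_means = [mean(c) for c in chains]
r_hat = gelman_rubin(chains)
print("chain means:", [round(m, 3) for m in chain_means])
print("R-hat:", round(r_hat, 3))
```

If the chains have forgotten where they started, their means should agree closely and the statistic should sit near 1. Values noticeably above 1 would suggest running the chains longer or discarding a longer burn-in.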

I think the assertion that you can expect draws from a chain from an MCMC that has run for a long time to be approximately independent is rather strong (and often completely wrong). In some cases (RWMH for example), the sampler is deliberately not approximately independent (it could be made more independent but that would be counterproductive in that case). And in practice even Gibbs samplers may be quite far from independent even if you were sampling from the stationary distribution. — Glen_b, Jun 24 '14 at 01:43
True. I'll change it. That's the reason I like HMC / Stan, it seems to do a good job of achieving independence without a huge thinning interval. Non-independence BUGS me, so to speak. — shadowtalker, Jun 24 '14 at 02:59
