Sampling in Hierarchical Bayesian Models

Question

I have a question regarding sampling procedures in a hierarchical latent variable model. Consider a hierarchical model in which $Y$ are the observations and $\theta$ and $\phi$ are the parameters and hyperparameters, respectively. The joint distribution factorizes as:

$$p(Y,\theta,\phi)=p(Y|\theta)p(\theta|\phi)p(\phi)$$

Thus, the joint posterior distribution is given by:

$$p(\theta,\phi|Y)=\frac{p(Y|\theta)p(\theta|\phi)p(\phi)}{p(Y)}$$

while the conditional posteriors (which I will loosely, and incorrectly, call the marginal posteriors) are given by:

$$p(\phi|Y,\theta)=\frac{p(\theta|\phi)p(\phi)}{p(\theta)}$$
$$p(\theta|Y,\phi)=\frac{p(Y|\theta)p(\theta|\phi)}{p(Y|\phi)}$$

A typical approach for generating samples from the posterior over the parameters and hyperparameters is MCMC. I assume that everything is differentiable and that we will use Hamiltonian Monte Carlo (HMC), also known as Hybrid Monte Carlo.

In this setting one can use conjugate distributions for $p(\phi)$ and $p(\theta|\phi)$ so that sampling is done in the following way.

1. Initialize $\phi=\phi_0$, $\theta=\theta_0$.
2. Sample $\phi_1 \sim p(\phi|Y,\theta_0)$ using Gibbs sampling.
3. Sample $\theta_1\sim p(\theta|Y,\phi_1)$ using Hamiltonian Monte Carlo.
4. Repeat steps 2 and 3.
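
The steps above can be sketched in Python for a toy conjugate model. The model is an assumption chosen only for illustration and is not taken from the thesis mentioned below: $Y_i\sim N(\theta,1)$, $\theta|\phi\sim N(0,1/\phi)$ and $\phi\sim\mathrm{Gamma}(a,b)$ with rate $b$, so that $\phi|\theta\sim\mathrm{Gamma}(a+1/2,\,b+\theta^2/2)$ can be drawn exactly. The function names (`leapfrog`, `hmc_step`, `gibbs_hmc`) and the tuning values are hypothetical.

```python
import numpy as np

def leapfrog(x, p, grad, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with unit mass."""
    x, p = x.copy(), p.copy()
    p += 0.5 * eps * grad(x)                  # initial half step for momentum
    for step in range(n_steps):
        x += eps * p                          # full step for position
        scale = 1.0 if step < n_steps - 1 else 0.5
        p += scale * eps * grad(x)            # full step (half step at the end)
    return x, p

def hmc_step(x, logp, grad, rng, eps, n_steps):
    """One HMC transition targeting exp(logp), with Metropolis correction."""
    p0 = rng.standard_normal(x.shape)
    x_new, p_new = leapfrog(x, p0, grad, eps, n_steps)
    log_accept = (logp(x_new) - 0.5 * p_new @ p_new) - (logp(x) - 0.5 * p0 @ p0)
    return x_new if np.log(rng.uniform()) < log_accept else x

def gibbs_hmc(y, n_iter=2000, a=2.0, b=2.0, seed=0):
    """Alternate an exact Gibbs draw of phi with an HMC update of theta."""
    rng = np.random.default_rng(seed)
    theta, phi = np.zeros(1), 1.0             # step 1: initialize
    thetas, phis = np.empty(n_iter), np.empty(n_iter)
    for t in range(n_iter):
        # Step 2: phi | theta ~ Gamma(a + 1/2, rate = b + theta^2 / 2)
        phi = rng.gamma(a + 0.5, 1.0 / (b + 0.5 * theta[0] ** 2))
        # Step 3: HMC on log p(theta | Y, phi), known up to a constant
        logp = lambda th: -0.5 * np.sum((y - th[0]) ** 2) - 0.5 * phi * th[0] ** 2
        grad = lambda th: np.array([np.sum(y - th[0]) - phi * th[0]])
        theta = hmc_step(theta, logp, grad, rng, eps=0.05, n_steps=5)
        thetas[t], phis[t] = theta[0], phi
    return thetas, phis

# Usage on simulated data (the data-generating values are arbitrary).
y_demo = 1.5 + np.random.default_rng(1).standard_normal(50)
theta_samples, phi_samples = gibbs_hmc(y_demo)
print(theta_samples[200:].mean(), phi_samples[200:].mean())
```

In this toy model $\theta|Y,\phi$ is itself Gaussian and could also be drawn exactly; HMC is used for it here only to mirror step 3.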

This is the procedure described in Radford Neal's PhD thesis. However, I have noticed an alternative method and would like some insight into whether there is any difference. For this point, let's assume that we do not choose the prior and the hyperprior to be conjugate, so we gain nothing from a Gibbs sampler and have to use HMC to sample the hyperparameters as well. My first question is:

Is there a big difference between drawing samples with HMC directly from the joint posterior $p(\theta,\phi|Y)$ and alternately drawing samples with HMC from each of the marginal posteriors $p(\phi|\theta,Y)$ and $p(\theta|\phi,Y)$?

I think there is a slight difference, but I do not know the implications. From a practical point of view, I think both approaches perform the same computations. Let me explain: if we wish to sample from the joint posterior using HMC, we need to compute gradients of the logarithm of:

$$p(\theta,\phi|Y)\propto p(Y|\theta)p(\theta|\phi)p(\phi)$$

using the leapfrog integrator, starting from two initial values $\phi_0,\theta_0$, in order to update the momentum variables. The point is that the gradient w.r.t. $\phi$ is not influenced by $p(Y|\theta)$, and likewise the gradient w.r.t. $\theta$ is not influenced by $p(\phi)$. This means that the computations involved (the gradient terms that are nonzero) are the same for both approaches: 1) sampling from the joint posterior, and 2) sampling from the marginal posteriors in alternation.
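
Written out, differentiating the logarithm of the factorized joint posterior gives the following (a worked version of the claim above; the normalizer $p(Y)$ does not depend on $\theta$ or $\phi$ and drops out):

$$\nabla_\theta \log p(\theta,\phi|Y)=\nabla_\theta \log p(Y|\theta)+\nabla_\theta \log p(\theta|\phi)$$
$$\nabla_\phi \log p(\theta,\phi|Y)=\nabla_\phi \log p(\theta|\phi)+\nabla_\phi \log p(\phi)$$

The right-hand sides are also the gradients of $\log p(\theta|Y,\phi)$ and $\log p(\phi|Y,\theta)$ respectively, since each of these differs from the log joint only by a term that is constant in the variable being differentiated.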

So the only difference between the two approaches is that running HMC on the joint posterior uses updated values of both $\theta$ and $\phi$ in each iteration of the leapfrog loop, while running HMC on the marginal posteriors keeps one of the parameters fixed while the other is updated during the leapfrog loop.
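
To make the comparison concrete, here is a minimal sketch of both schemes sharing one leapfrog routine, where a `block` argument selects which coordinates move. Everything specific is an assumption made for illustration: the non-conjugate toy model ($Y_i\sim N(\theta,1)$, $\theta|\phi\sim N(0,e^{\phi})$, $\phi\sim N(0,1)$), the function names, and the step size and trajectory length.

```python
import numpy as np

def make_model(y):
    """Log joint (up to a constant) and its gradient for z = (theta, phi)."""
    def logp(z):
        theta, phi = z
        return (-0.5 * np.sum((y - theta) ** 2)                 # log p(Y | theta)
                - 0.5 * phi - 0.5 * theta ** 2 * np.exp(-phi)   # log p(theta | phi)
                - 0.5 * phi ** 2)                               # log p(phi)
    def grad(z):
        theta, phi = z
        return np.array([np.sum(y - theta) - theta * np.exp(-phi),
                         -0.5 + 0.5 * theta ** 2 * np.exp(-phi) - phi])
    return logp, grad

def hmc_step(z, logp, grad, rng, block, eps=0.1, n_steps=12):
    """HMC update of the coordinates in `block`; the others stay fixed."""
    block = np.asarray(block)
    p0 = rng.standard_normal(block.size)      # momentum only for the block
    z_new, p = z.copy(), p0.copy()
    p += 0.5 * eps * grad(z_new)[block]
    for step in range(n_steps):
        z_new[block] += eps * p
        scale = 1.0 if step < n_steps - 1 else 0.5
        p += scale * eps * grad(z_new)[block]
    log_accept = (logp(z_new) - 0.5 * p @ p) - (logp(z) - 0.5 * p0 @ p0)
    return z_new if np.log(rng.uniform()) < log_accept else z

def run(y, blocks, n_iter=2000, seed=0):
    """Apply one HMC update per block in each iteration."""
    rng = np.random.default_rng(seed)
    logp, grad = make_model(y)
    z = np.zeros(2)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        for block in blocks:
            z = hmc_step(z, logp, grad, rng, block)
        out[t] = z
    return out

y_sim = 1.5 + np.random.default_rng(1).standard_normal(20)
joint = run(y_sim, blocks=[[0, 1]])           # HMC on (theta, phi) jointly
alternating = run(y_sim, blocks=[[0], [1]])   # HMC on theta, then on phi
print(joint[500:].mean(axis=0), alternating[500:].mean(axis=0))
```

Both schemes leave the same joint posterior invariant, so their long-run averages should agree. What differs is the trajectory: in the joint update $\phi$ moves while $\theta$ is being integrated, whereas the alternating update holds one block fixed, which also allows a separate step size per block.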

Any insights into the two approaches would be appreciated.

Thank you.
