Suppose that in a Bayesian framework we have observed data $D$, with independent prior distributions on the parameters of the model, denoted by $\theta_1, \theta_2$. The joint posterior distribution of $\theta_1, \theta_2$ can then be written as: $$ P(\theta_1, \theta_2 \mid D) \propto P(D \mid \theta_1, \theta_2)p(\theta_1)p(\theta_2) $$ One way to obtain estimates from the posterior is to use Gibbs sampling. This sampler requires simulating from the full conditional distributions of both $\theta_1$ and $\theta_2$. One relation for the full conditional that caught my eye in a book is the following: $$ P(\theta_1\mid D, \theta_2) \propto P(D \mid \theta_1, \theta_2)p(\theta_1) $$ I am wondering how exactly this proportionality relation is derived. Would anyone have any ideas? It seems that there are extra assumptions here that are implicit.
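For context, here is what such a Gibbs sampler looks like in practice. This is a minimal sketch for a hypothetical semi-conjugate model (not the one in the book): $D_i \sim N(\theta_1, 1/\theta_2)$ with independent priors $\theta_1 \sim N(0,1)$ and $\theta_2 \sim \text{Gamma}(a_0, b_0)$, so both full conditionals are available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model (assumed for illustration): D_i ~ N(theta1, 1/theta2),
# with independent priors theta1 ~ N(0, 1) and theta2 ~ Gamma(a0, rate=b0).
D = rng.normal(loc=2.0, scale=0.5, size=100)
n, a0, b0 = len(D), 2.0, 1.0

theta1, theta2 = 0.0, 1.0          # arbitrary starting values
samples = []
for _ in range(5000):
    # theta1 | D, theta2 ~ Normal: normal prior x normal likelihood (conjugate)
    prec = 1.0 + n * theta2
    mean = theta2 * D.sum() / prec
    theta1 = rng.normal(mean, 1.0 / np.sqrt(prec))
    # theta2 | D, theta1 ~ Gamma: gamma prior x normal likelihood (conjugate)
    theta2 = rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * ((D - theta1) ** 2).sum()))
    samples.append((theta1, theta2))

post = np.array(samples[1000:])    # discard burn-in
print(post.mean(axis=0))           # posterior means of (theta1, theta2)
```

Note that each update only needs its full conditional up to a normalising constant, which is exactly why the proportionality relation in the question matters.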
1 Answer
The starting point is the proportionality symbol $\propto$, as it takes different meanings in the formulas of the question. In $$ P(\theta_1, \theta_2 \mid D) \propto P(D \mid \theta_1,\theta_2)p(\theta_1)p(\theta_2) $$ the function of $(\theta_1, \theta_2)$ on the l.h.s. is proportional to the function of $(\theta_1, \theta_2)$ on the r.h.s., meaning that they differ by a multiplicative constant $\kappa$. This is a constant w.r.t. $(\theta_1, \theta_2)$ only, meaning that $\kappa$ may depend on (and hence be a function of) other quantities, like $D$: $$\kappa=\kappa(D)$$ In fact, since $P(\theta_1, \theta_2 \mid D)$ is a probability density, this constant $\kappa$ is uniquely defined as the normalisation of the r.h.s. that turns it into a probability density, with total mass equal to one: $$\kappa=\kappa(D)=1\Big/\int_\Theta P(D \mid \theta_1,\theta_2)p(\theta_1)p(\theta_2)\,\text{d}\lambda(\theta_1,\theta_2)$$ for the appropriate measure $\text{d}\lambda$.
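The normalisation $\kappa(D)$ can be made concrete numerically. The sketch below assumes toy densities (not specified in the answer): $D_i \sim N(\theta_1, 1/\theta_2)$, $\theta_1 \sim N(0,1)$, $\theta_2 \sim \text{Gamma}(2,1)$, and approximates the double integral by a Riemann sum on a grid.

```python
import numpy as np
from scipy.stats import norm, gamma

# Assumed toy densities (for illustration): D_i ~ N(theta1, 1/theta2),
# theta1 ~ N(0, 1), theta2 ~ Gamma(2, 1).
rng = np.random.default_rng(0)
D = rng.normal(1.0, 1.0, size=10)

t1 = np.linspace(-3.0, 4.0, 400)
t2 = np.linspace(0.01, 6.0, 400)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
dt1, dt2 = t1[1] - t1[0], t2[1] - t2[0]

# unnormalised r.h.s.: P(D | theta1, theta2) p(theta1) p(theta2)
lik = np.prod(norm.pdf(D[:, None, None], loc=T1, scale=1/np.sqrt(T2)), axis=0)
rhs = lik * norm.pdf(T1) * gamma.pdf(T2, a=2, scale=1)

# kappa(D) is a single number: it depends on the data D but not on the parameters
kappa = 1.0 / (rhs.sum() * dt1 * dt2)
post = kappa * rhs                  # proper joint posterior density
print(post.sum() * dt1 * dt2)       # total mass, approximately 1.0
```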
In $$P(\theta_1\mid D, \theta_2) \propto P(D \mid \theta_1, \theta_2)p(\theta_1)\qquad\qquad(1)$$ the function of $\theta_1$ on the l.h.s. is proportional to the function of $\theta_1$ on the r.h.s., meaning that they again differ by a multiplicative constant $\kappa$, which may now depend on (and hence be a function of) the other quantities, namely $D$ and $\theta_2$: $$\kappa=\kappa(D,\theta_2)$$ Note that $$ P(\theta_1 \mid \theta_2, D) \propto P(D \mid \theta_1,\theta_2)p(\theta_1)p(\theta_2) $$ is equally correct, since $p(\theta_2)$ is constant in $\theta_1$.
That the result (1) holds follows from the standard definition of the conditional density \begin{align*} P(\theta_1\mid D, \theta_2) &= \dfrac{P(D \mid \theta_1, \theta_2)p(\theta_1)p(\theta_2)}{\int_{\Theta_1} P(D \mid \theta_1, \theta_2)p(\theta_1)p(\theta_2)\,\text{d}\lambda_1(\theta_1)}\\ &= \dfrac{P(D \mid \theta_1, \theta_2)p(\theta_1)\times p(\theta_2)}{\left\{\int_{\Theta_1} P(D \mid \theta_1, \theta_2)p(\theta_1)\,\text{d}\lambda_1(\theta_1)\right\}\times p(\theta_2)}\\ &= \dfrac{P(D \mid \theta_1, \theta_2)p(\theta_1)}{\int_{\Theta_1} P(D \mid \theta_1, \theta_2)p(\theta_1)\,\text{d}\lambda_1(\theta_1)}\\ &= \dfrac{P(D \mid \theta_1, \theta_2)p(\theta_1)}{\kappa(D,\theta_2)}\\ &\propto P(D \mid \theta_1, \theta_2)p(\theta_1) \end{align*}
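The cancellation of $p(\theta_2)$ in this derivation can also be checked numerically: normalising the r.h.s. of (1) over $\theta_1$ gives the same density as normalising the full joint-posterior kernel. The sketch below assumes toy densities (not in the answer): $D_i \sim N(\theta_1, 1/\theta_2)$, $\theta_1 \sim N(0,1)$, $\theta_2 \sim \text{Gamma}(2,1)$.

```python
import numpy as np
from scipy.stats import norm, gamma

# Assumed toy densities (for illustration): D_i ~ N(theta1, 1/theta2),
# theta1 ~ N(0, 1), theta2 ~ Gamma(2, 1).
rng = np.random.default_rng(1)
D = rng.normal(1.0, 1.0, size=20)
theta2 = 1.3                        # fixed value of the conditioning variable
grid = np.linspace(-2.0, 4.0, 2001) # grid over theta1

def unnorm(t1):
    """r.h.s. of (1): P(D | theta1, theta2) p(theta1)."""
    lik = norm.pdf(D[:, None], loc=t1, scale=1/np.sqrt(theta2)).prod(axis=0)
    return lik * norm.pdf(t1)

# joint-posterior kernel also carries p(theta2); it cancels after normalisation
joint = unnorm(grid) * gamma.pdf(theta2, a=2, scale=1)

dx = grid[1] - grid[0]
cond_from_rhs = unnorm(grid) / (unnorm(grid).sum() * dx)
cond_from_joint = joint / (joint.sum() * dx)
print(np.allclose(cond_from_rhs, cond_from_joint))  # True: same density
```

This mirrors the middle step of the derivation: $p(\theta_2)$ appears in both the numerator and the denominator, so it drops out of the normalised conditional.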
Thank you, this is really clear and helped clear up some of my misconceptions! – user321627 Apr 02 '18 at 02:38