The normalising constant $Z$ in Bayes' theorem is the probability that the model generates the data $D$: $$\begin{align}P(D) &= \int P(D|\theta)P(\theta)\,d\theta \\ &= E_{\theta \sim P(\theta)}[P(D|\theta)] \\ &\approx \frac{1}{N}\sum_{i=1}^N P(D|\theta_i), \quad \theta_i \sim P(\theta)\end{align}$$ That is, we can sample $\theta$ from the prior distribution and average the likelihood $P(D|\theta)$ over many samples to estimate $Z$. Can we use this method to approximate $Z$? And once we have $Z$, do we already have the whole posterior distribution $p(\theta|D)$?
1 Answer
Monte Carlo methods are primarily intended to approximate integrals, so the answer to the question is yes: we can use Monte Carlo to find an approximation of the normalising constant, a.k.a. the marginal likelihood, $$m(D) = \int p(D|\theta)\pi(\theta)\,\text{d}\theta$$ For instance, the book by Chen, Shao and Ibrahim (2000), Monte Carlo Methods in Bayesian Computation, concentrates on this problem. There are also many answers on Cross Validated (Stack Exchange) discussing the issue: see e.g. here. Simulating from the prior (assuming it is proper) is a possibility, if not the most efficient one.
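To illustrate, here is a minimal sketch of the prior-sampling estimator in a toy Beta-Bernoulli model, where the marginal likelihood is available in closed form for comparison. The model choice (Uniform(0,1) prior, 7 successes in 10 Bernoulli trials) is an illustrative assumption, not taken from the thread.

```python
import math
import random

# Toy model (illustrative assumption): theta ~ Uniform(0,1) prior,
# data D = k successes out of n Bernoulli trials.
n, k = 10, 7

def likelihood(theta):
    """P(D | theta) = theta^k (1 - theta)^(n - k)."""
    return theta**k * (1.0 - theta)**(n - k)

# Monte Carlo estimate of m(D): average the likelihood over prior draws.
random.seed(0)
N = 200_000
mc_estimate = sum(likelihood(random.random()) for _ in range(N)) / N

# Exact value via the Beta function: m(D) = B(k+1, n-k+1).
exact = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)

print(f"Monte Carlo: {mc_estimate:.6f}")
print(f"Exact:       {exact:.6f}")
```

The estimator is unbiased, but when the prior and the likelihood have little overlap (e.g. a diffuse prior and a concentrated likelihood), most prior draws contribute almost nothing and the variance becomes very large, which is one reason this naive scheme is rarely the most efficient choice.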
But finding an approximation to this marginal likelihood has no impact on the understanding and exploitation of the posterior distribution, which is why, for instance, MCMC methods operate without this constant. Using $p(\theta|D)$ with an estimated $m(D)$ does not help in the least: for instance, using the posterior cdf with the approximation would result in a cdf taking values between $0$ and a maximum different from $1$, preventing the use of the inverse cdf method. See here and there and there for SE-XV entries on the disconnection between posterior simulation and the normalising constant. And certainly the mother of all questions about the posterior vs the normalising constant: Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution?
- If we are able to Monte Carlo estimate $m(D)$, then we are able to compute the whole posterior distribution $p(\theta|D)$, since we have closed forms for $p(D|\theta)p(\theta)$? – calveeen Aug 11 '20 at 11:36
- Please check the question Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution? and its answers. – Xi'an Aug 11 '20 at 12:14