8

For most of my time in statistics, I have been able to ignore the marginal distribution that usually appears in the denominator of any Bayesian posterior distribution.

For example, if we write down $L_x(\theta)\pi(\theta)$ and recognize that this function of $\theta$ looks like a known distribution for $\theta$, only with the wrong normalizing constant, I usually just mix and match until I recover that distribution, since

$$ \pi(x|\theta) \propto L_x(\theta)\pi(\theta) $$

However, why can I do this? Are there any cases where this breaks down?

Glen_b
user123276
  • Draw a graph with $\theta$ on the horizontal axis and $L_x(\theta)\pi(\theta)$ on the vertical. Now what happens when you plot $\pi(\theta|x)$ on the vertical instead? The scale of the vertical axis changes - and nothing else. – probabilityislogic Feb 05 '14 at 10:04

4 Answers

4

Theoretically, why do we not need to compute a marginal distribution constant for finding a Bayesian posterior?

Generally speaking, you do need to - it's just that sometimes it's so easy that you might not notice you did it.

With 'textbook' problems you can often take $\pi(\theta|x) \propto L_x(\theta)\pi(\theta)$, then play about with the result until you recognize the density function, at which point you've computed what the normalizing constant must have been - the factor required to scale your $L_x(\theta)\pi(\theta)$ so that it integrates to 1. Since it's a pdf you know it integrates to 1, and since it's proportional to $L_x(\theta)\pi(\theta)$, you know you have, in effect, divided by the integral of that.
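To make that concrete, here is one standard conjugate case (my own illustrative example, not from the thread, assuming $x$ successes in $n$ Binomial trials and a $\mathrm{Beta}(a,b)$ prior):

$$ \pi(\theta|x) \propto \underbrace{\theta^{x}(1-\theta)^{n-x}}_{L_x(\theta)}\;\underbrace{\theta^{a-1}(1-\theta)^{b-1}}_{\pi(\theta)} = \theta^{x+a-1}(1-\theta)^{n-x+b-1}, $$

which is the kernel of a $\mathrm{Beta}(x+a,\;n-x+b)$ density, so the normalizing constant must be $1/B(x+a,\;n-x+b)$ and the denominator $\pi(x)$ never has to be computed explicitly.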

In cases where that doesn't work, there are often a few choices.

One is numerical integration - you can integrate $L_x(\theta)\pi(\theta)$ numerically to work out the normalizing constant. Then you can compute expectations, and so on.
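A minimal sketch of that idea (Python with NumPy/SciPy; the Binomial-likelihood/Beta-prior toy model and the data are invented purely for illustration, not part of the answer):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Toy example (illustrative only): Binomial likelihood, Beta(2, 2) prior.
n, x = 20, 14                       # observed data: 14 successes in 20 trials

def unnormalized_posterior(theta):
    """L_x(theta) * pi(theta), i.e. likelihood times prior, not yet normalized."""
    return stats.binom.pmf(x, n, theta) * stats.beta.pdf(theta, 2, 2)

# The marginal likelihood pi(x) is just the integral of the unnormalized posterior.
marginal, _ = quad(unnormalized_posterior, 0.0, 1.0)

def posterior(theta):
    return unnormalized_posterior(theta) / marginal

# Posterior mean by another quadrature, using the now-normalized density.
post_mean, _ = quad(lambda t: t * posterior(t), 0.0, 1.0)
print(marginal, post_mean)
```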

Another is sampling; maybe you can't find the integral but you can bound it and use rejection sampling, or approximate it and use Metropolis-Hastings, etc. With a sample from the posterior, you can again find means or other quantities as needed, or get a good approximation to the density or the cdf.
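A bare-bones Metropolis-Hastings sketch (Python; the random-walk proposal and the same toy Binomial/Beta model as above are my own illustrative choices) makes the key point visible: only the unnormalized posterior is ever evaluated, because $\pi(x)$ cancels in the acceptance ratio.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, x = 20, 14                                   # same toy data as above

def log_unnorm_post(theta):
    """log(likelihood * prior); the normalizing constant is never needed here."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf
    return stats.binom.logpmf(x, n, theta) + stats.beta.logpdf(theta, 2, 2)

theta, samples = 0.5, []
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)    # symmetric random-walk proposal
    # Acceptance ratio uses only the unnormalized posterior: pi(x) cancels out.
    if np.log(rng.uniform()) < log_unnorm_post(proposal) - log_unnorm_post(theta):
        theta = proposal
    samples.append(theta)

print(np.mean(samples[5_000:]))                 # posterior mean after burn-in
```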

There are other approaches.

Glen_b
  • My main concern is that there appears to be no rigorous proof I have seen as to why, if the likelihood times the prior "looks" like a certain distribution, we can just add on the constants so that it "is" that distribution. The only "proof" I have seen comes from textbooks telling you that since it's proportional, we can just add on constants. Any thoughts on how I can understand this at a deeper level or in a more rigorous manner? Meaning, is there a case where, for example, the product of the likelihood and prior HAS a certain form, but ISN'T that distribution even as we try to add constants? Thanks! – user123276 Feb 05 '14 at 05:40
  • It follows immediately from the fact that integration is linear: $\int af(x)\,dx = a \int f(x)\,dx$. I assume you're already aware of that. – Glen_b Feb 05 '14 at 10:56
4

Your problem is equivalent to the following: Suppose you have a function $f(x)$ such that $\int f(x)\,dx < \infty$, and you are looking for a constant $c$ such that $\int cf(x)\,dx = 1$. Clearly, $c = 1/\int f(x)\,dx$ would work, but perhaps it isn't easy to compute $\int f(x)\,dx$. In your quest to find $c$, perhaps you find a probability density function $g(x)$ (integrates to $1$) that shares the same "form" as $f(x)$. That is, $g(x) = d\,f(x)$ for some constant $d$. Then $1 = \int g(x)\,dx = \int d\,f(x)\,dx$ and hence $c = d$. In other words, multiplying $f(x)$ by $c$ gives us the density function $g(x)$. This is the same logic that allows us to find the posterior distribution by recognizing the functional form of the density.

2

The "formal proof" you are looking for is called Bayes' Theorem (see also Posterior probability) which states that:

$$\pi(\theta\vert{\bf x}) = \dfrac{f({\bf x}\vert\theta)\pi(\theta)}{\pi({\bf x})}.$$

The left-hand side represents the posterior distribution and IT IS a distribution as long as the prior is proper. From this expression you can identify $f({\bf x}\vert\theta)$ as the likelihood function $L(\theta;{\bf x})$ and you can also see that $\pi({\bf x})$ does not depend upon $\theta$. Therefore

$$\pi(\theta\vert{\bf x}) \propto L(\theta;{\bf x})\pi(\theta).$$

Also, note that it should be $\theta\vert{\bf x}$ and not the other way round: $f({\bf x}\vert \theta)$ is the likelihood function, which, viewed as a function of $\theta$, is typically not a distribution.

Discussion: In Bayesian statistics it is impossible, for non-trivial examples, to identify what sort of distribution this is (e.g. normal, Student-t, ...). Then, the use of MCMC methods is often necessary to sample from the posterior and to conduct a Bayesian data analysis. MCMC methods require evaluating the posterior only up to a proportionality constant; for this reason, it is not necessary to calculate $\pi({\bf x})$. However, for Bayesian model comparison you do need a numerical approximation of this quantity, given that Bayes factors are defined in terms of the normalising constant.
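For that last point, a small sketch (Python; the one-parameter toy model, data, and the two candidate priors are invented for illustration) of approximating $\pi({\bf x})$ by quadrature for two competing priors, which is all a Bayes factor needs in such a simple case:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

n, x = 20, 14                                    # toy data: 14 successes in 20 trials

def marginal_likelihood(prior_a, prior_b):
    """pi(x) = integral over theta of likelihood * prior, for a Beta(a, b) prior."""
    integrand = lambda t: stats.binom.pmf(x, n, t) * stats.beta.pdf(t, prior_a, prior_b)
    value, _ = quad(integrand, 0.0, 1.0)
    return value

# Bayes factor comparing a flat Beta(1, 1) prior against a Beta(10, 10) prior.
bf = marginal_likelihood(1, 1) / marginal_likelihood(10, 10)
print(bf)
```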

Bayo
0

$$\pi(\theta|x) = \frac{f(x|\theta)\,\pi(\theta)}{\pi(x)}$$

In simple terms, the denominator of the RHS of Bayes' theorem - the marginal distribution - is just a constant that makes the RHS numerator a pdf. If you know what kind of distribution your RHS numerator, i.e. the likelihood function times the prior distribution, follows, then you can find the denominator (the marginal) easily. For example, if your prior is uniform and your likelihood function is binomial, then your posterior will be proportional to a Beta distribution, and you can now easily find the constants for a Beta distribution.
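For what it's worth, here is a quick numerical check of that uniform-prior/Binomial-likelihood example (a Python sketch; the data $n=10$, $x=7$ and the grid are arbitrary choices of mine): renormalizing likelihood-times-prior on a grid recovers the $\mathrm{Beta}(x+1,\,n-x+1)$ density.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

n, x = 10, 7                                   # arbitrary toy data
grid = np.linspace(0.001, 0.999, 999)          # grid over the parameter theta

# Likelihood * prior, evaluated on the grid (uniform prior on [0, 1]).
unnorm = stats.binom.pmf(x, n, grid) * stats.uniform.pdf(grid)

# Normalize numerically; this division plays the role of 1 / pi(x).
posterior = unnorm / trapezoid(unnorm, grid)

# Compare with the recognized closed-form density Beta(x + 1, n - x + 1).
beta_pdf = stats.beta.pdf(grid, x + 1, n - x + 1)
print(np.max(np.abs(posterior - beta_pdf)))    # small, up to quadrature error
```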

Silverfish
Kaushik