
In a Bayesian model, we normally have that:

$$ p(\boldsymbol\mu|\boldsymbol X) = \dfrac{p(\boldsymbol X|\boldsymbol \mu)p(\boldsymbol \mu)}{p( \boldsymbol X)} $$

Now suppose that $\boldsymbol \mu \sim N(\boldsymbol \mu_0, A)$ and that $\boldsymbol X | \boldsymbol \mu \sim N(\boldsymbol \mu, B)$. In this case, by conjugacy, the posterior $p(\boldsymbol\mu|\boldsymbol X)$ is also normal.
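In one dimension this conjugacy is easy to check numerically. The sketch below (Python with NumPy/SciPy; the numbers `mu0`, `A`, `B`, `x` are arbitrary illustrative values, not from the question) normalizes likelihood × prior on a grid and recovers the closed-form posterior mean:

```python
import numpy as np
from scipy import stats

# Hypothetical 1-D numbers, chosen only for illustration
mu0, A = 1.0, 2.0   # prior:      mu ~ N(mu0, A)
B = 0.5             # likelihood: X | mu ~ N(mu, B)
x = 2.3             # one observed value of X

# Closed-form normal-normal posterior (precisions add)
post_var = 1.0 / (1.0 / A + 1.0 / B)
post_mean = post_var * (mu0 / A + x / B)

# Numerical check: normalize likelihood * prior over a fine grid of mu
mu_grid = np.linspace(-10.0, 10.0, 200_001)
dx = mu_grid[1] - mu_grid[0]
unnorm = stats.norm.pdf(x, mu_grid, np.sqrt(B)) * stats.norm.pdf(mu_grid, mu0, np.sqrt(A))
posterior = unnorm / (unnorm.sum() * dx)          # numerically normalized posterior
num_mean = (mu_grid * posterior).sum() * dx       # should match post_mean
```

The grid-based mean agrees with the conjugate formula to numerical precision, which is the sense in which "the posterior is also normal" can be verified without any algebra.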

Suppose now that I want to find the marginal density of $\boldsymbol X$. Then normally we would integrate $p(\boldsymbol X|\boldsymbol \mu)p(\boldsymbol \mu)$ with respect to $\boldsymbol \mu$.

However, another method is simply to use:

$$ p( \boldsymbol X) = \dfrac{p(\boldsymbol X|\boldsymbol \mu)p(\boldsymbol \mu)}{p(\boldsymbol\mu|\boldsymbol X)} $$

and then drop all terms not involving $\boldsymbol X$, in essence, to find the kernel in $\boldsymbol X$, which should be of exponential form. After that, we fill in the normalizing constant by identifying the kernel.
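For scalar $X$ and $\mu$ this can be checked directly: because the $\boldsymbol\mu$-dependence cancels, the ratio below returns the same number for every value of $\mu$, and that number equals the closed-form marginal $N(\mu_0, A+B)$ of the normal-normal pair. (Python sketch with illustrative numbers; the function name `marginal_via_ratio` is my own.)

```python
import numpy as np
from scipy import stats

# Hypothetical 1-D setup: prior mu ~ N(mu0, A), likelihood X | mu ~ N(mu, B)
mu0, A, B = 1.0, 2.0, 0.5
x = 2.3

def marginal_via_ratio(x, mu):
    """p(X) = p(X|mu) * p(mu) / p(mu|X); all mu terms cancel."""
    post_var = 1.0 / (1.0 / A + 1.0 / B)          # conjugate posterior variance
    post_mean = post_var * (mu0 / A + x / B)      # conjugate posterior mean
    num = stats.norm.pdf(x, mu, np.sqrt(B)) * stats.norm.pdf(mu, mu0, np.sqrt(A))
    den = stats.norm.pdf(mu, post_mean, np.sqrt(post_var))
    return num / den

# Same value for any mu plugged in, equal to the N(mu0, A + B) density at x
vals = [marginal_via_ratio(x, mu) for mu in (-1.0, 0.0, 3.0)]
closed = stats.norm.pdf(x, mu0, np.sqrt(A + B))
```

The invariance in $\mu$ is exactly what makes the "read off the kernel" step legitimate here: the ratio is a function of $x$ alone.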

This technique appears to work here. However, I am wondering whether the result holds in general.

My question is: What allows us to know that $p(\boldsymbol X)$ is a valid probability density function just by looking at the kernel? If the likelihood, posterior, and prior are all valid probability density functions integrating to $1$, is it enough for me to just "fill in" constants by looking at the kernel?

user321627

1 Answer


Bayes' theorem is

$$ f_{X\mid Y}(x \mid y) = \frac{ f_{Y\mid X}(y \mid x) \; f_X(x) }{ f_Y(y) } = \frac{ f_{Y\mid X}(y \mid x) \; f_X(x) }{ \int f_{Y\mid X}(y \mid x) \; f_X(x) \;dx } $$

where

$$ f_Y(y) = \int f_{Y\mid X}(y \mid x) \; f_X(x) \;dx $$

by the law of total probability. So it follows directly from probability theory.
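A quick numerical sketch of this identity (illustrative 1-D numbers; the closed-form marginal $N(\mu_0, A+B)$ used as the reference is specific to the normal-normal pair):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hypothetical normal-normal pair: X ~ N(mu0, A) (prior), Y | X ~ N(x, B)
mu0, A, B = 1.0, 2.0, 0.5
y = 2.3

# Law of total probability: f_Y(y) = integral of f_{Y|X}(y|x) * f_X(x) dx
f_y, _ = quad(
    lambda x: stats.norm.pdf(y, x, np.sqrt(B)) * stats.norm.pdf(x, mu0, np.sqrt(A)),
    -np.inf, np.inf,
)

# For this conjugate pair the marginal is known in closed form: N(mu0, A + B)
closed = stats.norm.pdf(y, mu0, np.sqrt(A + B))
```

The quadrature result matches the closed form, confirming that the denominator of Bayes' theorem is just the normalizing integral.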

Tim
  • Is this dependent on the prior being proper? Also would having the kernel form in the denominator be enough to identify the distribution? – user321627 Nov 03 '16 at 10:24
  • @user321627 it holds for proper probability distributions (pmf, pdf); in other cases it does not have to hold. – Tim Nov 03 '16 at 10:27
  • @user321627 moreover, if you want only to estimate the unconditional distribution of data, then why do you want to use Bayes theorem for it? Why not simply use KDE? – Tim Nov 03 '16 at 11:05
  • Sorry for posting my question here, but I see you have answered several questions about Bayes' theorem perfectly, and this answer best matches my need. I would like to know, from the formula you posted in this answer, how the posterior distribution is derived in practice. Do we apply this formula for every $x$ value of $X$ and every $y$ in the data? Thank you in advance – Nizar Aug 26 '17 at 13:40
  • @Nizar the answer is "yes", but I encourage you to post it as a question and describe in greater detail what you understand and what is unclear. – Tim Aug 26 '17 at 21:15
  • Thank you, Tim, for answering a comment. Please see my question https://stats.stackexchange.com/questions/299932/bayesian-inference-for-parameter-estimation?noredirect=1#comment570061_299932 — I posted a new question as you recommended. – Nizar Aug 27 '17 at 04:46