4

Suppose one calculates a posterior distribution $P(\theta\mid x)$ on some parameter $\theta$ based on some prior and data $x$. Now suppose we divide that data into two parts, such that $x = x_1 \cup x_2$. Intuitively, splitting the data this way should lead to the same posterior given the same prior, i.e. $$P(\theta\mid x) = P(\theta\mid x_1, x_2)$$

However, it's not obvious from Bayes' law that this is correct.

What are some ways to formally prove this?

  • 2
    This has nothing to do with Bayes, because it's just a statement about conditional probabilities. It follows immediately from the standard definition, because the sigma-algebras generated by $x$ and $(x_1,x_2)$ are the same. – whuber Jan 14 '24 at 16:35
  • @whuber - Whoa. Would you be able to post a brief answer explaining that comment? – Nathaniel Bubis Jan 15 '24 at 04:38
  • 1
    https://stats.stackexchange.com/questions/312474 explains the terminology and https://stats.stackexchange.com/questions/452524 explains what a product sigma algebra is. – whuber Jan 15 '24 at 15:11

2 Answers

5

I don't think this is a proof, but just from playing around with Bayes' rule, here's what I got: $$ P(\theta|x_{1}, x_{2}) = \underbrace{P(x_{1}, x_{2}|\theta)}_{\text{Likelihood}} \, \underbrace{P(\theta)}_{\text{Prior}} \, c^{-1}_{1} $$ where $c_{1} = P(x_{1},x_{2})$ is the normalization constant.

But also, $$ P(\theta|x_{1}) = P(x_{1}|\theta) \, P(\theta) \, c^{-1}_{2} \quad \Rightarrow \quad P(\theta) = c_{2} \, \frac{1}{P(x_{1}|\theta)} \, P(\theta|x_{1}) $$ where $c_{2} = P(x_{1})$ and $P(\theta|x_{1})$ is the posterior of $\theta$ after updating with $x_{1}$ only.

Now plug this into the first equation: $$ P(\theta|x_{1}, x_{2}) = \frac{c_{2}}{c_{1}} \, \frac{P(x_{1},x_{2}|\theta)}{P(x_{1}|\theta)} \, P(\theta|x_{1}) $$ But the second factor is just the conditioned joint likelihood: $\frac{P(x_{1},x_{2}|\theta)}{P(x_{1}|\theta)} = P(x_{2}|x_{1},\theta)$.

So overall, $$ P(\theta|x_{1}, x_{2}) = c^{-1}_{3} \, P(x_{2}|x_{1},\theta) \, P(\theta|x_1) $$ with $c_{3} = c_{1}/c_{2}$. So the Bayesian updating in the second step (with $x_{2}$) is just Bayes' rule applied to a prior that has already been conditioned on $x_{1}$, i.e. the posterior from the first update.

Durden
  • The last step can be simplified further if you assume $x_2$ is independent of $x_1$, but you still have a term depending on $x_1$. It's not clear how this dependence fully cancels out with $c_1, c_2, c_3$. – Nathaniel Bubis Jan 14 '24 at 01:57
  • 1
    The constants are determined by the probability adding to 1, so they can't be wrong. That is, if two densities are equal up to a multiplicative constant, they must be identical. – Thomas Lumley Jan 14 '24 at 03:49
  • @thomasLumley I realize that, but it would be nice to see that this is so directly. – Nathaniel Bubis Jan 14 '24 at 05:29
  • With conditional independence of $X_1,X_2$ given $\Theta=\theta$ and just looking at proportionalities, I would have thought you have $\mathbb P(\Theta=\theta \mid X_1=x_1,X_2=x_2 )$ $\propto \mathbb P(X_1=x_1,X_2=x_2 \mid \Theta=\theta)\mathbb P(\Theta=\theta)$ $\propto \mathbb P(X_2=x_2 \mid \Theta=\theta) \Big(\mathbb P(X_1=x_1 \mid \Theta=\theta) \mathbb P(\Theta=\theta)\Big)$, i.e. updating the prior with the first set of data to give an intermediate posterior and then updating that with the second set of data to give the final posterior. – Henry Jan 14 '24 at 11:20
  • 1
    @NathanielBubis in my notation, $c_{3} = \frac{c_1}{c_{2}} = \frac{P(x_1,x_2)}{P(x_1)} = P(x_{2}|x_{1})$, which makes sense since this is exactly the normalizing constant you would expect when updating prior/posterior $P(\theta|x_1)$ with $x_{2}$. – Durden Jan 14 '24 at 23:15
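As a concrete sanity check of this two-step updating, here is a minimal numerical sketch. The Bernoulli likelihood, the Beta(2, 2) prior, the grid discretization, and names like `bernoulli_lik` are illustrative assumptions rather than anything specified in the question; the sketch simply compares the batch posterior $P(\theta\mid x_1, x_2)$ with the sequential one obtained by treating $P(\theta\mid x_1)$ as the prior when updating with $x_2$.

```python
import numpy as np

# Illustrative setup: i.i.d. Bernoulli(theta) data with a Beta(2, 2) prior,
# with the posterior evaluated on a discrete grid of theta values.
rng = np.random.default_rng(0)
theta_grid = np.linspace(0.001, 0.999, 1000)
prior = theta_grid ** (2 - 1) * (1 - theta_grid) ** (2 - 1)  # unnormalized Beta(2, 2)
prior /= prior.sum()

x1 = rng.binomial(1, 0.3, size=20)  # first batch of data
x2 = rng.binomial(1, 0.3, size=30)  # second batch of data

def bernoulli_lik(x, theta):
    """Likelihood of an i.i.d. Bernoulli sample x, evaluated at each theta on the grid."""
    k, n = x.sum(), x.size
    return theta ** k * (1 - theta) ** (n - k)

# Batch update: condition on (x1, x2) all at once.
post_batch = bernoulli_lik(np.concatenate([x1, x2]), theta_grid) * prior
post_batch /= post_batch.sum()

# Sequential update: the posterior after x1 becomes the prior for x2.
post_x1 = bernoulli_lik(x1, theta_grid) * prior
post_x1 /= post_x1.sum()
post_seq = bernoulli_lik(x2, theta_grid) * post_x1
post_seq /= post_seq.sum()

print(np.allclose(post_batch, post_seq))  # True: the two posteriors agree
```

Because the grid posterior is renormalized at each step, the constants $c_1, c_2, c_3$ never need to be tracked explicitly; they are absorbed into the final normalization.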
1

Start with the posterior given $x_1$: $$P(\theta\mid x_{1})=\frac{P(x_{1}\mid\theta)P(\theta)}{\int_{\theta}P(x_{1}\mid\theta)P(\theta)d\theta}$$ Now the posterior given both $x_1$ and $x_2$:

$$P(\theta\mid x_{2},x_{1})=\frac{P(x_{2}\mid\theta,x_{1})P(\theta\mid x_{1})}{\int_{\theta}P(x_{2}\mid\theta,x_{1})P(\theta\mid x_{1})d\theta}=\dfrac{P(x_{2}\mid\theta,x_{1})\frac{P(x_{1}\mid\theta)P(\theta)}{\int_{\theta}P(x_{1}\mid\theta)P(\theta)d\theta}}{\int_{\theta}P(x_{2}\mid\theta,x_{1})\left(\frac{P(x_{1}\mid\theta)P(\theta)}{\int_{\theta}P(x_{1}\mid\theta)P(\theta)d\theta}\right)d\theta}$$

Because the inner integral is not a function of $\theta$, we can pull it out of the outer integral in the denominator, where it cancels the same factor in the numerator. Moreover, by the chain rule (conditional on $\theta$), $P(x_{2}\mid\theta,x_{1})\,P(x_{1}\mid\theta)=P(x_{2},x_{1}\mid\theta)$, so:

$$\begin{align} P(\theta\mid x_{2},x_{1})&=\dfrac{P(x_{2}\mid\theta,x_{1})P(x_{1}\mid\theta)P(\theta)}{\int_{\theta}P(x_{2}\mid\theta,x_{1})P(x_{1}\mid\theta)P(\theta)d\theta}\\ &=\dfrac{P(x_{2},x_{1}\mid\theta)P(\theta)}{\int_{\theta}P(x_{2},x_{1}\mid\theta)P(\theta)d\theta}=P(\theta\mid x) \quad\blacksquare \end{align}$$
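For a concrete instance of this result, take an i.i.d. Bernoulli$(\theta)$ likelihood with a conjugate $\mathrm{Beta}(\alpha,\beta)$ prior (an illustrative choice, not implied by the question). If $x_1$ contains $k_1$ successes in $n_1$ trials and $x_2$ contains $k_2$ successes in $n_2$ trials, then updating with $x_1$ alone gives $$P(\theta\mid x_1)\;\propto\;\theta^{k_1}(1-\theta)^{n_1-k_1}\,\theta^{\alpha-1}(1-\theta)^{\beta-1},\qquad\text{i.e. }\;\theta\mid x_1\sim\mathrm{Beta}(\alpha+k_1,\;\beta+n_1-k_1).$$ Using this as the prior when updating with $x_2$ gives $\theta\mid x_1,x_2\sim\mathrm{Beta}(\alpha+k_1+k_2,\;\beta+n_1+n_2-k_1-k_2)$, which is exactly the posterior obtained by conditioning on all $n_1+n_2$ trials at once.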