Yesterday I was given a data set $(a_1,\ldots,a_n)$ (i.e., $n$ i.i.d. realizations) and computed a desired empirical conditional probability $P(A_n|B_n)$ where $A_n,B_n$ are events in the data.

Today, I received a new data point and my total data set is now $(a_1,\ldots,a_n,a_{n+1})$. I again want to compute the "updated" $P(A_{n+1}|B_{n+1})$ given this new data.

My question is, how would "Bayesian updating" be used here? I could just compute $P(A_{n+1}|B_{n+1})$ using the definition, but I'm interested in learning how to use this Bayesian updating technique. My best guess is $$ P(A_{n+1}|B_{n+1}) = \frac{P(B_{n+1}|A_{n+1})P(A_n|B_n)}{P(B_{n+1})}, $$ i.e., using my previous posterior as my new prior, but this identity does not hold in general. In particular, the right-hand side is not guaranteed to lie in $[0,1]$. So, what is meant by "Bayesian updating," and why should I use it over just computing conditional probabilities using the definition?

bcf
  • Why can you not just calculate the probability the same way you did $P(A_n \mid B_n)$? – dsaxton Jul 01 '15 at 20:27
  • @dsaxton Of course I could, and that's what I would usually do. I just wanted to see how "Bayesian updating" might be used, in particular, what it would mean to use it in this example. – bcf Jul 01 '15 at 20:43

1 Answer

This isn't a typical Bayesian update setup: what is the sequence $B_i$? Usually these are the observed variables, while $A_i$ is a sequence of latent variables, the ones we wish to estimate. In that case, we first predict $A_n$ from the observations $B_1,...,B_{n-1}$ seen so far, using

$$ P(A_n|\{B_1,...,B_{n-1}\}) = \int P(A_n|A_{n-1})P(A_{n-1}|\{B_1,...,B_{n-1}\})dA_{n-1}, $$

then update that prediction when $B_n$ arrives by

$$ P(A_n|\{B_1,...,B_n\}) = \frac{P(B_n|A_n)P(A_n|\{B_1,...,B_{n-1}\})}{P(B_n|\{B_1,...,B_{n-1}\})}. $$

So the prior you speak of here is $P(A_n|\{B_1,...,B_{n-1}\})$: the previous posterior $P(A_{n-1}|\{B_1,...,B_{n-1}\})$ carried forward by the prediction step (so it is not strictly a posterior itself), which then enters the update step. This follows the general principle in Bayesian forecasting: the current prior summarizes everything we know about $A_n$ before $B_n$ arrives, so it should be used in the next step.
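To make the predict/update cycle concrete, here is a minimal sketch in Python for a latent variable with finitely many states. The two-state model, the `transition` and `emission` matrices, the initial belief, and the observation sequence are all made-up numbers for illustration, not anything taken from the question.

```python
import numpy as np

def predict(posterior_prev, transition):
    """Prediction step: P(A_n | B_1..B_{n-1}) from P(A_{n-1} | B_1..B_{n-1}).

    transition[i, j] is assumed to hold P(A_n = j | A_{n-1} = i).
    """
    return posterior_prev @ transition

def update(prior, likelihood):
    """Update step: Bayes' rule with the prediction as the prior.

    likelihood[j] is assumed to hold P(B_n = observed value | A_n = j).
    """
    unnormalized = likelihood * prior
    evidence = unnormalized.sum()  # normalizing constant of the update
    return unnormalized / evidence

# Hypothetical two-state model (illustrative numbers only).
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
emission = np.array([[0.7, 0.3],   # emission[j, b] = P(B = b | A = j)
                     [0.1, 0.9]])

belief = np.array([0.5, 0.5])      # initial distribution over the latent state
observations = [0, 0, 1]           # made-up observed values of B_1, B_2, B_3

for b in observations:
    prior = predict(belief, transition)     # prior for this step
    belief = update(prior, emission[:, b])  # posterior after seeing b

print(belief)
```

After each pass through the loop, `belief` is a proper distribution over the latent states, because the update divides by the normalizing constant rather than by an unconditional $P(B_n)$.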

Sorry for using integrals instead of summations - that's how I wrote it up.
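For a discrete latent variable, the same two steps can be written with sums. The lowercase $a$, $a'$, $a''$ below are notation introduced here for the possible values of the latent variable, not symbols from the question:

$$ P(A_n = a|\{B_1,...,B_{n-1}\}) = \sum_{a'} P(A_n = a|A_{n-1} = a')P(A_{n-1} = a'|\{B_1,...,B_{n-1}\}), $$

$$ P(A_n = a|\{B_1,...,B_n\}) = \frac{P(B_n|A_n = a)P(A_n = a|\{B_1,...,B_{n-1}\})}{\sum_{a''} P(B_n|A_n = a'')P(A_n = a''|\{B_1,...,B_{n-1}\})}. $$

The denominator is just the numerator summed over all states, which is what guarantees that the updated probabilities sum to one.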