
In variational methods, given a set of latent variables $z$ corresponding to visible variables $x$, why is it that the probability distribution $p\left(x\middle|z\right)$ is tractable to compute, but $p\left(z\middle|x\right)$ is intractable?

My current understanding is that $$ p\left(z\middle|x\right) = \frac{p\left(x\middle|z\right) p\left(z\right)}{p\left(x\right)} $$ and $$ p\left(x\right) = \int {p\left(x\middle|z\right) p\left(z\right) \mathrm{d}z}, $$ which is often impossible to compute because there are too many values of $z$ to marginalize them all out. So $p\left(z\middle|x\right)$ is intractable because $p\left(x\right)$ is intractable.

But isn't $$ p\left(x\middle|z\right) = \frac{p\left(z\middle|x\right) p\left(x\right)}{p\left(z\right)} $$ then also impossible to compute, since it depends on both $p\left(x\right)$ and $p\left(z\middle|x\right)$?
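To make the "too many $z$" point concrete, here is a minimal sketch (my own toy example, not from the post, assuming a Bernoulli prior and a Gaussian likelihood): with $D$ binary latent variables, evaluating $p(x|z)$ is a single density evaluation, but computing $p(x)$ exactly requires summing $2^D$ terms.

```python
import itertools

import numpy as np
from scipy.stats import norm

# Hypothetical toy model, purely to illustrate the scaling problem:
#   prior:      z_i ~ Bernoulli(0.5) independently, i = 1..D
#   likelihood: x | z ~ Normal(sum(z), 1)

def likelihood(x, z):
    # p(x | z): cheap, a single Gaussian density evaluation
    return norm.pdf(x, loc=z.sum(), scale=1.0)

def marginal(x, D):
    # p(x) = sum over all 2**D configurations of z of p(x | z) p(z)
    p_x = 0.0
    for bits in itertools.product([0, 1], repeat=D):
        z = np.array(bits)
        p_x += likelihood(x, z) * 0.5 ** D   # uniform prior: p(z) = (1/2)**D
    return p_x

x_obs = 1.3
print(marginal(x_obs, D=10))   # 1,024 terms: fine
# marginal(x_obs, D=100)       # 2**100 terms: hopeless, which is why p(x), and
#                              # with it p(z|x), cannot be computed by brute force
```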

Nat

1 Answer


In Bayesian inference, when you have some data $x$, you first specify a *likelihood* $p(x|z)$, also called a sampling distribution, which depends on some unknown parameters $z$ (also called latent variables, to stay with your notation). You then specify a *prior* on these latent variables, $p(z)$, which completes the specification of the *data generating process*. It is called the data generating process because we can imagine first sampling latent variables from the prior, $z^*\sim p(z)$, and then sampling a data point from the likelihood at that sample, $x^*\sim p(x|z^*)$.

The reason the likelihood is tractable is *because we say it is*. This isn't specific to Bayesian inference either: in frequentist inference you also specify a likelihood (you just don't specify a prior). At some point you need to assume some model for your data so you can actually infer something, and in the Bayesian case that model is the combination of likelihood and prior. The posterior $p(z|x)$, by contrast, is not something you get to specify; it has to be derived via Bayes' theorem, and its normalizing constant $p(x) = \int p(x|z)\,p(z)\,\mathrm{d}z$ is exactly the integral you pointed out is usually intractable.
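A minimal sketch of the data generating process described above (my own illustration, assuming an arbitrary Gaussian prior and Gaussian likelihood, not anything from this answer): both densities are tractable simply because we wrote them down, and generating data is literally "sample $z^*$ from the prior, then sample $x^*$ from the likelihood at $z^*$".

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# We *choose* the model, so both densities below are known in closed form:
#   prior:      z ~ Normal(0, 1)
#   likelihood: x | z ~ Normal(z, 0.5)

def prior_pdf(z):
    return norm.pdf(z, loc=0.0, scale=1.0)

def likelihood_pdf(x, z):
    return norm.pdf(x, loc=z, scale=0.5)

# Data generating process: sample z* from the prior,
# then sample x* from the likelihood evaluated at z*.
z_star = rng.normal(0.0, 1.0)
x_star = rng.normal(z_star, 0.5)

# Evaluating the likelihood at any (x, z) is trivial -- tractable by construction.
print(likelihood_pdf(x_star, z_star))

# The posterior p(z | x) = p(x | z) p(z) / p(x) is not specified by us; it has to be
# derived, and its denominator p(x) = integral of p(x | z) p(z) dz generally has no
# closed form (this conjugate Gaussian toy case happens to be an exception).
```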

aleshing