
I am watching this lecture on VAE:

https://www.youtube.com/watch?v=uaaqyVS9-rM&t=1507s

and at 26:00 it is stated that the goal is to minimize the KL divergence between the distribution we are trying to find, $p(z|x)$, and an approximating distribution $q(z)$. But why aren't we using $q(z|x)$?

Moran Reznik
  • It is actually $q_{\phi}(\textrm{z}|\textrm{x})$, for notational consistency. Whether or not it is conditioned on the data $x$ is a modeling choice, and a question of the capacity of the network. Note that it is possible to perform variational inference over both the parameters $\theta$ and the latent variables $z$, as opposed to just the latent variables, as is usually done, where the hyper-parameter, i.e., the (conjugate) prior, is ignored. See Appendix F ("Full VB") of the original VAE paper. – Kuo Mar 04 '24 at 13:52
  • Here is an interesting blog on the $\beta$-VAE, which uses $D_{KL}\!\left(q_{\phi}(z)\,\middle\|\,p(z)\right)$ as part of its objective. – Kuo Mar 04 '24 at 15:18

1 Answer


The evidence lower bound we're trying to maximize is valid for any choice of $q(z)$, so for example, you could just arbitrarily choose $q(z) = \mathcal{N}(0,1)$, but this wouldn't yield a very good bound.
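To spell out why the bound is valid for any $q(z)$, recall the standard decomposition of the log-evidence:

$$\log p(x) = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}} + D_{KL}\!\left(q(z)\,\middle\|\,p(z|x)\right).$$

Since the KL term is non-negative, the first term is a lower bound on $\log p(x)$ for *every* choice of $q(z)$, and the bound is tight exactly when $q(z) = p(z|x)$. A poorly chosen $q(z)$ just leaves a large KL gap.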

Since we can use any $q(z)$ for any given $x$, we might as well let our $q(z)$ depend on $x$, and this is written as $q(z|x)$ -- admittedly, some notation like $q(z) = r(z; x)$ might be less confusing.

TLDR: $q(z|x)$ is just a special case of all the possible $q(z)$'s that we could choose from.
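To make the gap concrete, here is a small numerical sketch (the toy model is my own choice, not from the lecture): with $p(z) = \mathcal{N}(0,1)$ and $p(x|z) = \mathcal{N}(z,1)$, the exact posterior is $p(z|x) = \mathcal{N}(x/2, 1/2)$, so we can compare the ELBO from a fixed $q(z) = \mathcal{N}(0,1)$ against one that depends on $x$:

```python
import numpy as np

# Toy model: p(z) = N(0, 1), p(x|z) = N(z, 1).
# Then the exact posterior is p(z|x) = N(x/2, 1/2) and the marginal is p(x) = N(0, 2).
rng = np.random.default_rng(0)

def elbo(x, mu, sigma, n=100_000):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z)] with q(z) = N(mu, sigma^2)."""
    z = rng.normal(mu, sigma, size=n)
    log_p_x_given_z = -0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi)
    log_p_z = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
    log_q_z = -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return np.mean(log_p_x_given_z + log_p_z - log_q_z)

x = 3.0
log_px = -0.25 * x ** 2 - 0.5 * np.log(4 * np.pi)  # exact log p(x) under N(0, 2)

loose = elbo(x, mu=0.0, sigma=1.0)             # fixed q(z) = N(0, 1), ignores x
tight = elbo(x, mu=x / 2, sigma=np.sqrt(0.5))  # q(z|x): the exact posterior here

print(f"log p(x) = {log_px:.3f}, fixed q(z): {loose:.3f}, q(z|x): {tight:.3f}")
```

Both are valid lower bounds on $\log p(x)$, but the $x$-dependent one closes the KL gap (here exactly, since we used the true posterior), while the fixed $q(z)$ falls well short.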

shimao