
I am watching this lecture on VAE:

https://www.youtube.com/watch?v=uaaqyVS9-rM&t=1507s

and at 26:00 it is stated that the goal is to minimize the KL divergence between the distribution we are trying to find, $p(z|x)$, and an approximating distribution $q(z)$. But why aren't we using $q(z|x)$?

Moran Reznik
  • It is actually $q_{\phi}(\textrm{z}|\textrm{x})$, for notational consistency. Whether or not it is conditioned on the data $x$ is a modeling choice, and a question of the capacity of the network. Note that it is possible to perform variational inference over both the parameters $\theta$ and the latent variables $z$, as opposed to just the latent variables, as is usually done, where the hyper-parameter, i.e., the (conjugate) prior, is ignored. See Appendix F ("Full VB") of the original VAE paper. – Kuo Mar 04 '24 at 13:52
  • Here is an interesting blog on the $\beta$-VAE, which uses $D_{KL}\!\left(q_{\phi}(z)\,\middle\|\,p(z)\right)$ as part of its objective. – Kuo Mar 04 '24 at 15:18

1 Answer


The evidence lower bound we're trying to maximize is valid for any choice of $q(z)$, so for example, you could just arbitrarily choose $q(z) = \mathcal{N}(0,1)$, but this wouldn't yield a very good bound.
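To spell out why the bound is valid for any $q(z)$, recall the standard decomposition of the log-evidence:

$$\log p(x) = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}} + D_{KL}\!\left(q(z)\,\middle\|\,p(z|x)\right).$$

Since the KL term is non-negative, the first term is a lower bound on $\log p(x)$ for *every* choice of $q(z)$, and the bound is tight exactly when $q(z) = p(z|x)$. A poorly chosen $q(z)$ just leaves a large KL gap.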

Since we can use any $q(z)$ for any given $x$, we might as well let our $q(z)$ depend on $x$, and this is written as $q(z|x)$ -- admittedly, some notation like $q(z) = r(z; x)$ might be less confusing.

TLDR: $q(z|x)$ is just a special case of all the possible $q(z)$'s that we could choose from.
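To make the gap concrete, here is a small numerical sketch (the toy model is my own choice, not from the lecture): with $p(z) = \mathcal{N}(0,1)$ and $p(x|z) = \mathcal{N}(z,1)$, the exact posterior is $p(z|x) = \mathcal{N}(x/2, 1/2)$, so we can compare the ELBO from a fixed $q(z) = \mathcal{N}(0,1)$ against one that depends on $x$:

```python
import numpy as np

# Toy model: p(z) = N(0, 1), p(x|z) = N(z, 1).
# Then the exact posterior is p(z|x) = N(x/2, 1/2) and the marginal is p(x) = N(0, 2).
rng = np.random.default_rng(0)

def elbo(x, mu, sigma, n=100_000):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z)] with q(z) = N(mu, sigma^2)."""
    z = rng.normal(mu, sigma, size=n)
    log_p_x_given_z = -0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi)
    log_p_z = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
    log_q_z = -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return np.mean(log_p_x_given_z + log_p_z - log_q_z)

x = 3.0
log_px = -0.25 * x ** 2 - 0.5 * np.log(4 * np.pi)  # exact log p(x) under N(0, 2)

loose = elbo(x, mu=0.0, sigma=1.0)             # fixed q(z) = N(0, 1), ignores x
tight = elbo(x, mu=x / 2, sigma=np.sqrt(0.5))  # q(z|x): the exact posterior here

print(f"log p(x) = {log_px:.3f}, fixed q(z): {loose:.3f}, q(z|x): {tight:.3f}")
```

Both are valid lower bounds on $\log p(x)$, but the $x$-dependent one closes the KL gap (here exactly, since we used the true posterior), while the fixed $q(z)$ falls well short.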

shimao