
I am trying to understand VAEs. A YouTube video and a paper I read defined the loss roughly as: $$L=\sum||x-Dec(Enc(x))||_2^2 + D_{KL}(\mathcal N(\mu, \sigma)\,\|\,\mathcal N(0, 1))$$

The terms $\mu$ and $\sigma$ are learned by, let's say, $2$ MLPs, and we consider the latent $z$ to be distributed according to $\mathcal N (\mu, \sigma)$.

The first term is just the reconstruction loss, which makes sense to me, but I'm not sure I understand why the second term is needed. What do we gain by pushing the mean and variance toward $0$ and $1$? If we really want the distribution to be a standard Gaussian, why don't we just remove the encoder and take $z \sim \mathcal N(0, 1)$? Or is the point of that term simply to keep the mean and variance from exploding?
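For concreteness, here is a minimal sketch of the KL term I'm asking about, assuming the usual diagonal-Gaussian parameterization where the encoder outputs $\mu$ and $\log\sigma^2$ (the closed form is $\frac{1}{2}\sum(\sigma^2 + \mu^2 - 1 - \log\sigma^2)$):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2 I) || N(0, I) ) for a diagonal
    Gaussian, summed over latent dimensions:
        0.5 * sum( sigma^2 + mu^2 - 1 - log(sigma^2) )
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# The term is exactly zero when the encoder outputs the prior itself:
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # -> 0.0

# ...and grows as (mu, sigma) drift away from (0, 1), which is the
# penalty I'm asking about:
print(kl_to_standard_normal(np.ones(4), np.zeros(4)))  # -> 2.0
```

So numerically the term only penalizes the encoder's *per-sample* posterior for straying from $\mathcal N(0, 1)$, which is part of what I find confusing.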

Wolfuryo
