
I'm trying to derive the ELBO following CS229, page 144, Eq. 11.8.

For the E-step, the LHS and RHS of Jensen's inequality (with $f$ concave; in CS229, $f = \log$) must be equal: $$ \begin{align} f\left(\mathbb{E}_{z\sim Q}\left[ \frac{p(x, z; \theta)}{Q(z)}\right]\right) \geq \mathbb{E}_{z\sim Q}\left[f\left( \frac{p(x, z; \theta)}{Q(z)}\right)\right] \end{align} $$
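A small numerical sketch of this inequality with $f = \log$, assuming a discrete latent $z$ with three values. The numbers in `joint` (standing in for $p(x, z; \theta)$ at the observed $x$) and in `Q` are made up for illustration and are not from CS229; any positive `joint` and any distribution `Q` should give a non-negative gap, and a strictly positive one when the ratio is not constant.

```python
import numpy as np

# Hypothetical toy values (not from CS229): joint[k] plays the role of
# p(x, z=k; theta) for a fixed observed x, with z in {0, 1, 2}.
joint = np.array([0.10, 0.25, 0.05])

# An arbitrary distribution Q over z (positive entries summing to 1).
Q = np.array([0.5, 0.3, 0.2])

# Random variable inside the expectations: r(z) = p(x, z; theta) / Q(z).
r = joint / Q

# f = log is concave, so Jensen gives f(E_Q[r]) >= E_Q[f(r)].
lhs = np.log(np.sum(Q * r))  # log E_Q[r] = log sum_z p(x, z; theta)
rhs = np.sum(Q * np.log(r))  # E_Q[log r], the lower bound for this Q

print("log E_Q[r] =", lhs)
print("E_Q[log r] =", rhs)
print("Jensen gap =", lhs - rhs)  # non-negative; positive here since r varies with z
```

Note that the LHS does not depend on `Q` at all, since $\mathbb{E}_{z\sim Q}[p(x,z;\theta)/Q(z)] = \sum_z p(x, z; \theta)$; only the RHS moves as `Q` changes.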

For equality to hold, the random variable inside the expectation must be a constant $c$ that does not depend on $z$: $$ \begin{align} \frac{p(x, z; \theta)}{Q(z)} = c \end{align} $$
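A short check of why a constant ratio is enough, using only $\sum_z Q(z) = 1$: if $p(x, z; \theta)/Q(z) = c$ for every $z$, both sides of the inequality reduce to $f(c)$.

$$ \begin{align} \mathbb{E}_{z\sim Q}\left[f\left(\frac{p(x, z; \theta)}{Q(z)}\right)\right] = \sum_z Q(z)\, f(c) = f(c) = f\left(\sum_z Q(z)\, c\right) = f\left(\mathbb{E}_{z\sim Q}\left[\frac{p(x, z; \theta)}{Q(z)}\right]\right) \end{align} $$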

This can be accomplished by choosing $Q(z) \propto p(x, z; \theta)$. In addition, $Q$ must be a distribution, so $\sum_z Q(z) = 1$.
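Writing out how the two conditions combine, in the same notation: the constant $c$ is not free, because the normalization constraint pins it down.

$$ \begin{align} Q(z) = \frac{p(x, z; \theta)}{c} \quad\Longrightarrow\quad 1 = \sum_z Q(z) = \frac{1}{c}\sum_z p(x, z; \theta) \quad\Longrightarrow\quad c = \sum_z p(x, z; \theta) \end{align} $$

Substituting this $c$ back into $Q(z) = p(x, z; \theta)/c$ gives the normalized form. Since $c$ is forced by $\sum_z Q(z) = 1$, no other denominator satisfies both conditions.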

The part that is not clear: $$ \begin{align} Q(z) = \frac{p(x, z; \theta)}{\sum_z p(x, z; \theta)} \end{align} $$ I'm aware of the simplification that follows this equation, but how did they arrive at it? Which property am I missing?
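A numerical sketch of this normalized choice, assuming a hypothetical three-valued $z$ and a made-up joint (not from CS229). It checks that dividing by $\sum_z p(x, z; \theta)$ gives a valid distribution, that the ratio $p(x, z; \theta)/Q(z)$ is then the same constant for every $z$, and that the Jensen gap closes.

```python
import numpy as np

# Hypothetical toy joint p(x, z; theta) for a fixed x, with z in {0, 1, 2}.
joint = np.array([0.10, 0.25, 0.05])

# Normalize: Q(z) = p(x, z; theta) / sum_z p(x, z; theta).
Q = joint / joint.sum()

# The ratio inside the expectation; it should not depend on z.
r = joint / Q

lhs = np.log(np.sum(Q * r))  # log E_Q[r]
rhs = np.sum(Q * np.log(r))  # E_Q[log r]

print("Q =", Q)                   # a distribution over z
print("ratio =", r)               # every entry equals sum_z p(x, z; theta)
print("Jensen gap =", lhs - rhs)  # zero up to floating-point error
```

Any other positive rescaling of `joint` would also make the ratio constant, but it would not sum to 1; the normalization is what fixes the denominator.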

  • Maybe I don't properly understand your question: you are looking for a function that differs from $p$ only by a factor and that is also normalized. So just normalize it, i.e., divide by the sum of all values. You can check that both conditions are now satisfied. – frank Sep 08 '22 at 06:39
  • I get that the denominator is independent of $z$ after marginalization, and the whole fraction reduces to $p(z|x)$. My question is why that exact term in the denominator and nothing else? Most derivations directly assume $Q(z)=p(z|x)$, which is in fact derived from the equation above. – Karan Shah Sep 08 '22 at 07:39
  • My question is a duplicate. Found the solution here – Karan Shah Sep 08 '22 at 07:43
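For reference, the simplification mentioned in the comments: the denominator is the marginal of the joint over $z$, so the normalized $Q$ is the posterior and the constant ratio is the marginal likelihood.

$$ \begin{align} \sum_z p(x, z; \theta) = p(x; \theta), \qquad Q(z) = \frac{p(x, z; \theta)}{p(x; \theta)} = p(z \mid x; \theta), \qquad c = p(x; \theta) \end{align} $$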
