
I am looking into C. Bishop's Pattern Recognition and Machine Learning, chapter 10. In section 10.1.1 (p. 465) he proceeds through an analysis of the evidence lower bound as follows:

\begin{align} \mathcal{L}(q) & = \int \prod_i q_i \left\{ \ln p(X, Z) - \sum_i \ln q_i \right\} dZ \\ & = \int q_j \left\{ \int \ln p(X,Z) \prod_{i\neq j} q_i \, dZ_i \right\} dZ_j - \int q_j \ln q_j \, dZ_j + \text{const} \\ & = \int q_j \ln \widetilde{p}(X,Z_j) \, dZ_j - \int q_j \ln q_j \, dZ_j + \text{const} \end{align}

where $q_j = q_j(Z_j)$

I understand he's working through the equation to finally arrive at the definition of the KL divergence between $q_j(Z_j)$ and $\widetilde{p}(X,Z_j)$; what I cannot wrap my head around is how he got from the first line to the second. Does anyone have any suggestions?

AutomEng

1 Answer


I think the confusion is about keeping track of what gets bundled into the constant term: the emphasis is very much on deriving an expression for a single component $q_j$ and sweeping as much as possible into the constant. The first step goes

\begin{align*} \int \prod_i q_i \left\{ \ln p(\mathbf{X}, \mathbf{Z}) - \sum_i \ln q_i \right\} d\mathbf{Z} &= \int \prod_i q_i \ln p(\mathbf{X},\mathbf{Z})\, d\mathbf{Z} - \sum_i \int \left(\prod_k q_k\right) \ln q_i \, d\mathbf{Z} \\ &= \int q_j \left[ \int \left(\prod_{i \neq j } q_i \right) \ln p(\mathbf{X},\mathbf{Z})\, d\mathbf{Z}_{-j} \right] dZ_j - \int q_j \ln q_j \left[ \underbrace{\int \left( \prod_{i \neq j} q_i \right) d\mathbf{Z}_{-j}}_{=1} \right] dZ_j - \sum_{i \neq j} \int q_i \ln q_i \, dZ_i \\ &= \int q_j \left[ \int \left(\prod_{i \neq j } q_i \right) \ln p(\mathbf{X},\mathbf{Z})\, d\mathbf{Z}_{-j} \right] dZ_j - \int q_j \ln q_j \, dZ_j + C_1, \end{align*}

where in the second line the $i = j$ term of the entropy sum has been separated out (the factors $q_i$ with $i \neq j$ integrate to one), and the remaining terms, the entropies of the factors with $i \neq j$, do not involve $q_j$ and are collected into the constant $C_1$.

The next step is to consider $$ \int \left(\prod_{i \neq j } q_i \right) \ln p(\mathbf{X},\mathbf{Z})\, d\mathbf{Z}_{-j}, $$ which, after integrating out $\mathbf{Z}_{-j}$, is a function of $\mathbf{X}$ and $Z_j$ only. He therefore defines the new distribution $$ \tilde{p}(\mathbf{X},Z_j) \propto \exp\left\{ \int \prod_{i \neq j} q_i \ln p(\mathbf{X}, \mathbf{Z})\, d\mathbf{Z}_{-j} \right\}, $$ or equivalently $$ \ln \tilde{p}(\mathbf{X},Z_j) = \int \left(\prod_{i \neq j } q_i \right) \ln p(\mathbf{X},\mathbf{Z})\, d\mathbf{Z}_{-j} + C_2, $$ where $C_2$ is a normalising constant. Rearranging, substituting back into the expression above, and combining the constants gives you the quoted result.
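As a numerical sanity check of where this is heading (this toy model is my own, not from Bishop): on a small discrete joint with two latent factors, the bound over $q_j$ with the other factor held fixed is maximised exactly by $q_j \propto \exp\{\mathbb{E}_{i \neq j}[\ln p]\}$, i.e. by the normalised $\tilde{p}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unnormalised joint p(X, z1, z2) over two discrete latents (X fixed),
# stored as a positive 3x4 table indexed by (z1, z2).
p = rng.random((3, 4)) + 0.1
log_p = np.log(p)

def elbo(q1, q2):
    """L(q) = E_q[ln p(X,Z)] + H[q1] + H[q2] for factorised q = q1 * q2."""
    joint_q = np.outer(q1, q2)
    return (joint_q * log_p).sum() - (q1 * np.log(q1)).sum() - (q2 * np.log(q2)).sum()

# Fix q2, form ln p~(X, z1) = E_{q2}[ln p(X, z1, z2)], then normalise.
q2 = rng.dirichlet(np.ones(4))
log_p_tilde = log_p @ q2          # expectation over z2
q1_star = np.exp(log_p_tilde)
q1_star /= q1_star.sum()

# For this fixed q2, no other candidate q1 should beat q1* = normalised p~.
best = elbo(q1_star, q2)
for _ in range(1000):
    q1 = rng.dirichlet(np.ones(3))
    assert best >= elbo(q1, q2) - 1e-12
```

The additive constants $C_1$ and $C_2$ drop out of the comparison, which is why they can be swept aside in the derivation.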

Nadiels
  • Thanks for this: although this was not complicated, I also failed to see how the entropy terms were getting into $C_{1}$ – Julien Dec 08 '19 at 17:34