
I am currently reading the Lawrence 2005 paper "Probabilistic non-linear Principal Component Analysis with Gaussian Process Latent Variable Models" available here. However, I am failing to see how the integral in Equation 2 is evaluated in closed form.

The integral in question is a marginalisation over latent variables $\mathbf{x}$

$p(\mathbf{y}_{n} | \mathbf{W}, \beta) = \int p(\mathbf{y}_{n} | \mathbf{x}_{n}, \mathbf{W}, \beta) p(\mathbf{x}_{n}) d\mathbf{x}_{n}$

where the prior over $\mathbf{x}_{n}$ is given by

$p(\mathbf{x}_{n}) = \mathcal{N}(\mathbf{x}_{n} | \mathbf{0}, \mathbf{I})$

The author of the paper provides the closed form solution to this integration as the following

$p(\mathbf{y}_{n} | \mathbf{W}, \beta) = \mathcal{N}(\mathbf{y}_{n} | \mathbf{0}, \mathbf{W}\mathbf{W}^{T} + \beta^{-1} \mathbf{I})$

However, it is not clear to me how this result is derived; any insight would be valued.

In general, how does one tackle integrals of this form, i.e., computing marginals of continuous distributions? It seems that a lot of arcane matrix and distribution identities are required.

I know that in the case of a Gaussian likelihood, a variable $\mathbf{x}_{n}$ may be marginalised by simply dropping it from the mean vector and covariance matrix. But what about the case of a distribution multiplied by one of its conjugate priors?

Jack H

1 Answer


We want to solve
$$ \int \mathcal N(y | Wx, \beta^{-1} I) \mathcal N(x | 0, I) dx $$
$$ = \frac{\beta^{D/2}}{(2\pi)^{D/2}} \cdot \frac{1}{(2\pi)^{q/2}}\int \exp \left(-\frac \beta 2 || y - Wx ||^2 - \frac 12 x^T x\right) dx $$
$$ \propto e^{-\frac \beta 2 y^T y}\int \exp \left(-\frac 12 \left[x^T(\beta W^T W + I)x - 2 \beta y^T W x\right]\right) dx, $$
where $D$ is the dimension of $y$ and $q$ is the dimension of $x$.

Let $C = \beta W^T W + I$ and $u^T = \beta y^T W$, and note that $C$ is positive definite. Then by completing the square in $x$, the last expression equals
$$ e^{-\frac \beta 2 y^T y}\int \exp \left(-\frac 12 \left[ (x - C^{-1}u)^T C (x - C^{-1}u) - u^T C^{-1} u \right]\right) dx $$
$$ \propto e^{-\frac 12 \left(\beta y^T y - u^T C^{-1} u\right)} = \exp \left(-\frac 12 y^T\left(\beta I - \beta^2 WC^{-1}W^T \right)y\right). $$
This tells us that the precision matrix of the marginal is $\Sigma^{-1} = \beta I - \beta^2 WC^{-1}W^T$, so we now need to check whether
$$ \Sigma = \left(\beta I - \beta^2 W \left(\beta W^T W + I\right)^{-1}W^T\right)^{-1} \stackrel ?= WW^T + \beta^{-1} I. $$
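As an aside, the step hidden behind the last proportionality is the standard multivariate Gaussian integral: once the square is completed, the remaining integral is
$$ \int \exp \left(-\frac 12 (x - C^{-1}u)^T C (x - C^{-1}u)\right) dx = (2\pi)^{q/2} \det(C)^{-1/2}, $$
which does not depend on $y$ (the shift $C^{-1}u$ has no effect on the value of the integral, and $C$ is free of $y$), so it can be absorbed into the constant of proportionality.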

Recall the Woodbury matrix identity: $$ A^{-1} - A^{-1} U (B^{-1} + VA^{-1}U)^{-1}V A^{-1} = (A + UBV)^{-1}. $$

Take $A = \frac 1\beta I$, $B = I$, $U = W$, and $V = W^T$. Then $$ A^{-1} - A^{-1} U (B^{-1} + VA^{-1}U)^{-1}V A^{-1} = \beta I - \beta^2 W (I + \beta W^T W)^{-1}W^T = \Sigma^{-1} = \left(\frac 1 \beta I + W W^T\right)^{-1} $$ $$ \implies \Sigma = \frac 1 \beta I + W W^T $$ as desired.
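If you want to convince yourself of that identity numerically before working through the algebra, here is a minimal NumPy sanity check (the dimensions, seed, and variable names are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
D, q, beta = 5, 2, 3.0                # data dim, latent dim, noise precision
W = rng.standard_normal((D, q))

# Precision matrix obtained by completing the square
C = beta * W.T @ W + np.eye(q)
prec = beta * np.eye(D) - beta**2 * W @ np.linalg.inv(C) @ W.T

# Claimed covariance from the Woodbury identity
cov = W @ W.T + np.eye(D) / beta

# Their product should be the identity matrix
print(np.allclose(prec @ cov, np.eye(D)))  # True
```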

This means that we have found that the kernel of the distribution is $$ \exp \left(-\frac 12 y^T\left(WW^T + \beta^{-1} I\right)^{-1}y\right) $$ which corresponds to $\mathcal N(y | 0, WW^T + \beta^{-1} I)$.
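As an end-to-end sketch of the full result, you can also sample from the generative model and compare the empirical covariance of $y$ with $WW^T + \beta^{-1} I$ (again with arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(1)
D, q, beta, n = 4, 2, 2.0, 1_000_000  # toy dimensions, precision, sample size
W = rng.standard_normal((D, q))

# Generative model: x ~ N(0, I), then y | x ~ N(Wx, beta^{-1} I)
x = rng.standard_normal((n, q))
y = x @ W.T + rng.standard_normal((n, D)) / np.sqrt(beta)

# The sample covariance should be close to W W^T + beta^{-1} I
print(np.round(np.cov(y, rowvar=False), 2))
print(np.round(W @ W.T + np.eye(D) / beta, 2))
```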


As a general strategy, integrals involving multiple normal pdfs usually require completing the square so that you can factor out constants and are left with a normal density that integrates to 1. The Woodbury identity looks really ugly at first, but it is so useful that you'll quickly learn to recognize it; it just takes practice.

jld
  • Thank you for such a wonderfully thorough answer. This is a huge help. I shall try to re-derive this again this evening. I didn't even know about the Woodbury identity. Thank you. – Jack H Aug 01 '17 at 16:36
  • @VisionIncision glad to hear this was helpful! – jld Aug 02 '17 at 13:19