I'm working on a personal project involving latent variable models, and I'm trying to understand a mathematical step: specifically, if we marginalize out the weights in PPCA, what is the marginal distribution of the observations?
Consider a probabilistic PCA (PPCA) style latent variable model: we have observations $\{x_i\}_{i \in [1, N]}$ with $x_i \in \mathbb{R}^{D}$ that we believe can be described by latent variables $\{z_i\}_{i \in [1, N]}$ with $z_i \in \mathbb{R}^{K}$, where $K \ll D$. Specifically, we believe that $x_i$ is a linear function of $z_i$, such that
$$P(x_i | z_i, W, \beta) = \mathcal{MVN}(x_i; W^Tz_i, \beta^{-1}I)$$
where $W \in \mathbb{R}^{K \times D}$.
Next, we propose the following priors on the column vectors $z_i$ and $W_i$:
$$P(z_i) = \mathcal{MVN}(z_i; 0, I)$$ $$P(W | \alpha) = \prod_{i = 1}^D P(W_i | \alpha) = \prod_{i=1}^D \mathcal{MVN}(W_i; 0, \alpha^{-1} I)$$
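To make the shapes concrete, here is a minimal NumPy sketch of sampling from the generative model as stated above (the sizes $N$, $K$, $D$ and the precisions $\alpha$, $\beta$ are arbitrary illustrative values I chose):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, D = 100, 3, 20        # K << D
alpha, beta = 2.0, 50.0     # precisions, so variances are 1/alpha and 1/beta

Z = rng.standard_normal((N, K))                    # rows are z_i ~ MVN(0, I)
W = rng.standard_normal((K, D)) / np.sqrt(alpha)   # columns W_i ~ MVN(0, alpha^{-1} I)
eps = rng.standard_normal((N, D)) / np.sqrt(beta)  # noise ~ MVN(0, beta^{-1} I)

# Stacking x_i = W^T z_i + eps_i row-wise gives X = Z W + E, shape (N, D).
X = Z @ W + eps
print(X.shape)
```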
My question is: what should the marginal distribution
$$P(X | Z, \beta, \alpha) = \int P(X | Z, W, \beta) P(W | \alpha) \, dW$$
be?
According to a paper by Neil Lawrence, the marginal distribution of $X$ is
$$P(X | Z, \beta, \alpha) = \mathcal{MVN}(0, K)$$
where $K = \alpha ZZ^T + \beta^{-1} I$. However, I cannot understand why the precision $\alpha$ does not appear as $\alpha^{-1}$ in the computation of $K$, i.e., why $K = \alpha^{-1} ZZ^T + \beta^{-1} I$ is wrong. My reasoning was as follows: let
$$K = Var(X | Z, \beta, \alpha) = Var(W^T Z) + Var(\epsilon)$$
where $\epsilon \sim \mathcal{MVN}(0, \beta^{-1} I)$; then, since each column satisfies $W_i \sim \mathcal{MVN}(0, \alpha^{-1} I)$,
$$K = Var(X | Z, \beta, \alpha) = Z\,Var(W_i)\,Z^T + \beta^{-1} I = Z(\alpha^{-1}I)Z^T + \beta^{-1}I = \alpha^{-1}ZZ^T + \beta^{-1}I$$
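As a sanity check on this variance computation, here is a small Monte Carlo simulation (a sketch; the sizes and hyperparameters are arbitrary values I picked): for a fixed $Z$, it draws many samples of a single output column $X_{:,i} = Z W_i + \epsilon_i$ under the priors above and compares the empirical covariance to my $\alpha^{-1} ZZ^T + \beta^{-1} I$ expression.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 2                  # small sizes so the check runs quickly
alpha, beta = 4.0, 10.0
n_draws = 200_000

Z = rng.standard_normal((N, K))          # fixed latent positions

# One output column of X: X[:, i] = Z @ W_i + eps_i, with
# W_i ~ MVN(0, alpha^{-1} I_K) and eps_i ~ MVN(0, beta^{-1} I_N).
W_i = rng.standard_normal((n_draws, K)) / np.sqrt(alpha)
eps = rng.standard_normal((n_draws, N)) / np.sqrt(beta)
X_col = W_i @ Z.T + eps                  # each row is one draw of X[:, i]

emp_cov = np.cov(X_col, rowvar=False)
K_mine = Z @ Z.T / alpha + np.eye(N) / beta   # alpha^{-1} Z Z^T + beta^{-1} I

print(np.abs(emp_cov - K_mine).max())    # small (Monte Carlo error only)
```

The empirical covariance agrees with the $\alpha^{-1}$ version under the prior as I've stated it, which is why I suspect I'm misreading either the prior or the result in the paper.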
Could someone tell me where I went wrong, and why I can't seem to reproduce Neil Lawrence's mathematical result?