I am given a set of training values $X$ and $y$, and the goal is to predict the value $f_* \equiv f(x_*)$ for a new data point $x_*$, where $f(x) = x^Tw$. How can I show that: $$ \begin{align} p(f_* | x_*, X, y) &= \int p(f_* | x_*,w) p(w|X,y) dw\\ &= \mathcal{N}\left(\frac{1}{\sigma_n^2} x^T_*A^{-1}Xy, x_*^TA^{-1}x_*\right) \end{align} $$

where $$ \begin{align} p(w | X, y) &\propto \exp\left( -\frac{1}{2} (w - \bar{w})^T A (w-\bar{w}) \right) \\ &= \mathcal{N}\left(\frac{1}{\sigma_n^2} A^{-1} Xy, A^{-1} \right)\\ A &= \sigma_n^{-2}XX^T + \Sigma_p^{-1}\\ \bar{w} &= \sigma_n^{-2} \left( \sigma_n^{-2} XX^T + \Sigma_p^{-1} \right)^{-1} Xy = \sigma_n^{-2} A^{-1}Xy \end{align} $$

I understand intuitively why this holds, and that it is an instance of the posterior predictive distribution, but I would like a more detailed, analytical derivation of the exact form. Any pointers would be helpful.

(Note: this comes from Rasmussen's book on Gaussian Processes: http://www.gaussianprocess.org/gpml/chapters/RW2.pdf)

alguru
  • 141

1 Answer

The statement contained within your question is not correct in general. What I believe you are asking is to show that the posterior predictive of a Gaussian process with a degenerate likelihood function is an MVN. Then, trivially:

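$$ p(f_*|x_*,X,y) = \int \delta(f_* - x_*^T w)\, \mathcal{N}(w \mid \bar{w}, A^{-1})\, dw = \mathcal{N}(x_*^T\bar{w}, x_*^T A^{-1} x_*) $$

The last equality is the usual linear-transformation property of the Gaussian: if $w \sim \mathcal{N}(\bar{w}, A^{-1})$, then the scalar $x_*^T w$ is Gaussian with mean $x_*^T\bar{w}$ and variance $x_*^T A^{-1} x_*$. Substituting $\bar{w} = \sigma_n^{-2} A^{-1} X y$ gives the mean $\sigma_n^{-2} x_*^T A^{-1} X y$ stated in the question.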
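As a sanity check on the closed form, here is a minimal NumPy sketch that compares it against Monte Carlo samples of $w$ from the posterior. Everything numeric in it is an assumption made for illustration (the dimensions, `sigma_n`, an identity `Sigma_p`, and the synthetic `X`, `y`, `x_star`); it follows the book's convention that `X` is $D \times n$ with one training input per column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: D features, n training inputs stored as columns of X.
D, n = 3, 50
sigma_n = 0.5                 # assumed noise standard deviation
Sigma_p = np.eye(D)           # assumed prior covariance of w
X = rng.normal(size=(D, n))
w_true = rng.normal(size=D)
y = X.T @ w_true + sigma_n * rng.normal(size=n)
x_star = rng.normal(size=D)

# Posterior over the weights: w | X, y ~ N(w_bar, A^{-1}).
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
A_inv = (A_inv + A_inv.T) / 2          # symmetrize against round-off
w_bar = A_inv @ X @ y / sigma_n**2

# Closed-form predictive mean and variance of f_* = x_*^T w.
pred_mean = x_star @ w_bar
pred_var = x_star @ A_inv @ x_star

# Monte Carlo estimate: sample w from the posterior and project onto x_*.
w_samples = rng.multivariate_normal(w_bar, A_inv, size=200_000)
f_samples = w_samples @ x_star

print("mean:     closed form", pred_mean, "| sampled", f_samples.mean())
print("variance: closed form", pred_var, "| sampled", f_samples.var())
```

The sampled mean and variance should agree with the closed-form values up to Monte Carlo error, which shrinks as the number of samples grows.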

Oxonon
  • 389