There is a duplicate question, but I am asking again because the answer to that duplicate doesn't address the question well.
The Gaussian Process prior is $$u\sim GP(0,k(x,x'))$$ I tend to write it this way, $$p(u)=N(0,K)$$
The observed training data set is $S=\{(x_1,y_1),...,(x_n,y_n)\}$, with $$y=u(x)+\epsilon$$ and $p(\epsilon)=N(0,\sigma^2I)$. So the likelihood of $u$ given the observed $S$ is $$p(y|x,u)=N(u,\sigma^2I).$$ Now let us derive the posterior of $u$:
\begin{align}p(u|x,y)&=\frac{p(y|x,u)p(u)}{p(y|x)}\\ &=N(K(\sigma^2I+K)^{-1}y, \sigma^2(\sigma^2I+K)^{-1}K)\end{align}
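As a sanity check on the posterior above, one can compare it numerically against generic Gaussian conditioning on the joint distribution of $(u,y)$, whose covariance blocks are $\operatorname{cov}(u)=K$, $\operatorname{cov}(u,y)=K$ and $\operatorname{cov}(y)=K+\sigma^2I$. A random positive-definite matrix stands in for the kernel matrix $K$ here; that choice is only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
sigma2 = 0.3
I = np.eye(n)

# Random positive-definite matrix standing in for the kernel matrix K
A = rng.standard_normal((n, n))
K = A @ A.T + n * I
y = rng.standard_normal(n)

# Posterior moments as stated in the derivation above
mean_post = K @ np.linalg.solve(sigma2 * I + K, y)
cov_post = sigma2 * np.linalg.solve(sigma2 * I + K, K)

# Same moments via generic conditioning of the joint Gaussian (u, y):
# Sigma_uu = K, Sigma_uy = K, Sigma_yy = K + sigma2*I
J = np.block([[K, K], [K, K + sigma2 * I]])
S_uu, S_uy, S_yy = J[:n, :n], J[:n, n:], J[n:, n:]
mean_cond = S_uy @ np.linalg.solve(S_yy, y)
cov_cond = S_uu - S_uy @ np.linalg.solve(S_yy, S_uy.T)

assert np.allclose(mean_post, mean_cond)
assert np.allclose(cov_post, cov_cond)
```

The two covariance expressions agree because $K-K(K+\sigma^2I)^{-1}K=(K+\sigma^2I-K)(K+\sigma^2I)^{-1}K=\sigma^2(\sigma^2I+K)^{-1}K$.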
and this posterior agrees with the equation $(5)$ in that duplicate post.
Now, here comes my problem: I tried to derive the predictive distribution. Let $(x^*,u^*)$ denote the unseen data. Since we assume the observed data and the unseen data share a joint Gaussian Process prior, that is,
$$p\pmatrix{u\\u^*}=N\big(0,\pmatrix{K_x &K_{xx^*}\\K_{x^*x} &K_{x^*}}\big)$$
I can compute $p(u^*|u)$ by conditioning on $u$. Finally, the predictive distribution is
$$p(u^*|S)=\int p(u^*|u)p(u|S)du$$
I have to say I couldn't compute this integral, but the result given in the duplicate post is
$$p(u^*|S)=N\big(K_{x^*x}(K_x+\sigma^2I)^{-1}y,\ K_{x^*}-K_{x^*x}(K_x+\sigma^2I)^{-1}K_{xx^*}\big).$$
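For the conditioning step that produces $p(u^*|u)$, a small numerical sketch may be useful (a random positive-definite matrix stands in for the joint kernel blocks; this is only an illustration, not part of the original derivation). The conditional covariance is the Schur complement $K_{x^*}-K_{x^*x}K_x^{-1}K_{xx^*}$, which can be cross-checked against the bottom-right block of the joint precision matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3  # sizes of u and u*

# Random positive-definite joint covariance, partitioned as in the joint prior
B = rng.standard_normal((n + m, n + m))
S = B @ B.T + (n + m) * np.eye(n + m)
Kx, Kxs = S[:n, :n], S[:n, n:]
Ksx, Ks = S[n:, :n], S[n:, n:]

# Conditional p(u*|u): mean coefficient and covariance (Schur complement)
A_cond = Ksx @ np.linalg.inv(Kx)                # u*|u has mean A_cond @ u
cov_cond = Ks - Ksx @ np.linalg.solve(Kx, Kxs)  # and this covariance

# Cross-check against the joint precision matrix Lam = S^{-1}:
# the conditional covariance is the inverse of Lam's bottom-right block,
# and the mean coefficient is -Lam_22^{-1} Lam_21
Lam = np.linalg.inv(S)
assert np.allclose(cov_cond, np.linalg.inv(Lam[n:, n:]))
assert np.allclose(A_cond, -np.linalg.solve(Lam[n:, n:], Lam[n:, :n]))
```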
Question
1) I don't know how this result is computed; could you please help me work it out?
2) I observe that this result is actually the conditional distribution obtained from the joint distribution of $(y,u^*)$, that is,
$$p\pmatrix{y\\u^*}=N\big(0,\pmatrix{K_x+\sigma^2I &K_{xx^*}\\K_{x^*x} &K_{x^*}}\big)$$
by conditioning on $y$, I get the same result $p(u^*|y)$ as the one above. Is this a coincidence?
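The agreement of the two routes can at least be checked numerically: marginalizing $p(u^*|u)\,p(u|S)\,du$ (both factors are Gaussian, so the integral reduces to the linear-Gaussian marginalization $u^*|S\sim N(Am,\ B+ACA^T)$ when $u|S\sim N(m,C)$ and $u^*|u\sim N(Au,B)$) versus conditioning the joint of $(y,u^*)$ directly. As before, a random positive-definite matrix stands in for the kernel blocks:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
sigma2 = 0.5
I = np.eye(n)

# Random positive-definite joint prior covariance for (u, u*)
Bmat = rng.standard_normal((n + m, n + m))
S = Bmat @ Bmat.T + (n + m) * np.eye(n + m)
Kx, Kxs = S[:n, :n], S[:n, n:]
Ksx, Ks = S[n:, :n], S[n:, n:]
y = rng.standard_normal(n)

# Route 1: marginalize p(u*|u) p(u|S) du via linear-Gaussian marginalization
m_post = Kx @ np.linalg.solve(Kx + sigma2 * I, y)          # mean of u|S
C_post = Kx - Kx @ np.linalg.solve(Kx + sigma2 * I, Kx)    # cov of u|S
A = Ksx @ np.linalg.inv(Kx)                                # u*|u mean coefficient
B = Ks - Ksx @ np.linalg.solve(Kx, Kxs)                    # u*|u covariance
mean1 = A @ m_post
cov1 = B + A @ C_post @ A.T

# Route 2: condition the joint of (y, u*) directly on y
mean2 = Ksx @ np.linalg.solve(Kx + sigma2 * I, y)
cov2 = Ks - Ksx @ np.linalg.solve(Kx + sigma2 * I, Kxs)

assert np.allclose(mean1, mean2)
assert np.allclose(cov1, cov2)
```

The two routes agree here, which suggests the match is not a numerical coincidence.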