
I am confused about the predictive distribution of a Gaussian process. I was reading this paper:

[Image from the paper: the derivation of the predictive distribution, including the posterior $p(u \mid S)$ in equation $(5)$, whose covariance is $\sigma^2(\sigma^2 I + K)^{-1}K$.]

I don't get how the integration gives that result. What is $P(u^* \mid x^*, u)$? Also, how come the covariance of the posterior distribution is $\sigma^2(\sigma^2 I + K)^{-1}K$?

user34790
  • +1, I have pretty much the same problem. After searching the web, I found something even more confusing: these lecture notes by Rasmussen, http://videolectures.net/site/normal_dl/tag=12546/epsrcws08_rasmussen_lgp_01.pdf. Pay attention to page 15. – avocado Jan 31 '14 at 13:49

2 Answers


$P(u^* \mid x^*, u) = N(u^*;\, u(x^*), \sigma^2)$, directly from the definition of $u^*$: the observed value $u^*$ is the latent function value $u(x^*)$ plus zero-mean Gaussian noise with variance $\sigma^2$.

Notice that the integral of the product of the two Gaussian pdfs is normalized. This follows from
$$
\begin{aligned}
\int_{-\infty}^{\infty} P(u^* \mid x^*, S)\,du^*
&= \int_{-\infty}^{\infty}\int_{u} P(u^* \mid x^*, u)\,P(u \mid S)\,du\,du^* \\
&= \int_{u} P(u \mid S)\int_{-\infty}^{\infty} P(u^* \mid x^*, u)\,du^*\,du \\
&= \int_{u} P(u \mid S)\int_{-\infty}^{\infty} N\!\left(u^* - u(x^*);\, 0, \sigma^2\right)du^*\,du \\
&= \int_{u} P(u \mid S)\,du \int_{-\infty}^{\infty} N(v;\, 0, \sigma^2)\,dv \\
&= 1,
\end{aligned}
$$
where the order of integration is swapped and the change of variables $v = u^* - u(x^*)$ is applied in the inner integral.
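A quick way to convince yourself of this is a one-dimensional toy version, where $u$ is a scalar with posterior $N(m, s^2)$. This sketch (all numerical values are my own arbitrary choices, not from the paper) checks that the double integral is $1$ and that the marginal of $u^*$ is again a Gaussian, with the variances adding:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Toy 1-D analogue (all values are assumptions for illustration):
# p(u* | u) = N(u*; u, sigma^2) and p(u | S) = N(u; m, s2).
m, s2 = 1.5, 0.8     # mean and variance of the posterior over the latent u
sigma2 = 0.3         # observation noise variance

def marginal(u_star):
    # p(u* | x*, S) = integral over u of p(u* | u) p(u | S)
    integrand = lambda u: (stats.norm.pdf(u_star, loc=u, scale=np.sqrt(sigma2))
                           * stats.norm.pdf(u, loc=m, scale=np.sqrt(s2)))
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

# Normalization: the marginal integrates to 1.
print(quad(marginal, -np.inf, np.inf)[0])  # ~1.0

# Pointwise, the marginal equals N(u*; m, sigma^2 + s^2).
for u_star in (-1.0, 0.5, 2.0):
    print(marginal(u_star), stats.norm.pdf(u_star, loc=m, scale=np.sqrt(sigma2 + s2)))
```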

With normalization out of the way,

the integral $\int_{u} P(u^* \mid x^*, u)\,P(u \mid S)\,du$ can be evaluated with the following steps:

  1. Substitute the two normal pdfs into the integral and discard the multiplicative terms independent of $u$; since we have already shown normalization, the overall constant can be restored at the end.

  2. Use the completing-the-square trick for integrating a multivariate exponential, i.e., assemble the remaining exponential terms into a multivariate normal pdf in $u$, which then integrates out. Refer to this YouTube video.

  3. Eventually you are left with an exponential in terms of $u^*$; observe that this is again only a constant factor away from a normal pdf. The normalization proof above gives us confidence that the final form is indeed a normal pdf, and it is the same as the one given in the original post. (A sketch verifying this marginalization numerically follows below.)
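The closed form that step 3 produces is an instance of the standard Gaussian marginalization identity: if $P(u \mid S) = N(u;\, m, C)$ and $P(u^* \mid x^*, u) = N(u^*;\, a^\top u, \sigma^2)$ for some weight vector $a$ (a unit vector when $x^*$ coincides with a training input), then the marginal is $N(u^*;\, a^\top m,\ \sigma^2 + a^\top C a)$. Here is a Monte Carlo sketch of that identity, with made-up $m$, $C$, and $a$ (an illustration, not the paper's derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Arbitrary posterior N(m, C) over the latent vector u (illustration only).
m = rng.normal(size=n)
A = rng.normal(size=(n, n))
C = A @ A.T + 0.1 * np.eye(n)   # random symmetric positive-definite covariance
a = rng.normal(size=n)          # weights defining u(x*) = a @ u
sigma2 = 0.25                   # observation noise variance

# Sample u ~ N(m, C), then u* ~ N(a @ u, sigma^2); the samples of u*
# then follow the marginal obtained by integrating over u.
u = rng.multivariate_normal(m, C, size=200_000)
u_star = u @ a + rng.normal(scale=np.sqrt(sigma2), size=len(u))

print(u_star.mean(), a @ m)               # means agree
print(u_star.var(), sigma2 + a @ C @ a)   # variances agree
```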


The detailed derivations of the equations for the conditional distribution of a Gaussian process can be found in chapter 2 and appendix A of the book [Rasmussen2005].

Take a look at Eqs. (2.23) and (2.24) and the derivation above them, which are based on the Gaussian identities (A.6) and the matrix identity (A.11).
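For concreteness, here is a minimal sketch of those two equations, the predictive mean (2.23) and covariance (2.24), using a squared-exponential kernel and random toy data (the kernel choice and the data are my assumptions, not part of the book's text):

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def gp_predict(X, y, X_star, sigma2=0.1):
    """GPML Eqs. (2.23)-(2.24): predictive mean and covariance at X_star."""
    Kn = rbf(X, X) + sigma2 * np.eye(len(X))   # K(X, X) + sigma_n^2 I
    K_s = rbf(X, X_star)                       # K(X, X*)
    mean = K_s.T @ np.linalg.solve(Kn, y)                          # Eq. (2.23)
    cov = rbf(X_star, X_star) - K_s.T @ np.linalg.solve(Kn, K_s)   # Eq. (2.24)
    return mean, cov

# Toy data (assumed for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(10, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=10)
X_star = np.linspace(-3, 3, 5)[:, None]
mean, cov = gp_predict(X, y, X_star)
print(mean)
print(np.diag(cov))  # predictive variances
```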


[Rasmussen2005] C. E. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2005.

Emile
  • I have the same problem as the OP, and I have to say, I couldn't find the detailed derivations in the GPML book. And I was further confused after reading the lecture notes I posted in the comment above. In those notes, the posterior $p(u|S)$ given by Rasmussen is different from the one in the OP's equation $(5)$. I did the derivation myself, and I agree that the posterior $p(u|S)$ is the same as equation $(5)$; I even think Rasmussen's lecture notes might be wrong at this point. If I am missing something or making a mistake, please correct me. And I hope you could elaborate on the derivation. – avocado Jan 31 '14 at 13:55
  • This doesn't answer the questions. – null Jan 22 '19 at 12:05
  • @avocado I realize this is many years late, but in case this can still help you (or anyone else coming along), please note that $K - K(K + \sigma^2 I)^{-1}K$ is precisely equal to $\sigma^2 (K + \sigma^2 I)^{-1} K$, as well as $\sigma^2 I - \sigma^2 I (K + \sigma^2 I)^{-1} \sigma^2 I$. So, the posterior is the same as OP's equation (5) and as that given in Rasmussen's lecture notes, they're just expressed differently. – duckmayr Feb 13 '20 at 23:31
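For anyone who wants to check duckmayr's identity numerically, here is a minimal sketch (the random symmetric positive-definite $K$ and the value of $\sigma^2$ are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
K = A @ A.T + 1e-3 * np.eye(5)   # random symmetric positive-definite "kernel" matrix
sigma2 = 0.7
I = np.eye(5)
inv = np.linalg.inv(K + sigma2 * I)

form1 = K - K @ inv @ K                                # GPML / lecture-notes form
form2 = sigma2 * inv @ K                               # OP's equation (5) form
form3 = sigma2 * I - (sigma2 * I) @ inv @ (sigma2 * I) # third equivalent form

print(np.allclose(form1, form2), np.allclose(form2, form3))  # True True
```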