There is a duplicate question, but I am asking again because the answer to that duplicate doesn't address the question well.
The Gaussian Process prior is $$u\sim GP(0,k(x,x'))$$ I tend to write it this way, $$p(u)=N(0,K)$$
The observed training data set is $S=\{(x_1,y_1),...,(x_n,y_n)\}$, with $$y=u(x)+\epsilon$$ and $p(\epsilon)=N(0,\sigma^2I)$. So the likelihood of $u$ given the observed $S$ is $$p(y|x,u)=N(u,\sigma^2I).$$ Now let us derive the posterior of $u$:
\begin{align}p(u|x,y)&=\frac{p(y|x,u)p(u)}{p(y|x)}\\ &=N(K(\sigma^2I+K)^{-1}y, \sigma^2(\sigma^2I+K)^{-1}K)\end{align}
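As a sanity check on the posterior above, one can compare it numerically against generic Gaussian conditioning on the joint distribution of $(u,y)$, whose covariance blocks are $\operatorname{cov}(u)=K$, $\operatorname{cov}(u,y)=K$ and $\operatorname{cov}(y)=K+\sigma^2I$. A random positive-definite matrix stands in for the kernel matrix $K$ here; that choice is only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
sigma2 = 0.3
I = np.eye(n)

# Random positive-definite matrix standing in for the kernel matrix K
A = rng.standard_normal((n, n))
K = A @ A.T + n * I
y = rng.standard_normal(n)

# Posterior moments as stated in the derivation above
mean_post = K @ np.linalg.solve(sigma2 * I + K, y)
cov_post = sigma2 * np.linalg.solve(sigma2 * I + K, K)

# Same moments via generic conditioning of the joint Gaussian (u, y):
# Sigma_uu = K, Sigma_uy = K, Sigma_yy = K + sigma2*I
J = np.block([[K, K], [K, K + sigma2 * I]])
S_uu, S_uy, S_yy = J[:n, :n], J[:n, n:], J[n:, n:]
mean_cond = S_uy @ np.linalg.solve(S_yy, y)
cov_cond = S_uu - S_uy @ np.linalg.solve(S_yy, S_uy.T)

assert np.allclose(mean_post, mean_cond)
assert np.allclose(cov_post, cov_cond)
```

The two covariance expressions agree because $K-K(K+\sigma^2I)^{-1}K=(K+\sigma^2I-K)(K+\sigma^2I)^{-1}K=\sigma^2(\sigma^2I+K)^{-1}K$.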
and this posterior agrees with the equation $(5)$ in that duplicate post.
Now, here comes my problem: I tried to derive the predictive distribution. Let $(x^*,u^*)$ denote the unseen data. Since we assume the observed data and the unseen data share a joint Gaussian Process prior, that is,
$$p\pmatrix{u\\u^*}=N\big(0,\pmatrix{K_x &K_{xx^*}\\K_{x^*x} &K_{x^*}}\big)$$
I can compute $p(u^*|u)$ by conditioning on $u$. Finally, the predictive distribution is
$$p(u^*|S)=\int p(u^*|u)p(u|S)du$$
I have to say I couldn't compute this integral, but the result given in the duplicate post is
$$p(u^*|S)=N\big(K_{x^*x}(K_x+\sigma^2I)^{-1}y,\ K_{x^*}-K_{x^*x}(K_x+\sigma^2I)^{-1}K_{xx^*}\big).$$
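For the conditioning step that produces $p(u^*|u)$, a small numerical sketch may be useful (a random positive-definite matrix stands in for the joint kernel blocks; this is only an illustration, not part of the original derivation). The conditional covariance is the Schur complement $K_{x^*}-K_{x^*x}K_x^{-1}K_{xx^*}$, which can be cross-checked against the bottom-right block of the joint precision matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3  # sizes of u and u*

# Random positive-definite joint covariance, partitioned as in the joint prior
B = rng.standard_normal((n + m, n + m))
S = B @ B.T + (n + m) * np.eye(n + m)
Kx, Kxs = S[:n, :n], S[:n, n:]
Ksx, Ks = S[n:, :n], S[n:, n:]

# Conditional p(u*|u): mean coefficient and covariance (Schur complement)
A_cond = Ksx @ np.linalg.inv(Kx)                # u*|u has mean A_cond @ u
cov_cond = Ks - Ksx @ np.linalg.solve(Kx, Kxs)  # and this covariance

# Cross-check against the joint precision matrix Lam = S^{-1}:
# the conditional covariance is the inverse of Lam's bottom-right block,
# and the mean coefficient is -Lam_22^{-1} Lam_21
Lam = np.linalg.inv(S)
assert np.allclose(cov_cond, np.linalg.inv(Lam[n:, n:]))
assert np.allclose(A_cond, -np.linalg.solve(Lam[n:, n:], Lam[n:, :n]))
```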
Question
1) I don't know how this result is computed; could you please help me work it out?
2) I observe that this result is actually the conditional distribution obtained from the joint distribution of $(y,u^*)$, that is,
$$p\pmatrix{y\\u^*}=N\big(0,\pmatrix{K_x+\sigma^2I &K_{xx^*}\\K_{x^*x} &K_{x^*}}\big)$$
by conditioning on $y$, I get the same result $p(u^*|y)$ as the one above. Is this a coincidence?
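The agreement of the two routes can at least be checked numerically: marginalizing $p(u^*|u)\,p(u|S)\,du$ (both factors are Gaussian, so the integral reduces to the linear-Gaussian marginalization $u^*|S\sim N(Am,\ B+ACA^T)$ when $u|S\sim N(m,C)$ and $u^*|u\sim N(Au,B)$) versus conditioning the joint of $(y,u^*)$ directly. As before, a random positive-definite matrix stands in for the kernel blocks:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
sigma2 = 0.5
I = np.eye(n)

# Random positive-definite joint prior covariance for (u, u*)
Bmat = rng.standard_normal((n + m, n + m))
S = Bmat @ Bmat.T + (n + m) * np.eye(n + m)
Kx, Kxs = S[:n, :n], S[:n, n:]
Ksx, Ks = S[n:, :n], S[n:, n:]
y = rng.standard_normal(n)

# Route 1: marginalize p(u*|u) p(u|S) du via linear-Gaussian marginalization
m_post = Kx @ np.linalg.solve(Kx + sigma2 * I, y)          # mean of u|S
C_post = Kx - Kx @ np.linalg.solve(Kx + sigma2 * I, Kx)    # cov of u|S
A = Ksx @ np.linalg.inv(Kx)                                # u*|u mean coefficient
B = Ks - Ksx @ np.linalg.solve(Kx, Kxs)                    # u*|u covariance
mean1 = A @ m_post
cov1 = B + A @ C_post @ A.T

# Route 2: condition the joint of (y, u*) directly on y
mean2 = Ksx @ np.linalg.solve(Kx + sigma2 * I, y)
cov2 = Ks - Ksx @ np.linalg.solve(Kx + sigma2 * I, Kxs)

assert np.allclose(mean1, mean2)
assert np.allclose(cov1, cov2)
```

The two routes agree here, which suggests the match is not a numerical coincidence.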