I'm reading through the Gaussian Processes for Machine Learning book (Rasmussen & Williams) http://www.gaussianprocess.org/gpml/chapters/RW2.pdf and there's one section I don't quite understand (page 11). The authors say:
"the predictive distribution is given by averaging the output of all possible linear models wrt the Gaussian posterior"
$$ \begin{aligned} p(f_*|x_*,X,y) &= \int p(f_*|x_*,w)~p(w|X,y)~dw \\ &=\mathcal N\left(\frac{1}{\sigma_n^2}x^T_*A^{-1}Xy,~x^T_*A^{-1}x_*\right) \end{aligned} $$
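To check what "averaging the output of all possible linear models" produces, here is a minimal Monte Carlo sketch. It assumes the book's Bayesian linear regression setup: prior $w \sim \mathcal N(0, \Sigma_p)$, targets $y = X^T w + \varepsilon$ with $\varepsilon \sim \mathcal N(0, \sigma_n^2 I)$, inputs stored as the columns of $X$, and $A = \sigma_n^{-2} X X^T + \Sigma_p^{-1}$, so that the posterior is $w \mid X, y \sim \mathcal N(\sigma_n^{-2} A^{-1} X y,\ A^{-1})$. The toy data, dimensions, noise level and test point are hypothetical. Each posterior sample of $w$ is one linear model, and its output at $x_*$ is $x_*^T w$; the sketch compares the spread of those outputs with the closed-form mean and variance above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem (illustrative values, not taken from the book).
D, n = 2, 20
sigma_n = 0.5                     # noise standard deviation
Sigma_p = np.eye(D)               # prior covariance of the weights
X = rng.normal(size=(D, n))       # book convention: each COLUMN of X is an input
w_true = np.array([1.0, -2.0])
y = X.T @ w_true + sigma_n * rng.normal(size=n)

# Posterior over weights: w | X, y ~ N(w_bar, A^{-1})
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
A_inv = (A_inv + A_inv.T) / 2     # symmetrize to remove round-off asymmetry
w_bar = A_inv @ X @ y / sigma_n**2

# Closed-form predictive distribution of f_* = x_*^T w at a test input
x_star = np.array([0.3, 1.5])
mean_closed = x_star @ A_inv @ X @ y / sigma_n**2
var_closed = x_star @ A_inv @ x_star

# "Averaging over all linear models": draw many weight vectors from the
# posterior and evaluate each model's output at x_star.
W = rng.multivariate_normal(w_bar, A_inv, size=200_000)
f_samples = W @ x_star            # one prediction per sampled linear model

print("mean:", mean_closed, "vs Monte Carlo", f_samples.mean())
print("var: ", var_closed, "vs Monte Carlo", f_samples.var())
```

In this sketch the sampled outputs form a whole distribution rather than a single number: their mean matches $x_*^T \bar w$ (the prediction of the posterior-mean weights) and their variance matches $x_*^T A^{-1} x_*$, which is the uncertainty about $w$ propagated to $f_*$.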
What does this mean, exactly? I understand that the purpose of using Gaussians is to be able to quantify the uncertainty of a prediction, but I don't see why "averaging the output" doesn't just reduce to the output of the mean value of the weights. And how were the mean and covariance of this predictive distribution derived?