Predictive Distribution in Gaussian Process for Machine Learning

Question

I am reading Gaussian Processes for Machine Learning, equation 2.9, where the predictive distribution is derived:

$$p(f_* | \mathbf{x}_*, X, \mathbf{y}) = \int p(f_* | \mathbf{x}_*, \mathbf{w}) p(\mathbf{w} | X, \mathbf{y}) d \mathbf{w}.$$

I tried to do it analytically, as in equation 2.8, but the terms became unmanageable. I also tried treating the first factor as a delta function, as done in a related post, but I still cannot follow it:

$$ \begin{align*} p(f_* | \mathbf{x}_*, X, \mathbf{y}) &= \int p(f_* | \mathbf{x}_*, \mathbf{w}) p(\mathbf{w} | X, \mathbf{y}) d \mathbf{w} \\[5pt] &= \int \delta(f_* - \mathbf{x}_*^T \mathbf{w}) p(\mathbf{w} | X, \mathbf{y}) d \mathbf{w}. \end{align*} $$

Properties of the delta function include

$$ \begin{align*} \int \delta(x - x_0) f(x) dx = f(x_0) \end{align*} $$

but I cannot see how to use it here. I also read about the matrix variant of the delta function in the Matrix Cookbook, equation 548,

$$ \int \delta(\mathbf{x} - A\mathbf{s}) p(\mathbf{s}) d\mathbf{s} = \frac{1}{\sqrt{\det(A^TA)}} p(A^+\mathbf{x}), $$

where $A^+$ is the pseudo-inverse of $A$. Then

$$ \begin{align*} p(f_* | \mathbf{x}_*, X, \mathbf{y}) &= \frac{1}{\sqrt{\det(\mathbf{x}_* \mathbf{x}_*^T)}} p_{\mathbf{w}}((\mathbf{x}_* \mathbf{x}_*^T)^{-1} \mathbf{x}_* f_* | X, \mathbf{y}) \end{align*} $$

and I am stuck here. Sorry for creating another post; I cannot make comments right now.

As I have done here, $p(f_* | \mathbf{x}_*, \mathbf{w}) = \mathcal{N}(\mathbf{w}^T \mathbf{x}_*, \sigma_f^2)$ does give the result (take the limit $\sigma_f \to 0$, which makes it a delta function). But this does not seem to be the most general method. – muser, Aug 23 '23 at 16:57
Most general in the sense of directly using delta function properties, as you've done. – muser, Aug 23 '23 at 16:58

score 0 · Answer 1 · answered Aug 23 '23 at 14:05

Without having looked into the specific context of the application to Gaussian processes, I think you may be overcomplicating things. I read it as just the product rule of probability with conditioning, followed by marginalisation over $\mathbf{w}$:

$$\begin{align}p(f_* | \mathbf{x}_*, X, \mathbf{y}) &= \int p(f_* | \mathbf{x}_*, \mathbf{w}) p(\mathbf{w} | X, \mathbf{y}) d \mathbf{w} \\ &= \int p(f_*, \mathbf{w} | \mathbf{x}_*, X, \mathbf{y}) d \mathbf{w} \end{align}$$
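
One way to finish the argument from here, offered as a sketch rather than a definitive derivation: it assumes the noise-free case $f_* = \mathbf{x}_*^T \mathbf{w}$ from the question and a Gaussian posterior $\mathbf{w} | X, \mathbf{y} \sim N(\mathbf{\bar{w}}, A^{-1})$, and it introduces an arbitrary test function $g$ that is not part of the original post. Applying the sifting property to the integral over $f_*$, instead of the integral over $\mathbf{w}$, avoids the dimension mismatch:

$$ \begin{align*} \int g(f_*) \, p(f_* | \mathbf{x}_*, X, \mathbf{y}) \, d f_* &= \int \left[ \int g(f_*) \, \delta(f_* - \mathbf{x}_*^T \mathbf{w}) \, d f_* \right] p(\mathbf{w} | X, \mathbf{y}) \, d \mathbf{w} \\[5pt] &= \int g(\mathbf{x}_*^T \mathbf{w}) \, p(\mathbf{w} | X, \mathbf{y}) \, d \mathbf{w}. \end{align*} $$

The right-hand side is the posterior expectation of $g(\mathbf{x}_*^T \mathbf{w})$, so $f_*$ has the same distribution as the linear function $\mathbf{x}_*^T \mathbf{w}$ of a Gaussian vector, which gives $f_* | \mathbf{x}_*, X, \mathbf{y} \sim N(\mathbf{x}_*^T \mathbf{\bar{w}}, \mathbf{x}_*^T A^{-1} \mathbf{x}_*)$. The matrix delta formula quoted in the question does not help with $\mathbf{x}_*^T$ in the role of its matrix, because $\mathbf{x}_* \mathbf{x}_*^T$ has rank one and its determinant is zero whenever $\mathbf{w}$ has more than one component.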

microhaus

Yes, that is true. But we do not have the distribution of $f_*, \mathbf{w} | \mathbf{x}_*, X, \mathbf{y}$. We only have $f_* | \mathbf{x}_*, \mathbf{w} \sim N(\mathbf{x}_*^T \mathbf{w}, \sigma_n^2)$ and $\mathbf{w} | X, \mathbf{y} \sim N(\mathbf{\bar{w}}, A^{-1})$, where $\mathbf{\bar{w}}$ and $A$ are some functions of $X$ and $\mathbf{y}$. We are trying to show that $f_* | \mathbf{x}_*, X, \mathbf{y} \sim N(\mathbf{x}_*^T \mathbf{\bar{w}}, \mathbf{x}_*^T A^{-1} \mathbf{x}_*)$. But I am confused by the different dimensions of $f_*$ and $\mathbf{w}$. – s20012303 Aug 23 '23 at 15:55
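
The target result stated in this comment can be checked numerically. The following is a minimal Monte Carlo sketch, not a derivation: it assumes the noise-free case $f_* = \mathbf{x}_*^T \mathbf{w}$ used in the question's delta-function form, and the values of `w_bar`, `A`, and `x_star` are made-up stand-ins for quantities that would normally come from $X$ and $\mathbf{y}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior parameters (illustrative values only).
d = 3
w_bar = np.array([0.5, -1.0, 2.0])      # posterior mean of w
B = rng.normal(size=(d, d))
A = B @ B.T + d * np.eye(d)             # symmetric positive definite precision matrix
A_inv = np.linalg.inv(A)                # posterior covariance of w
x_star = np.array([1.0, 2.0, -0.5])     # test input

# Draw w from the posterior and push each sample through f_* = x_*^T w.
w_samples = rng.multivariate_normal(w_bar, A_inv, size=200_000)
f_samples = w_samples @ x_star

# Closed-form predictive moments claimed for f_*.
pred_mean = x_star @ w_bar
pred_var = x_star @ A_inv @ x_star

print("mean:     sampled", f_samples.mean(), "closed form", pred_mean)
print("variance: sampled", f_samples.var(), "closed form", pred_var)
```

Although $\mathbf{w}$ is three-dimensional here, each sample of $f_*$ is a scalar, which is the dimension reduction the marginalisation over $\mathbf{w}$ performs.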
