I wonder about the advantage of a GP over "standard" regression tools like LOESS. A GP fits the data as well as possible, as every model tries to do; let's keep it that simple. All I see is that a GP adds a confidence interval around its regression curve, which becomes broader the less data is available. So where is the benefit of a GP?

A similar question has been answered, but it's over my head, at least at the moment. Can someone please explain in simple words?

1 Answer

Some disadvantages of LOWESS vs GP are the following:

  • LOWESS will fail even for moderate input dimension, while a GP still works in this case if the right kernel is selected (see the sketch after this list)
  • For LOWESS to work we need a dense training sample over the whole design space; for a GP the requirements are less strict
  • With LOWESS we need additional tricks to control outliers
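
To make the dimensionality point concrete, here is a minimal sketch using scikit-learn's GaussianProcessRegressor on 5-dimensional input, a setting where a classical 1-D LOWESS smoother (such as the one in statsmodels) does not apply directly. The data and kernel choices are purely illustrative assumptions, not a prescription:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 5-D regression problem (purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)

# ARD RBF kernel (one length scale per input dimension) plus a noise term.
kernel = 1.0 * RBF(length_scale=np.ones(5)) + WhiteKernel(noise_level=1e-2)

# Fitting maximizes the log marginal likelihood w.r.t. the kernel parameters.
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Prediction and uncertainty at new points, regardless of input dimension.
X_new = rng.uniform(-1.0, 1.0, size=(5, 5))
mean, std = gp.predict(X_new, return_std=True)
print(mean, std)
```

The same code runs unchanged in 1 or 20 input dimensions; only the number of kernel length scales changes.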

Some advantages of GP vs LOWESS are the following:

  • Both the prediction and the uncertainty estimate have analytical forms: at each point $x$ we have a posterior distribution given the data, $\mathcal{N}(\mu(x), \sigma^2(x))$, where $\sigma^2(x)$ has an analytical form and is a natural uncertainty estimate (see the sketch after this list)
  • A GP gives control over the smoothness of the model through the selection of a proper kernel
  • Straightforward estimation of parameters: we write the likelihood of the data and maximize it with respect to the kernel parameters
  • GP regression can be treated as a special problem of selecting a function from an RKHS (reproducing kernel Hilbert space)
  • Theoretical justification: it solves the Kolmogorov-Wiener equations, i.e. it minimizes the $L_2$ error over all possible estimators under the assumption that the data are jointly Gaussian
  • Interpolation without giving up the smoothness of the model: at the training points the predictions $\hat{y}(x)$ equal the observed values $y(x)$ (in the noise-free case)
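
As a rough illustration of the first and last points, here is a minimal from-scratch sketch of the GP posterior with an RBF kernel. Function names and hyperparameter values are illustrative assumptions, and the noise-free case is approximated with a small jitter:

```python
import numpy as np

# Posterior at a new point x*:  mu(x*)      = k*^T (K + s^2 I)^{-1} y
#                               sigma^2(x*) = k(x*, x*) - k*^T (K + s^2 I)^{-1} k*

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = variance * exp(-||a - b||^2 / (2 l^2))."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, lengthscale=1.0, variance=1.0, noise=1e-6):
    K = rbf_kernel(X_train, X_train, lengthscale, variance) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, lengthscale, variance)
    K_ss = rbf_kernel(X_test, X_test, lengthscale, variance)
    L = np.linalg.cholesky(K)                       # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s.T @ alpha                              # posterior mean mu(x)
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)      # posterior variance sigma^2(x)
    return mu, var

# Toy 1-D example: uncertainty grows away from the training points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(8, 1))
y = np.sin(X).ravel()
X_new = np.linspace(-5, 5, 200)[:, None]
mu, var = gp_posterior(X, y, X_new)
print(mu[:3], np.sqrt(var[:3]))                     # mean and std of N(mu(x), sigma^2(x))

# With (near) zero noise the GP interpolates: predictions at the training points
# reproduce the observed values.
mu_train, _ = gp_posterior(X, y, X)
print(np.max(np.abs(mu_train - y)))                 # ~0
```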

IMO the most important points are the limitation of LOWESS to moderate dimensions and its lack of an analytical, principled global prediction and uncertainty estimate. Also, I believe that in practice a GP just works better than most kernel methods, including LOWESS.