Well, "regression to the mean" of a GP is well known, but it is actually not always true.
Let $m:D\rightarrow\mathbb{R}$ be the (prior) mean function and $K:D\times D\rightarrow\mathbb{R}$ be the covariance function of a GP. After observing data $(x_i, y_i)$, the posterior mean function is
$$ \hat{m}(x) = m(x) + \sum_i \alpha_i K(x_i,x) $$ for suitable coefficients $\alpha_i$ that depend on the data but not on $x$ (see Rasmussen & Williams, Equation (2.27)).
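For concreteness, here is a minimal numpy sketch of this formula, assuming a zero prior mean, a Gaussian (RBF) kernel, and a small jitter term for numerical stability; the names (`rbf_kernel`, `posterior_mean`, `lengthscale`, `noise`) are illustrative, not from any particular library.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    # Gaussian (RBF) kernel K(x, y) = exp(-(x - y)^2 / (2 * lengthscale^2)), 1-D inputs
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / lengthscale ** 2)

def posterior_mean(x_train, y_train, x_test, kernel=rbf_kernel, noise=1e-6):
    # alpha = (K(X, X) + noise * I)^{-1} y   (prior mean m = 0),
    # then m_hat(x) = sum_i alpha_i K(x_i, x)
    K = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return kernel(x_test, x_train) @ alpha

x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test  = np.array([0.5, 5.0, 50.0])   # the last point is far from the data
print(posterior_mean(x_train, y_train, x_test))
```

With the RBF kernel the prediction at $x=50$ is essentially $0$, i.e. the prior mean, which is exactly the "regression to the mean" behaviour discussed below.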
Since the $\alpha_i$ are constant in $x$, the behaviour far away from the data points is determined entirely by $m$ and $K$. Let us assume for simplicity that $m=0$, i.e. a zero-mean GP. Then "regression to the mean" amounts to $K(x_i,x)$ being small for $x$ far away from the data $x_i$. This is true for many popular kernels in GP regression, most notably the Gaussian kernel $K(x,y)=\exp(-\|x-y\|^2)$. But this is not the only possible choice: not all kernels satisfy $K(x_i,x)\rightarrow 0$ as $\|x\|\rightarrow\infty$. Counterexamples include the exponential kernel $\exp(\langle x,y\rangle)$, linear kernels $K(x,y)=\langle x,y\rangle$, or even the constant kernel $K(x,y)=c$ (see the sketch below).
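The following self-contained sketch contrasts the far-from-data behaviour of three kernels under a zero prior mean; the toy data, the jitter value, and the lambda-style kernel definitions are all illustrative choices.

```python
import numpy as np

# Three kernels whose behaviour far from the data differs.
rbf      = lambda x, y: np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)   # K(x_i, x) -> 0 far away
linear   = lambda x, y: x[:, None] * y[None, :]                         # K(x_i, x) grows with |x|
constant = lambda x, y: np.ones((len(x), len(y)))                       # K(x_i, x) = 1 everywhere

x_train = np.array([-1.0, 0.0, 1.0])
y_train = 2.0 * x_train + 1.0          # toy data
x_far   = np.array([50.0, 100.0])      # points far away from the training data

for name, k in [("RBF", rbf), ("linear", linear), ("constant", constant)]:
    K = k(x_train, x_train) + 1e-6 * np.eye(len(x_train))   # small jitter for stability
    alpha = np.linalg.solve(K, y_train)                      # zero prior mean assumed
    print(name, k(x_far, x_train) @ alpha)
# RBF      -> predictions near 0: reversion to the prior mean
# linear   -> predictions keep growing with x: no reversion
# constant -> predictions stay near the data mean: no reversion to 0
```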
If you are interested in interpolation and approximation errors, the machine learning literature is probably not the right place to look. Have a look at books such as Wendland's *Scattered Data Approximation* or the lecture notes by Schaback.