This question has been asked before, but I'd like to come back to it because I want to point out a precise issue.
Suppose we want to estimate a function $f(x)$ by Gaussian process functional regression from data $D = \left( (x_1, y_1), \dots, (x_n, y_n) \right)$ with $y_i = f(x_i) + \xi_i$ and $\xi_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}\left( 0, \sigma^2 \right)$.
Let $X = \left( {{x_1},...,{x_n}} \right)$ and $Y = \left( {{y_1},...,{y_n}} \right)$.
The likelihood is $p\left( {\left. Y \right|X , f , \sigma } \right) \propto {\sigma ^{ - n}}\prod\limits_{i = 1}^n {{e^{ - \frac{{{{\left( {{y_i} - f({x_i})} \right)}^2}}}{{2{\sigma ^2}}}}}} $
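Equivalently, writing $f(X) = \left( f(x_1), \dots, f(x_n) \right)$, this likelihood can be written as the multivariate Gaussian density
$$ p\left( Y \mid X, f, \sigma \right) = \mathcal{N}\left( Y ;\, f(X), \sigma^2 I \right), $$
so it depends on $f$ only through the finite-dimensional vector $f(X)$; this is where my issue below will arise.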
We have a ${\text{GP}}\left( {m\left( x \right),k\left( {x,x'} \right)} \right)$ prior on $f$ with hyperparameters $m$ and $k$.
Generally speaking, we can have hyperhyperparameters ${\rm M}$ and ${\rm K}$ for $m\left( x \right)$ and $k\left( {x,x'} \right)$.
Therefore Bayes' rule reads
$$ p\left( f, \sigma, m, k, \mathrm{M}, \mathrm{K} \mid X, Y \right) \propto \sigma^{-n} \prod_{i=1}^{n} e^{-\frac{\left( y_i - f(x_i) \right)^2}{2\sigma^2}} \; \text{GP}\left( m(x), k(x,x') \right) \, p\left( m \mid \mathrm{M} \right) p\left( k \mid \mathrm{K} \right) p\left( \mathrm{M} \right) p\left( \mathrm{K} \right) p\left( \sigma \right) $$
and that's all we should need.
The problem is that something more is used which does not appear in Bayes' rule at this point: the $\text{GP}\left( m(x), k(x,x') \right)$ prior is used to assign the distribution $\mathcal{N}\left( m(X), k(X,X) \right)$ to the random vector $f(X) \mid X, m, k$, and it is this distribution that enters the update equations giving the posterior Gaussian process.
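For concreteness, the update equations I have in mind are the standard ones: for a test point $x_*$, combining that $\mathcal{N}\left( m(X), k(X,X) \right)$ prior on $f(X)$ with the Gaussian likelihood gives
$$ \mathbb{E}\left[ f(x_*) \mid X, Y, m, k, \sigma \right] = m(x_*) + k(x_*, X)\left[ k(X,X) + \sigma^2 I \right]^{-1}\left( Y - m(X) \right), $$
$$ \operatorname{Var}\left[ f(x_*) \mid X, Y, m, k, \sigma \right] = k(x_*, x_*) - k(x_*, X)\left[ k(X,X) + \sigma^2 I \right]^{-1} k(X, x_*). $$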
It seems there is no place for $f(X) \mid X, m, k$ in Bayes' rule, because we can only add hyperparameters and $f(X)$ is not such a hyperparameter but a function of the data $X$. Moreover, $f(X) \mid X, m, k$ does not look like a prior, because it is conditional on and depends on $X$; nor like a likelihood, because the likelihood is the distribution of $Y \mid X, f, \sigma$; nor like a posterior, because posteriors are conditional on both $X$ and $Y$.
How and where can we plug $f(X) \mid X, m, k$ into Bayes' rule above, given that we can only add hyperparameters? If we can't, where does $\mathcal{N}\left( m(X), k(X,X) \right)$ come from?
So my question is: can we find a set (logical conjunction) of hyperparameters that makes the multivariate Gaussian $\mathcal{N}\left( m(X), k(X,X) \right)$ appear somewhere in Bayes' rule?
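To make concrete where $\mathcal{N}\left( m(X), k(X,X) \right)$ gets used in practice, here is a minimal numpy sketch of the computation I have in mind (my own toy example: zero mean function, a squared-exponential kernel, and hyperparameters fixed by hand rather than given hyperpriors):

```python
import numpy as np

def kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2))
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 10)                                    # inputs x_1, ..., x_n
sigma = 0.1
Y = np.sin(2 * np.pi * X) + sigma * rng.standard_normal(X.size)  # noisy observations y_i

# This is the object my question is about: the GP prior evaluated at X assigns
# f(X) | X, m, k ~ N(m(X), k(X, X)), here with m(X) = 0 and covariance matrix K.
K = kernel(X, X)

# Standard update equations for the posterior GP at test points Xs,
# obtained by combining that N(0, K) prior with the N(Y; f(X), sigma^2 I) likelihood.
Xs = np.linspace(0.0, 1.0, 50)
Ks = kernel(X, Xs)                   # k(X, Xs)
Kss = kernel(Xs, Xs)                 # k(Xs, Xs)
Kn = K + sigma**2 * np.eye(X.size)   # k(X, X) + sigma^2 I
post_mean = Ks.T @ np.linalg.solve(Kn, Y)
post_cov = Kss - Ks.T @ np.linalg.solve(Kn, Ks)
```

The prior $\mathcal{N}\left( m(X), k(X,X) \right)$ only ever enters through $K$ in the last few lines, and that is precisely the term I cannot locate in Bayes' rule above.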