I'm learning about Gaussian process functional regression, but the more I learn, the more questions arise.
So we want to estimate a function $f\left( x \right)$ from data $D = \left( {\left( {{x_1},{y_1}} \right),...,\left( {{x_n},{y_n}} \right)} \right)$, assumed to be generated as ${y_i} = f\left( {{x_i}} \right) + {\varepsilon _i}$ with ${\varepsilon _i} \sim \mathcal{N}\left( {0,{\sigma ^2}} \right)$, by updating a ${\text{GP}}\left( {m\left( x \right),k\left( {x,x'} \right)} \right)$ prior.
I tell myself that GP regression should fulfill the following "consistency condition" (I don't know what it is called in the literature): if we partition the data set $D$, update the ${\text{GP}}\left( {m\left( x \right),k\left( {x,x'} \right)} \right)$ prior with the first part, then update the resulting posterior process with the second part, and so forth until the last part, we should end up with the same final posterior process as if we had updated ${\text{GP}}\left( {m\left( x \right),k\left( {x,x'} \right)} \right)$ with the whole data set $D$ at once. Otherwise, we would get different estimations/inferences depending on the chosen partition.
On the one hand, it is well known that GP regression has $O\left( {{n^3}} \right)$ generic computational complexity, coming from the factorization (or inversion) of the $n \times n$ covariance matrix of the observations.
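To make the condition concrete, here is a toy numerical check I could run (my own sketch, not from any library: zero prior mean, an RBF kernel, and the arbitrary numbers below are only there so the check runs). It conditions on all of $D$ at once, where the Cholesky factorization of the $n \times n$ matrix is precisely the $O(n^3)$ step, and separately conditions on a first part of $D$ and then conditions that posterior on the second part, comparing the two posteriors on a finite grid of test points.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 ell^2))."""
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell**2)

def condition(mean, cov, obs_idx, y, sigma2):
    """Condition a finite-dimensional Gaussian (mean, cov) on noisy observations y
    of the coordinates listed in obs_idx (standard Gaussian conditioning)."""
    S = cov[np.ix_(obs_idx, obs_idx)] + sigma2 * np.eye(len(obs_idx))
    L = np.linalg.cholesky(S)                       # the O(n^3) step, n = number of observations
    Kxo = cov[:, obs_idx]
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean[obs_idx]))
    V = np.linalg.solve(L, Kxo.T)
    return mean + Kxo @ alpha, cov - V.T @ V

# toy data (hypothetical numbers, just for the check)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
Xs = np.linspace(-3, 3, 40)                         # test grid on which the posteriors are compared
sigma2 = 0.01

# track the GP on the finite set Z = (test grid, then data inputs)
Z = np.concatenate([Xs, X])
m0, K0 = np.zeros(len(Z)), rbf(Z, Z)                # zero prior mean, RBF prior covariance
idx = np.arange(len(Xs), len(Z))                    # positions of the data inputs inside Z

# (a) one batch update with the whole of D
m_batch, K_batch = condition(m0, K0, idx, y, sigma2)

# (b) update with the first half of D, then update that posterior with the second half
m_half, K_half = condition(m0, K0, idx[:10], y[:10], sigma2)
m_two, K_two = condition(m_half, K_half, idx[10:], y[10:], sigma2)

print(np.allclose(m_batch, m_two), np.allclose(K_batch, K_two))    # the "consistency condition"
```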
On the other hand, if we update successively with each datum $\left( {{x_i},{y_i}} \right)$, $i = 1, \ldots ,n$, one at a time, the update equations are
$$\begin{cases} {m^{i + 1}}(x) = {m^i}(x) + {k^i}\left( {x,{x_i}} \right){\left( {{k^i}\left( {{x_i},{x_i}} \right) + {\sigma ^2}} \right)^{ - 1}}\left( {{y_i} - {m^i}\left( {{x_i}} \right)} \right) \\ {k^{i + 1}}(x,x') = {k^i}(x,x') - {k^i}\left( {x,{x_i}} \right){\left( {{k^i}\left( {{x_i},{x_i}} \right) + {\sigma ^2}} \right)^{ - 1}}{k^i}\left( {{x_i},x'} \right) \end{cases}$$
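Restricted to a finite set of points, these two equations are just rank-one updates of the current posterior mean vector and covariance matrix, with $k^i(\cdot, x_i)$ read off as a column of the current covariance matrix. A minimal sketch of that (again my own illustration, continuing with the objects `np`, `m0`, `K0`, `idx`, `y`, `sigma2`, `m_batch`, `K_batch` defined in the snippet above):

```python
def one_by_one_posterior(m, K, obs_idx, y, sigma2):
    """Apply the two update equations datum by datum on a finite set of points.
    K[:, j] plays the role of k^i(., x_i) and m[j] the role of m^i(x_i)."""
    m, K = m.copy(), K.copy()
    for j, yj in zip(obs_idx, y):
        g = K[:, j] / (K[j, j] + sigma2)     # k^i(., x_i) (k^i(x_i, x_i) + sigma^2)^{-1}
        m = m + g * (yj - m[j])              # mean update
        K = K - np.outer(g, K[j, :])         # covariance update
    return m, K

m_seq, K_seq = one_by_one_posterior(m0, K0, idx, y, sigma2)
print(np.allclose(m_batch, m_seq), np.allclose(K_batch, K_seq))    # same posterior as the batch update?
```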
Each application of the recursion requires the evaluation of 4 terms of the previous-step posterior. Hence, applying the recursion again to those 4 terms requires the evaluation of 16 terms, of which 8 are distinct in the case of $m(x)$: ${m^{i - 1}}\left( x \right)$, ${m^{i - 1}}\left( {{x_{i - 1}}} \right)$, ${m^{i - 1}}\left( {{x_i}} \right)$, ${k^{i - 1}}\left( {x,{x_{i - 1}}} \right)$, ${k^{i - 1}}\left( {{x_{i - 1}},{x_{i - 1}}} \right)$, ${k^{i - 1}}\left( {x,{x_i}} \right)$, ${k^{i - 1}}\left( {{x_{i - 1}},{x_i}} \right)$ and ${k^{i - 1}}\left( {{x_i},{x_i}} \right)$.
And so forth: it appears that updating the posterior mean and kernel sequentially in this way, one datum at a time, has computational complexity that is exponential in $n$!
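To make my counting explicit, here is a toy counter (purely illustrative; the function names are mine) that expands $m^i$ and $k^i$ literally as written above, without caching or deduplicating any intermediate term, and counts how many prior-level evaluations result:

```python
def mean_terms(i):
    """Terms generated by expanding m^i naively: m^{i-1} at 2 points and k^{i-1} at 2 pairs."""
    return 1 if i == 0 else 2 * mean_terms(i - 1) + 2 * cov_terms(i - 1)

def cov_terms(i):
    """Terms generated by expanding k^i naively: k^{i-1} at 4 pairs of points."""
    return 1 if i == 0 else 4 * cov_terms(i - 1)

for n in (1, 2, 3, 10, 20):
    print(n, mean_terms(n))    # 4, 16, 64, ... : grows like 4^n when nothing is deduplicated
```

This count treats repeated terms (such as the 8 distinct ones listed above among the 16) as separate evaluations, which is exactly what the naive expansion does.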
Therefore, GP regression doesn't fulfill the consistency condition: it gives completely different results (or no result at all) depending on how the data is partitioned.
Correct?