I will consider non-noisy observations, i.e. $y = f(x)$. Let's say we have the following data set of 5 training examples, with one example duplicated: inputs $(1, 2, 3, 4, 4)$ map to targets $(2, 4, 6, 8, 8)$. Since GPR requires inverting a kernel matrix, and a kernel matrix built from duplicate inputs is not invertible, we should remove duplicate training examples when doing GPR with non-noisy observations. Am I right in my reasoning? Kindly comment.
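For concreteness, here is a minimal sketch (assuming an RBF kernel with unit lengthscale, purely for illustration, since no kernel is specified above) showing that the duplicated input makes the kernel matrix exactly rank-deficient:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 4.0])  # training inputs, last one duplicated
lengthscale = 1.0                          # assumed value, not from the question

# Squared-exponential (RBF) kernel: K[i, j] = exp(-(x_i - x_j)^2 / (2 l^2))
sq_dists = (X[:, None] - X[None, :]) ** 2
K = np.exp(-sq_dists / (2.0 * lengthscale ** 2))

# Two rows of K are identical, so K has rank 4 (not 5) and determinant ~0.
print(np.linalg.matrix_rank(K))
print(np.linalg.det(K))
```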
1 Answer
The duplicate data add no additional information, and the rank deficiency they induce in the kernel matrix is fatal to the procedure, so removing them has literally no inferential consequence.
That said, the kernel matrix $K$ can also become numerically singular when some points are merely very close together (not necessarily identical). In that scenario, you can either identify and deal with the problem points (delete them, merge them, whatever), or you can add some (small) noise: $\hat{K} = K + \epsilon I$. Usually $\epsilon = 10^{-6}$ is sufficient for me. Alternatively, you can perform a spectral decomposition of $K$ and, for each eigenvalue $\lambda_i$, replace it with $\hat{\lambda}_i = \max\{\lambda_i, \epsilon \lambda_{\max}\}$ for some small $\epsilon$. The idea here is that you've effectively pinned the smallest eigenvalue of the matrix relative to the largest, which may be a more "minimal" intervention into the matrix. This is an area where I'm not sure there are any good solutions.
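As a rough illustration, here is a minimal NumPy sketch of the two remedies described above; the function names and the default $\epsilon$ are placeholders chosen for the example, not a recommended implementation:

```python
import numpy as np

def add_jitter(K, eps=1e-6):
    """Nugget/jitter remedy: K_hat = K + eps * I."""
    return K + eps * np.eye(K.shape[0])

def clip_eigenvalues(K, eps=1e-6):
    """Spectral remedy: floor each eigenvalue at eps * lambda_max."""
    eigvals, eigvecs = np.linalg.eigh(K)      # K is symmetric
    floor = eps * eigvals.max()
    eigvals_clipped = np.maximum(eigvals, floor)
    # Reconstruct K_hat = V diag(clipped eigenvalues) V^T
    return (eigvecs * eigvals_clipped) @ eigvecs.T

# Either repaired matrix should now admit a stable Cholesky factorization, e.g.:
# L = np.linalg.cholesky(add_jitter(K))
# L = np.linalg.cholesky(clip_eigenvalues(K))
```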
The numerical component of the problem is considered in more detail on this thread:
Ill-conditioned covariance matrix in GP regression for Bayesian optimization
Anyway, thanks for the confirmation that removing duplicate data is necessary...
– Tomas May 14 '20 at 19:58