
I have to use GP regression on a complex time series, and the kernel function is not known in closed form. I have found a numerical approximation using Gauss-Laguerre quadrature, which takes the following form:

$$ C(\tau) = C(t_1 - t_2) = \exp(j C_1 \tau) \sum_{i = 1}^n w_i(\eta) f(x_i, \tau, \Lambda) , $$

where $\Theta = [\eta, \Lambda]$ are my hyperparameters, $w_i(\eta)$ are the weights of the generalized Gauss-Laguerre quadrature rule, $x_i$ are the zeros of the generalized Gauss-Laguerre polynomial of order $n$, and $j = \sqrt{-1}$.
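
For illustration, here is a minimal sketch (Python/SciPy) of how such a kernel can be evaluated numerically. The integrand `f`, the constant `C1`, the quadrature order `n`, and the exact way $\eta$ enters the weights are placeholders, not my actual model:

```python
import numpy as np
from scipy.special import roots_genlaguerre

def kernel(tau, eta, Lam, C1=1.0, n=32):
    """Quadrature approximation of C(tau); f, C1 and n are placeholders."""
    # nodes x_i and weights w_i of the generalized Gauss-Laguerre rule
    # with parameter eta (requires eta > -1)
    x, w = roots_genlaguerre(n, eta)
    # placeholder integrand standing in for f(x_i, tau, Lambda)
    f = np.exp(-Lam * x) * np.cos(x * tau)
    return np.exp(1j * C1 * tau) * np.sum(w * f)
```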

Is there any literature on estimating hyperparameters when the covariance function is not a direct function of the parameters, but instead has a numerically intractable form with hidden hyperparameters?

EDIT:

I have asked a similar question here. In my case, the log-likelihood is extremely smooth, almost flat, with a gradient close to zero over a wide range of the parameter space. I believe the parameters do not affect the log-likelihood very strongly, which can be verified with my model. That is why I have also tried to formulate the same problem as a GP this time.
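
To make the smoothness issue concrete, this is roughly the log marginal likelihood I scan over a grid of $(\eta, \Lambda)$. It is only a sketch that reuses the hypothetical `kernel` above, and it assumes that the real part of $C$ yields a valid covariance and that a small noise variance `sigma2` is added to the diagonal:

```python
import numpy as np

def log_marginal_likelihood(y, t, eta, Lam, sigma2=1e-3):
    """Standard GP log marginal likelihood with the quadrature-based kernel."""
    tau = t[:, None] - t[None, :]                       # pairwise lags t_1 - t_2
    K = np.real(np.vectorize(kernel)(tau, eta, Lam))    # assumes Re(C) is a valid covariance
    K += sigma2 * np.eye(len(t))                        # jitter / noise variance
    L = np.linalg.cholesky(K)                           # assumes K is positive definite
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2.0 * np.pi))
```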

CfourPiO
  • It's not clear what you're missing by not having a closed-form expression for your kernel: what are you looking for in particular? If you plan to do inference on hyperparameters via MCMC, it doesn't care at all whether your kernel function is closed form or not, so long as you can compute a marginal log-likelihood estimate. If you are planning to use first-order optimization to estimate them, you just need a way to differentiate your estimate to get an estimate of the gradient too, or alternatively to directly form an estimate of the gradient via e.g. the adjoint method. – John Madden Jan 17 '24 at 16:52
  • @JohnMadden The problem I face is that the log-likelihood is extremely smooth although it has a maximum. I suspect that the log-likelihood is not very strongly affected by the parameters. Due to the extreme smoothness, my estimate variance is high if I use MLE. If I want to use MCMC on a very smooth log-likelihood, wouldn't it be very biased towards my prior (if it's peaky)? I tried MCMC in fact, and it seems like it's very biased towards my prior. – CfourPiO Jan 17 '24 at 23:59
  • @JohnMadden I’ve posted another question explaining a similar log likelihood here https://stats.stackexchange.com/questions/636052/how-to-deal-with-extremely-smooth-plateau-log-likelihood?noredirect=1#comment1189654_636052 . I wanted to solve the same problem but with a GP this time. – CfourPiO Jan 18 '24 at 00:09
  • I'm a little concerned that what you have just told us is not in your question... but anyway, there are a couple of issues here. Most notably, you have it backwards: a very tight likelihood, as you've shown in your other question, is equivalent to very low posterior variance and very low prior influence. You report a high "estimate variance", but if your situation is like that shown in your other question, the reason for the high variance is probably optimization issues rather than estimator variance. In short: it sounds like you have a difficult problem, and probably want dedicated help rather than CV. – John Madden Jan 18 '24 at 02:57
  • @JohnMadden I edited the question with this detail. Thank you for sharing your thoughts. I think you are right in pointing out that it could be the optimizer that is inefficient. Right now I am using a quasi-Newton-type optimizer with an active-set algorithm, and I use an L-BFGS Hessian approximation to make it faster (in MATLAB). What I observe is that it takes some time to optimize. I didn't understand what you meant by "CV" in this case. Would you also recommend a different kind of optimizer? – CfourPiO Jan 18 '24 at 08:35
  • I'm assuming you mean gradients instead of Hessian, is that right? By CV, I am referring to this website. To me, your problem sounds hard, and I'm suggesting that you would benefit from one-on-one time with someone proficient in numerical analysis; this is beyond what we might expect folks on this website to be able to help with in passing. Also, no matter what smooth optimizer (i.e., something like BFGS) you use, it's going to struggle if the gradient is numerically 0 over so much of the space. – John Madden Jan 19 '24 at 23:18
  • @JohnMadden In the MATLAB optimization toolbox, I set the Hessian option to L-BFGS, so I assumed it approximates the Hessian with this algorithm. For the gradient, it just uses a numeric finite-difference gradient; sometimes I supply my own gradient that I derived. I agree that I need to discuss this in detail. I discussed it with my PhD supervisor and formulated a step-wise approach; I think it meets my expectations now. However, I still need to run a Monte Carlo simulation to see the variance in the optimization. – CfourPiO Jan 21 '24 at 01:42
  • The stepwise algorithm I follow is this: I optimize while fixing one parameter, and repeat this for different values of the fixed parameter. Then I decide bounds based on these optimizations, and finally I run a full-parameter optimization (a rough sketch of this appears after these comments). – CfourPiO Jan 21 '24 at 01:45
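
For reference, a rough sketch of the stepwise scheme described in the last comment (Python/SciPy rather than MATLAB, reusing the hypothetical `kernel` and `log_marginal_likelihood` sketches above; the grid values, starting points, and bounds are illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

def neg_ll(theta, y, t):
    eta, lam = theta
    return -log_marginal_likelihood(y, t, eta, lam)

def stepwise_fit(y, t, eta_grid):
    # Step 1: profile optimization -- for each fixed eta, optimize Lambda alone.
    lam_opts = []
    for eta in eta_grid:
        res = minimize(lambda l: neg_ll([eta, l[0]], y, t), x0=[1.0],
                       method="L-BFGS-B", bounds=[(1e-3, None)])
        lam_opts.append(res.x[0])
    # Step 2: derive bounds for Lambda from the spread of the profiled optima.
    lam_bounds = (min(lam_opts), max(lam_opts))
    # Step 3: joint optimization of (eta, Lambda) inside those bounds.
    res = minimize(neg_ll, x0=[np.median(eta_grid), np.median(lam_opts)],
                   args=(y, t), method="L-BFGS-B",
                   bounds=[(min(eta_grid), max(eta_grid)), lam_bounds])
    return res.x
```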

0 Answers