
I came across the concept of GCV optimization (new to me) for tuning hyperparameters in a model, as an alternative to maximizing the marginal likelihood (MML) of the output, which is what I am used to doing. I see people applying it to Bayesian-related methods (regularization), although it clearly yields an empirical Bayes method. The concept is well explained in e.g. this question (along with the links therein) or the book by Hastie et al. (p. 244), but only for the case where the prior information (which is what any regularizer incorporates) has zero mean, i.e. the estimators have the form
$$ \hat y = Sy $$ where $S = S(X, \sigma^2, \eta)$ (denoting any hyperparameters as $\eta$), whereas I am interested in the case where $$ \hat y = Sy + R$$ where $R = R(X, \sigma^2, m, \eta)$ ($m$ is some prior information, causing this issue, see example below).
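
For concreteness, the zero-mean case I am referring to uses the standard GCV criterion for a linear smoother, $\mathrm{GCV}(\eta) = \frac{\frac1n\|y - S_\eta y\|^2}{(1 - \mathrm{tr}(S_\eta)/n)^2}$. A minimal numpy sketch (function name and signature are just my illustration):

```python
import numpy as np

def gcv_score(S, y):
    """Standard GCV criterion for a linear smoother y_hat = S @ y.

    S : (n, n) smoother matrix (depends on X and the hyperparameters eta)
    y : (n,)   observed outputs
    """
    n = len(y)
    resid = y - S @ y
    df = np.trace(S)  # effective degrees of freedom
    return (resid @ resid / n) / (1.0 - df / n) ** 2
```
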

Q: Is it still possible to use some version of GCV in this scenario? Has anyone described something related? Or is there something fishy about this (apart from the standard arguments against EB)?

Example: Assume a model $y = X\theta + \varepsilon$, where the measurements are perturbed by i.i.d. Gaussian noise of variance $\sigma^2$, and place a prior on $\theta$, $$ p(\theta) \sim N(m, V). $$ Then the posterior is $$ p(\theta \mid y) \sim N(m^*, V^*) $$ where $V^* = (X^TX /\sigma^2 + V^{-1})^{-1}$ and $m^* = V^*(X^Ty/\sigma^2 + V^{-1}m)$. Thus, if we select $m^*$ (the MAP estimate) as the estimator, then $$ \hat y = X m^* = Sy + R \;,\quad S = X V^* X^T/\sigma^2 \;,\quad R = X V^* V^{-1} m. $$ The offset $R$ does not appear when $m=0$, which is usually the case for standard regularizers, whose hyperparameters are elements of $V$.
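
To make the offset explicit, here is a small sketch of the example above (the function name is mine; it only writes out the algebra from the previous paragraph):

```python
import numpy as np

def map_smoother(X, sigma2, m, V):
    """MAP predictor for the Gaussian example: y_hat = S @ y + R.

    Returns the smoother matrix S and the offset R induced by the
    nonzero prior mean m (R vanishes when m = 0).
    """
    V_post = np.linalg.inv(X.T @ X / sigma2 + np.linalg.inv(V))  # V*
    S = X @ V_post @ X.T / sigma2                                # linear part
    R = X @ V_post @ np.linalg.solve(V, m)                       # offset from prior mean
    return S, R

# Sanity check: S @ y + R coincides with X @ m*, where
# m* = V_post @ (X.T @ y / sigma2 + inv(V) @ m).
```
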


(Where I first encountered GCV: "Gray-box regularized FIR modeling for linear system identification" by T. Peter and O. Nelles.)
