Suppose you perform cross-validation to obtain an optimal value for some vector of hyperparameters $\lambda$.
You ultimately want to predict some new observations $y_\mathrm{query}|X_\mathrm{query}$.
It seems that you have at least three choices for how to proceed:
- Estimate the model parameters $\hat\theta_i$ on each cross-validation training sample $i=1,\dots,n$, with the optimal $\lambda$, then average these values to obtain a final estimate, $\hat{\bar\theta}:=\frac{1}{n}\sum_{i=1}^n{\hat\theta_i}$. Use these averaged estimates $\hat{\bar\theta}$ to perform the required prediction, $\hat y_{\mathrm{query},\hat{\bar\theta}}:=\mathbb{E}[y_\mathrm{query}|X_\mathrm{query},\theta=\hat{\bar\theta}]$.
- Estimate the required predictions $\hat y_{\mathrm{query},i}$ on each cross-validation training sample $i=1,\dots,n$, with the optimal $\lambda$, then average these values to obtain a final prediction, $\hat {\bar y}_\mathrm{query}:=\frac{1}{n}\sum_{i=1}^n{\hat y_{\mathrm{query},i}}$.
- Using the optimal $\lambda$, re-estimate the model on the entire sample, to obtain $\hat\theta_*$. Use these parameters to perform the required prediction, $\hat y_{\mathrm{query},\hat\theta_*}:=\mathbb{E}[y_\mathrm{query}|X_\mathrm{query},\theta=\hat\theta_*]$.
Which of these methods is most common? What are their advantages and disadvantages?
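For concreteness, here is a minimal sketch of the three options using ridge regression as the model, with `numpy` only. The data, the value of $\lambda$, and the `fit_ridge` helper are all illustrative assumptions, not part of the question itself; $\lambda$ is taken as already chosen by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data; lambda is assumed to have been chosen by cross-validation already.
n, p, lam = 100, 5, 1.0
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)
X_query = rng.normal(size=(3, p))

def fit_ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# K-fold split: each fold's complement is one cross-validation training sample.
K = 5
folds = np.array_split(rng.permutation(n), K)
thetas = []
for k in range(K):
    train = np.setdiff1d(np.arange(n), folds[k])
    thetas.append(fit_ridge(X[train], y[train], lam))
thetas = np.stack(thetas)

# Option 1: average the per-fold parameter estimates, then predict once.
theta_bar = thetas.mean(axis=0)
yhat_1 = X_query @ theta_bar

# Option 2: predict with each per-fold fit, then average the predictions.
yhat_2 = np.stack([X_query @ th for th in thetas]).mean(axis=0)

# Option 3: refit on the entire sample with the chosen lambda, then predict.
theta_star = fit_ridge(X, y, lam)
yhat_3 = X_query @ theta_star

# For a model whose predictions are linear in theta, options 1 and 2 coincide.
assert np.allclose(yhat_1, yhat_2)
```

Note that for any model whose predictions are linear in $\theta$, options 1 and 2 give identical answers; for nonlinear models they generally differ, and option 3 differs from both because it is fit on all $n$ observations rather than on training subsets.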