I have a publication that provides a linear multiple regression equation from a meta-analysis. LOOCV was used to select the explanatory variables included in a final model from all possible models. A regression equation is provided in the publication. I want to calculate $\hat{y}$ for a new set of values and estimate the prediction intervals for those values.

The problem I am having is that the publication has, to me, limited information. It provides the following with respect to the LOOCV analysis:

  • RMSE and mean bias of the final model selected
  • Estimates and SE of the explanatory variables selected for the final model
  • Summary stats of the dataset used for each explanatory variable

The self-answer in this post suggests you can just use RMSE to approximate the prediction interval. When I run a simulation and compare the RMSE calculation method with predict.lm, the prediction intervals are not the same, so I would prefer not to use the RMSE method.

I would prefer to use $$\hat{y}_h \pm t_{(\alpha/2, n-p)} \times \sqrt{1 + \mathbf{x}^* (\mathbf{X}'\mathbf{X})^{-1} (\mathbf{x}^*)'}$$

where $\mathbf{x}^*$ represents a matrix of the new variable values used to populate the regression equation. However, I don't have the original dataset to calculate $(\mathbf{X}'\mathbf{X})$.

Am I missing something? Can I use this equation with the information I have by making some calculations, or do I have to revert to the RMSE method?

TIA

  • You are missing $\hat\sigma$, i.e. RMSE, in your formula. Please check your code, because that is actually exactly how predict.lm works (see here: https://stats.stackexchange.com/a/604056/341520) – Lukas Lohse Jan 04 '24 at 12:01
  • @LukasLohse Many thanks for pointing out the error in the equation. Also, many thanks for pointing out that this is how predict.lm works. Will go back and check the code as suggested. – Aaron Simmons Jan 04 '24 at 21:02
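
The check suggested in the comments can be sketched roughly as follows. This is only an illustration in Python with NumPy (not R's predict.lm itself), and it assumes the standard form of the interval with $\hat\sigma$ restored, i.e. $\hat{y}_h \pm t_{(\alpha/2, n-p)} \, \hat\sigma \sqrt{1 + \mathbf{x}^* (\mathbf{X}'\mathbf{X})^{-1} (\mathbf{x}^*)'}$. The simulated data, the function name `prediction_interval`, and the tabulated critical value are all made up for the example, and it needs the raw design matrix, which is exactly what is not available from the publication.

```python
import numpy as np

def prediction_interval(X, y, x_new, t_crit):
    """Return (y_hat, lower, upper, sigma_hat) for one new row x_new.

    X must already contain an intercept column. t_crit is the t quantile
    for n - p degrees of freedom, looked up separately (hypothetical input).
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y                       # ordinary least squares fit
    resid = y - X @ beta
    sigma_hat = np.sqrt(resid @ resid / (n - p))   # residual standard error
    leverage = x_new @ xtx_inv @ x_new             # x* (X'X)^{-1} (x*)'
    se_pred = sigma_hat * np.sqrt(1.0 + leverage)
    y_hat = x_new @ beta
    return y_hat, y_hat - t_crit * se_pred, y_hat + t_crit * se_pred, sigma_hat

# Simulated data, invented purely for illustration
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

x_new = np.array([1.0, 3.0, -3.0])   # a point far from the centre of the data
t_crit = 2.110                       # tabulated t_{0.975, 17}, since n - p = 17

y_hat, lower, upper, sigma_hat = prediction_interval(X, y, x_new, t_crit)
full_half_width = (upper - lower) / 2
rmse_only_half_width = t_crit * sigma_hat   # ignores the leverage term
```

Because $(\mathbf{X}'\mathbf{X})^{-1}$ is positive definite, the leverage term is always positive, so the full interval is wider than the one based on $\hat\sigma$ alone, and the gap grows as $\mathbf{x}^*$ moves away from the data used to fit the model. Note also that `sigma_hat` here is the residual standard error of the fit, which is not the same quantity as an RMSE obtained from LOOCV.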