Calculating prediction interval from data available in publication

Question

I have a publication that provides a multiple linear regression equation from a meta-analysis. Leave-one-out cross-validation (LOOCV) was used to select, from all possible models, the explanatory variables included in the final model. I want to calculate $\hat{y}$ for a new set of values and estimate the prediction intervals for those values.
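As a minimal sketch of the first step, evaluating the published equation at a new set of values is just the intercept plus the sum of each coefficient times its value. The function name `predict_y_hat` and all coefficient values below are hypothetical placeholders for the estimates reported in the publication.

```python
def predict_y_hat(intercept, coefs, x_new):
    """Evaluate y-hat = b0 + sum_j b_j * x_j for one new observation.

    coefs and x_new are dicts keyed by explanatory-variable name.
    """
    missing = set(coefs) - set(x_new)
    if missing:
        raise ValueError(f"no value supplied for: {sorted(missing)}")
    return intercept + sum(b * x_new[name] for name, b in coefs.items())


# Hypothetical published equation: y = 1.5 + 0.8 * x1 - 0.3 * x2
intercept = 1.5
coefs = {"x1": 0.8, "x2": -0.3}
y_hat = predict_y_hat(intercept, coefs, {"x1": 2.0, "x2": 4.0})
print(y_hat)
```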

The problem I am having is that the publication has, to me, limited information. With respect to the LOOCV analysis, it reports the following:

RMSE and mean bias of the final model selected
Estimates and SE of the explanatory variables selected for the final model
Summary stats of the dataset used for each explanatory variable

The self-answer in this post suggests you can just use the RMSE to approximate the prediction interval. However, when I ran a simulation and compared the prediction intervals from the RMSE method with those from predict.lm, they were not the same, so I would prefer not to rely on the RMSE method.
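The sketch below shows one reading of the "RMSE method", assuming it means $\hat{y} \pm t_{(\alpha/2,\,n-p)} \times \mathrm{RMSE}$ (the linked self-answer is not reproduced here, so that is an assumption). This interval omits the $\mathbf{x}^* (\mathbf{X}'\mathbf{X})^{-1} (\mathbf{x}^*)'$ term, so for the same $\hat\sigma$ it is always narrower than the interval from predict.lm, which is one reason the two will not agree; a LOOCV RMSE is also not the same quantity as the residual standard error that predict.lm uses. The t quantile comes from a small stdlib-only helper (Simpson integration of the t density plus bisection) so the block has no dependencies. The names `t_cdf`, `t_quantile` and `rmse_interval` are hypothetical, and all numbers are made up.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom.

    Integrates the density over [0, |x|] with Simpson's rule (steps must be even)
    and uses symmetry about 0.
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    s = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile by bisection; assumes 0.5 <= p < 1 and a quantile below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rmse_interval(y_hat, rmse, n, p, alpha=0.05):
    """Assumed 'RMSE method': y_hat +/- t_(alpha/2, n-p) * RMSE.

    Treats the RMSE as sigma-hat and drops the x*(X'X)^{-1}x*' term entirely.
    """
    half = t_quantile(1 - alpha / 2, n - p) * rmse
    return y_hat - half, y_hat + half


# Hypothetical values: y_hat from the published equation, RMSE from the paper,
# n studies, p coefficients including the intercept
lo, hi = rmse_interval(y_hat=1.9, rmse=0.5, n=40, p=3)
print(f"{lo:.3f} to {hi:.3f}")
```

In practice R's `qt` or `scipy.stats.t.ppf` gives the same quantile; the helper only avoids a dependency.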

I would prefer to use $$\hat{y}_h \pm t_{(\alpha/2, n-p)} \times \sqrt{1 + \mathbf{x}^* (\mathbf{X}'\mathbf{X})^{-1} (\mathbf{x}^*)'}$$

where $\mathbf{x}^*$ is the row vector of values used to populate the regression equation (including a leading 1 for the intercept). However, I don't have the original dataset, so I cannot calculate $(\mathbf{X}'\mathbf{X})$.
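For comparison, here is a stdlib-only sketch of the full formula when the original data are available, which is exactly what the publication lacks. It multiplies the square root by $\hat\sigma$, the residual standard error on $n-p$ degrees of freedom. The data are hypothetical, the helper names are made up for illustration, and `t_crit` must be supplied by the caller, e.g. from a t table, R's `qt`, or `scipy.stats.t.ppf(1 - alpha / 2, n - p)`. It also returns the coefficient SEs, $\hat\sigma\sqrt{[(\mathbf{X}'\mathbf{X})^{-1}]_{jj}}$, to show that published SEs carry only the diagonal of $(\mathbf{X}'\mathbf{X})^{-1}$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("X'X is singular")
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def prediction_interval(X, y, x_star, t_crit):
    """X: rows with a leading 1 (intercept); x_star: new row in the same layout.

    t_crit: the t quantile t_(alpha/2, n-p), supplied by the caller.
    """
    n, p = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    beta = [row[0] for row in matmul(XtX_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for yi, row in zip(y, X)]
    sigma_hat = (sum(e * e for e in resid) / (n - p)) ** 0.5  # residual standard error
    h = matmul(matmul([x_star], XtX_inv), [[v] for v in x_star])[0][0]  # x*(X'X)^-1 x*'
    y_hat = sum(b * v for b, v in zip(beta, x_star))
    half = t_crit * sigma_hat * (1 + h) ** 0.5
    se = [sigma_hat * XtX_inv[j][j] ** 0.5 for j in range(p)]  # coefficient SEs
    return {"y_hat": y_hat, "lower": y_hat - half, "upper": y_hat + half,
            "beta": beta, "sigma_hat": sigma_hat, "h": h, "se": se}


# Hypothetical data: one predictor plus intercept, n = 4, so n - p = 2
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 8]
result = prediction_interval(X, y, [1, 4], t_crit=4.303)  # t_(0.025, 2) is about 4.303
print(result["y_hat"], result["lower"], result["upper"])
```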

Am I missing something, i.e. can I use this equation with the information I have after some additional calculations? Or do I have to revert to the RMSE method?
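A possible partial answer, offered as a sketch rather than an established recipe: because $\hat\sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$ is the estimated covariance matrix of the coefficients, the term $\hat\sigma^2\,\mathbf{x}^* (\mathbf{X}'\mathbf{X})^{-1} (\mathbf{x}^*)'$ equals $\operatorname{Var}(\mathbf{x}^*\hat{\boldsymbol\beta})$. The published SEs give only the diagonal of that matrix, not the covariances, so the exact interval cannot be rebuilt. It can, however, be bracketed: the variance is at least 0 (which gives the RMSE-only interval) and, by the Cauchy-Schwarz bound $|\operatorname{Cov}(\hat\beta_i,\hat\beta_j)| \le \mathrm{SE}_i\,\mathrm{SE}_j$, at most $\left(\sum_j |x^*_j|\,\mathrm{SE}_j\right)^2$. This assumes the publication reports an SE for the intercept and that $\hat\sigma$ is the residual standard error matching those SEs; substituting the LOOCV RMSE is an approximation. The function name `halfwidth_bounds` and the numbers are hypothetical (the numbers match a small made-up dataset with $n = 4$).

```python
def halfwidth_bounds(x_star, se, sigma_hat, t_crit):
    """Return (lower, upper) bounds on t * sqrt(sigma_hat^2 + Var(x* beta_hat)).

    x_star: new row with a leading 1 for the intercept.
    se: published coefficient standard errors in the same order (intercept first).
    sigma_hat: residual standard error; t_crit: t_(alpha/2, n-p).
    """
    if len(x_star) != len(se):
        raise ValueError("x_star and se must have the same length")
    # Var(x* beta_hat) >= 0, so the RMSE-only half-width is a lower bound
    lower = t_crit * sigma_hat
    # |Cov(b_i, b_j)| <= se_i * se_j gives Var(x* beta_hat) <= (sum |x_j| se_j)^2
    var_fit_max = sum(abs(x) * s for x, s in zip(x_star, se)) ** 2
    upper = t_crit * (sigma_hat ** 2 + var_fit_max) ** 0.5
    return lower, upper


# Hypothetical "published" values
x_star = [1.0, 4.0]
se = [0.105 ** 0.5, 0.03 ** 0.5]  # intercept SE, slope SE
sigma_hat = 0.15 ** 0.5
lower, upper = halfwidth_bounds(x_star, se, sigma_hat, t_crit=4.303)
y_hat = 10.0
print(f"narrowest: {y_hat - lower:.3f} to {y_hat + lower:.3f}")
print(f"widest:    {y_hat - upper:.3f} to {y_hat + upper:.3f}")
```

If the two bounds are close for the new values of interest, either interval is adequate; if they differ a lot, the missing covariances matter, and the original data or the full coefficient covariance matrix would be needed.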

TIA

You are missing $\hat\sigma$, i.e. the RMSE, in your formula. Please check your code, because that is exactly how predict.lm works (see here: https://stats.stackexchange.com/a/604056/341520) — Lukas Lohse, Jan 04 '24 at 12:01
@LukasLohse Many thanks for pointing out the error in the equation, and for explaining that this is how predict.lm works. I will go back and check the code as suggested. — Aaron Simmons, Jan 04 '24 at 21:02
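With the correction from the first comment applied, the interval reads as follows. The second line restates the standard identity $\widehat{\operatorname{Cov}}(\hat{\boldsymbol\beta}) = \hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$, which is why the reported SEs are the square roots of the diagonal of that matrix, while the off-diagonal covariances cannot be recovered from the publication.

$$\hat{y}_h \pm t_{(\alpha/2,\, n-p)} \times \hat\sigma \sqrt{1 + \mathbf{x}^* (\mathbf{X}'\mathbf{X})^{-1} (\mathbf{x}^*)'}$$

$$\hat\sigma^2\, \mathbf{x}^* (\mathbf{X}'\mathbf{X})^{-1} (\mathbf{x}^*)' = \mathbf{x}^*\, \widehat{\operatorname{Cov}}(\hat{\boldsymbol\beta})\, (\mathbf{x}^*)', \qquad \mathrm{SE}(\hat\beta_j) = \hat\sigma \sqrt{\left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{jj}}$$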
