1

I am using linear model to do prediction, and I would like to calculate my prediction's prediction interval, which, when there is only one predictor, is

enter image description here

However, my model has three predictors. What is the formula for multivariate case?

I have been searching for a while, but I cannot find a formula for this.

Kevin Nowaczyk
  • 556
  • 4
  • 21
user398843
  • 123
  • 5

1 Answers1

3

What is denoted by $\mathrm{MSE}$ in your formula is the residual mean square which some people, confusingly, call error mean square or even mean squared error$-$hence the abbreviation in your formula. The residual mean square is an unbiased estimator of the (conditional) error or disturbance variance $\sigma^2$.
It is given by $$ \hat\sigma^2 = \frac{\sum_{i=1}^n (y_i-\hat y_i)^2}{n-k-1}, $$ where $k$ is the number of regressors. Note that we have $k=1$ in the simple linear regression model, and with three regressors (not counting the intercept) we have $k=3$.
The formula of the prediction interval for the future observation $y_h$ at location $\mathbf{x}_h={(1,x_{h1},\ldots,x_{hk})}^\top$ gets only slightly more complicated in the general case with $k$ regressors: $$ \hat y_h\mp t_{(1-\alpha/2,\,n-k-1)}\times\sqrt{\hat{\sigma}^2\times\left(1+\mathbf{x}_h^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_h\right)}, $$
where $\hat y_h =\mathbf{x}_h^\top \hat{\boldsymbol{\beta}}$, $\mathbf{X}$ is the design matrix, and $t_{(1-\alpha/2,\,n-k-1)}$ the $(1-\alpha/2)$ quantile of the $t$-distribution with $n-k-1$ degrees of freedom.
For $k=1$ this reduces to the formula in your question if we identify the second entry of $\mathbf{x}_h={(1,x_{h1})}^\top$, i.e. $x_{h1}$, with $x_h$.

statmerkur
  • 5,950
  • Thank you for your answer. Should we add $1/n$ in the square root on your second equation? – user398843 Aug 16 '22 at 14:38
  • 1
    @user398843 No, $\sigma^2\left(x_h^{\top}\left(X^{\top}X\right)^{-1}x_h\right)$ is the variance of $\hat y_h$ and reduces to $\sigma^2 \left(1/n + \left(x_h-\bar x \right)^2 / \sum_{i=1}^n \left( x_i-\bar x\right)^2 \right)$ in the case of only one regressor. – statmerkur Aug 16 '22 at 15:37
  • And what if k >n? – PascalIv Jul 03 '23 at 14:52
  • 1
    @PascalIv Then $\mathbf X$ doesn't have full column rank and $\hat{\boldsymbol{\beta}}$ is not uniquely defined. – statmerkur Jul 09 '23 at 01:52