Prediction intervals for "single future response"?

Question

Faraway (2002, 39-41) states, "There are two kinds of predictions that can be made for a given $x_0$ ... Most times, we will want the first case which is called “prediction of a future value” while the second case, called “prediction of the mean response” is less common."

To get the interval for the less common second case in R, Faraway gives:

library(faraway)  # provides the gala data set
g <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, data = gala)
x0 <- data.frame(Area = 0.08, Elevation = 93, Nearest = 6.0, Scruz = 12, Adjacent = 0.34)
p <- predict(g, x0, se.fit = TRUE)
# Mean response interval: p$fit +/- qt(0.975, p$df) * p$se.fit

How can we calculate the interval for a "single future response", which Faraway does not give?

Are you asking "How would I get a regression prediction interval in R?" or are you asking the non-R question "How do I compute a prediction interval in regression?" If the first, why not just call the relevant R command instead of telling us what components you want to compute it with? If the second, why would R come into the question? — Glen_b, Mar 09 '15 at 05:17
If you just want a regression prediction interval in R see ?predict.lm, which explains what argument you need to change — Glen_b, Mar 09 '15 at 05:23

jtd · Accepted Answer · 2015-03-17T21:36:07.987

As Glen_b pointed out, calling predict() on an lm fit with interval = "prediction" gives exactly what I wanted. The documentation for predict.lm() in R (see ?predict.lm) states the following:

"The prediction intervals are for a single observation at each case in newdata (or by default, the data used for the fit) with error variance(s) pred.var. This can be a multiple of res.var, the estimated value of σ^2: the default is to assume that future observations have the same error variance as those used for fitting. If weights is supplied, the inverse of this is used as a scale factor. For a weighted fit, if the prediction is for the original data frame, weights defaults to the weights used for the model fit, with a warning since it might not be the intended result. If the fit was weighted and newdata is given, the default is to assume constant prediction variance, with a warning."
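A minimal sketch of the two calls, assuming R's built-in cars data as a stand-in for gala (so that it runs without the faraway package) and a made-up new predictor value:

```r
# Built-in 'cars' data stands in for gala so the example needs no extra packages.
fit <- lm(dist ~ speed, data = cars)
x0  <- data.frame(speed = 15)  # hypothetical new predictor value

# Interval for a single future response at x0
pi_single <- predict(fit, newdata = x0, interval = "prediction", level = 0.95)

# Interval for the mean response at x0
ci_mean <- predict(fit, newdata = x0, interval = "confidence", level = 0.95)

# Both intervals are centred on the same fitted value; only the widths differ.
width_single <- unname(pi_single[, "upr"] - pi_single[, "lwr"])
width_mean   <- unname(ci_mean[, "upr"] - ci_mean[, "lwr"])
```

Both calls return a matrix with columns fit, lwr and upr, and the single-response interval is the wider of the two.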

Also, the difference between what Faraway gives and does not give is merely an addition of 1 underneath the square root:

Faraway (2002, 39-41) gives the general form of (1), the prediction interval for a single future response at a given $x_0$:

$$ \hat{y}_0 \pm t^{(\alpha/2)}_{n-p} \hat\sigma \sqrt{1 + x^T_0 (X^TX)^{-1} x_0} $$

Faraway's general form of (2), the interval for the "prediction of the mean response" at a given $x_0$:

$$ \hat{y}_0 \pm t^{(\alpha/2)}_{n-p} \hat\sigma \sqrt{x^T_0 (X^TX)^{-1} x_0} $$
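As a rough check, both formulas can be evaluated by hand and compared with predict(). This is only a sketch: the built-in mtcars data stands in for gala, and the values in x0 are made up for illustration.

```r
# Built-in 'mtcars' data stands in for gala; x0 is a hypothetical new case.
fit <- lm(mpg ~ wt + hp, data = mtcars)
X   <- model.matrix(fit)             # n x p design matrix, including the intercept column
n   <- nrow(X)
p   <- ncol(X)
x0  <- c(1, 3.0, 120)                # intercept, wt, hp

sigma_hat <- summary(fit)$sigma      # residual standard error
y0_hat    <- sum(x0 * coef(fit))     # fitted value at x0
h0        <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)  # x0' (X'X)^{-1} x0
t_crit    <- qt(0.975, df = n - p)   # 95% two-sided critical value

# Formula (1): single future response; formula (2): mean response
single_by_hand <- y0_hat + c(-1, 1) * t_crit * sigma_hat * sqrt(1 + h0)
mean_by_hand   <- y0_hat + c(-1, 1) * t_crit * sigma_hat * sqrt(h0)

# The same intervals from predict()
new <- data.frame(wt = 3.0, hp = 120)
single_predict <- predict(fit, new, interval = "prediction")[, c("lwr", "upr")]
mean_predict   <- predict(fit, new, interval = "confidence")[, c("lwr", "upr")]
```

The hand-computed limits should agree with the lwr and upr columns from predict() up to floating-point error.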

I think that the above formulas look like the following in the simplest case of one predictor variable [please correct me if I have made an error in supposing this special case is equivalent to Faraway's general case above].

First (1):

$$ \hat{y}_i \pm t^{(\alpha/2)}_{n-p} \, s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar x)^2}{SS_x}} $$

then (2):

$$ \hat{y}_i \pm t^{(\alpha/2)}_{n-p} \, s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar x)^2}{SS_x}} $$
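On the question of equivalence: with one predictor and an intercept, $p = 2$ and $x_0 = (1, x_i)^T$, and the quadratic form $x^T_0 (X^TX)^{-1} x_0$ reduces to $1/n + (x_i - \bar x)^2 / SS_x$, so the special case does match the general one. A quick numerical check, assuming the built-in cars data and an arbitrary $x_i$:

```r
# Built-in 'cars' data; xi is a hypothetical predictor value.
fit  <- lm(dist ~ speed, data = cars)
x    <- cars$speed
n    <- length(x)
xi   <- 15
SS_x <- sum((x - mean(x))^2)

# General form: x0' (X'X)^{-1} x0 with x0 = (1, xi)
X       <- model.matrix(fit)
x0      <- c(1, xi)
general <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)

# Special form for a single predictor
special <- 1 / n + (xi - mean(x))^2 / SS_x

same <- isTRUE(all.equal(general, special))
```

Here same should come out TRUE, since the two expressions are algebraically identical.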

That second formula cannot possibly be correct, because it contains nothing that depends on the number of responses that have been averaged (which it obviously must). — whuber, Mar 17 '15 at 20:50
@whuber: Does the t-critical value term satisfy your comment with $(n-p)$? — jtd, Mar 17 '15 at 21:21
I don't think so. Your second formula does not look like a prediction interval at all. I believe you need to replace the "$1$" in the first formula by "$1/k$" where $k$ is the number of independent future responses involved in the average. Presumably the "$n$" is the amount of data and "$p$" is the number of explanatory variables, so neither of them provide any information about $k$. — whuber, Mar 17 '15 at 21:23
@whuber: Good point! I will suppose that Faraway suggests (2) means the "prediction of the true mean response among an (infinite) subpopulation with $x_0$" rather than the "mean of k responses with $x_0$". — jtd, Mar 17 '15 at 21:47
That sounds like a confidence interval rather than a prediction interval. Calling it a "prediction of a mean response" is strange. Statisticians, at least, tend to use "estimate" for any rational process of guessing a parameter (such as a true mean) and "predict" for guessing the value of a random variable (such as a future value). Asserting that a confidence interval procedure (which is taught in all introductory stats classes) is "less common" than a prediction interval (rarely taught) seems incorrect, so Faraway must have some narrow application (or restricted community) in mind. — whuber, Mar 17 '15 at 21:52
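whuber's suggestion of replacing the "$1$" by "$1/k$" can be sketched with the pred.var argument of predict.lm(), which sets the error variance added under the square root. The example assumes the built-in cars data and a made-up predictor value; half_width is a hypothetical helper, not part of R.

```r
# Built-in 'cars' data; x0 is a hypothetical new predictor value.
fit <- lm(dist ~ speed, data = cars)
x0  <- data.frame(speed = 15)
s2  <- summary(fit)$sigma^2   # estimated error variance

# Half-width of the interval for the mean of k independent future responses at x0:
# the error variance of that mean is sigma^2 / k.
half_width <- function(k) {
  out <- predict(fit, x0, interval = "prediction", pred.var = s2 / k)
  unname((out[, "upr"] - out[, "lwr"]) / 2)
}

hw_1   <- half_width(1)     # single future response, formula (1)
hw_10  <- half_width(10)    # mean of 10 future responses
hw_big <- half_width(1e8)   # very large k

# Confidence interval for the true mean response, formula (2)
ci    <- predict(fit, x0, interval = "confidence")
hw_ci <- unname((ci[, "upr"] - ci[, "lwr"]) / 2)
```

The half-width shrinks as k grows, and for very large k it approaches the confidence interval half-width, which matches the reading of (2) as an interval for the true mean response.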
