
Let's say I have a regression equation $Y = \beta_0+\beta_1x_1+\beta_2x_2+\epsilon$. An estimate of the response can be obtained as $\hat{Y} = \hat{\beta}_0+\hat{\beta}_1x_1+\hat{\beta}_2x_2$. I have also built a 95% confidence interval for $\hat{Y}$.

I am trying to understand how I should interpret that confidence interval for $\hat{Y}$. Is the following statement correct: if I get 100 pairs of values for $\left(x_1, x_2\right)$ and use the above equation to re-estimate $\hat{Y}$, then in 95 cases I should expect my new estimates of $\hat{Y}$ to lie within the above confidence interval, assuming the model is valid?

If the above statement is correct, then how exactly can I build an interval that tells me where a new observation would lie?

  • "An interval ... where a new observation would lie" is a prediction interval. For the first part of your question, it's unclear what you are describing, because the "above equation" does not "re-estimate" anything: it simply applies the estimates $\hat\beta_i$ to explanatory variables. Be wary, then, of any answer that does not offer a specific, clear statement of how it interprets what "get ... values" and "re-estimate" mean. – whuber Mar 15 '22 at 21:16
  • Thanks. Then what do you think is the correct interpretation of the estimated confidence interval of $\hat{Y}$? – Brian Smith Mar 15 '22 at 21:26
  • It's perfectly standard: for fixed $(x_1,x_2),$ the linear combination $\beta_0+\beta_1x_1+\beta_2x_2$ is a parameter. A confidence interval procedure is supposed to produce an interval with at least a specified $100(1-\alpha)\%$ chance of covering that parameter, regardless of the (unknown) values of the $\beta_i$ and $\operatorname{Var}(\epsilon).$ https://stats.stackexchange.com/questions/26450 is a good reference. – whuber Mar 15 '22 at 21:44

1 Answer


This link has a good explanation of confidence vs. prediction intervals.

Note that confidence intervals are made for real-valued population parameters, but $\hat{Y}$ is not a population parameter. (It's a random vector whose $i^{th}$ component is an estimator of $\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i$). So, one does not make a confidence interval for $\hat{Y}$ but rather for what the estimator $\hat{Y}$ is estimating, namely $\mu_{Y_i|\boldsymbol{x}_i=\boldsymbol{x}_0}$.

Suppose we have observed $(\boldsymbol{x}_0, y_0)$ and constructed $[L(\boldsymbol{x}_0), U(\boldsymbol{x}_0)]$ as a 95% confidence interval for $\mu_{Y_i|\boldsymbol{x}_i=\boldsymbol{x}_0}$. We can say that we are 95% confident that $[L(\boldsymbol{x}_0), U(\boldsymbol{x}_0)]$ covers $\mu_{Y_i|\boldsymbol{x}_i=\boldsymbol{x}_0}$.

On the other hand, a prediction interval will allow you to predict $Y_0$ at $\boldsymbol{x}=\boldsymbol{x}_0$.

Sources:

  1. Statistical Inference by George Casella & Roger L. Berger
  2. Modern Multivariate Statistical Techniques by Alan Julian Izenman
  • Your assertion that $\hat Y$ is not a parameter appears to rest on an unstated assumption that the $x_i$ are random variables. That makes this exposition potentially confusing. Your final conclusion is especially confusing, because prediction intervals are related to ... predictions, not "estimates." – whuber Mar 16 '22 at 12:47
  • @whuber. You probably know more about it than me, but I was thinking that $\hat{Y}$ is an estimator and that all estimators are random elements. I agree with you that $E(\hat{Y})$ is a population parameter and non-random. I think that $\boldsymbol{x}$ can be fixed and non-random and $\hat{Y}$ can get its randomness from $\boldsymbol{\hat{\beta}}$. Looking at your post here, I agree that I missed the right wording on my final conclusion. – Escherichia Mar 16 '22 at 15:08