I'm working on a situation (with computations made by someone not involved anymore), where we have a linear regression

and a confidence interval (of the form given in "Shape of confidence interval for predicted values in linear regression")

$$y_0 = a + b x_0 \pm t_{1-\alpha/2; n-2} \ S \ \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$$

with $$S = \sqrt{\frac{\sum_i (y_i - \bar{y})^2 - b^2 \sum_i (x_i - \bar{x})^2}{n-2}}$$

and, often, this value is considered:

$$ c = \ S \ \sqrt{\frac{1}{n} + \frac{(\bar{x})^2}{\sum_i (x_i - \bar{x})^2}} $$

How to interpret this value?

It seems to be equal to the distance between the intercept $a$ of the regression line and the upper prediction band $y_0^+$ at $x_0 = 0$, except that there is no $t_{1-\alpha/2; n-2}$ involved anymore.

How to interpret this quantity?

Basj
  • This quantity appears to be the prediction interval when the predictor, $x_0$, is 0. – Demetri Pananos Oct 17 '23 at 16:19
  • @DemetriPananos Thank you. Just to be sure, then, why is $t_{1-\alpha/2; n-2}$ absent from $c$? – Basj Oct 17 '23 at 16:48
  • Excuse me, this should be the combined uncertainty. The two sources of uncertainty are uncertainty in the outcome (i.e. the noise) and uncertainty in the mean. When the appropriate t statistic is applied, this can result in a prediction interval. – Demetri Pananos Oct 17 '23 at 17:23
  • @DemetriPananos I don't see the inclusion of residual variance. I think this is just std. error. It's easier to see when you take $x_0 = \bar{x}$, where you get $S\sqrt{1/n}$ – Lukas Lohse Oct 17 '23 at 17:27

1 Answer

$c$ is the standard error for the estimation of $E[y|x = 0]$, i.e. the intercept. The upper end of your confidence interval should be $a + t_{1-\alpha/2, n-2}\cdot c$. Your prediction interval, i.e. the interval you would expect to contain new observations, should be even wider: replace $c$ with $\sqrt{c^2 + S^2}$. See this thread for context: "Why do the widths of confidence & prediction intervals change across a regression line - shouldn't it be the same with i.i.d?"
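
As a numerical illustration of the three quantities above, here is a minimal sketch in plain Python. The data are made up, the function name `intercept_standard_error` is hypothetical, and the quantile `t_quantile` is hard-coded to the approximate value of $t_{0.975; 3}$ rather than computed; none of this comes from the original computations.

```python
import math

def intercept_standard_error(xs, ys):
    """Return (a, b, S, c) for the simple linear regression y = a + b*x.

    S and c follow the formulas given in the question.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)

    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept

    # Residual standard deviation S with n - 2 degrees of freedom
    S = math.sqrt((syy - b ** 2 * sxx) / (n - 2))

    # Standard error of the fitted mean at x_0 = 0, i.e. of the intercept
    c = S * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return a, b, S, c

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, S, c = intercept_standard_error(xs, ys)

# Assumption: alpha = 0.05 and n - 2 = 3, so t_{0.975; 3} is about 3.182
t_quantile = 3.182

ci_half_width = t_quantile * c                           # confidence interval for E[y | x = 0]
pi_half_width = t_quantile * math.sqrt(c ** 2 + S ** 2)  # prediction interval at x = 0

print(f"a = {a:.4f}, b = {b:.4f}, S = {S:.4f}, c = {c:.4f}")
print(f"CI half-width = {ci_half_width:.4f}, PI half-width = {pi_half_width:.4f}")
```

The prediction half-width is always the larger of the two, because it adds the residual variance $S^2$ to $c^2$ under the square root.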

Lukas Lohse
  • Thank you @LukasLohse. How do you prove $c$ is the standard error for the estimation of $E[y|x=0]$? What's the formula for calculating the standard error in this specific case? – Basj Oct 28 '23 at 21:20
  • You simply take $E[y|x_0]\in \left[a + b x_0 \pm t_{1-\alpha/2; n-2} \ S \ \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}\right]$ and plug $x_0=0$ to get $E[y|x_0=0]\in \left[a + b\cdot0 \pm t_{1-\alpha/2; n-2} \ S \ \sqrt{\frac{1}{n} + \frac{(0- \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}\right]= \left[a \pm t_{1-\alpha/2; n-2} \ S \ \sqrt{\frac{1}{n} + \frac{(\bar{x})^2}{\sum_i (x_i - \bar{x})^2}}\right]= \left[a \pm t_{1-\alpha/2; n-2} \ S \cdot c\right]$. – Spätzle Oct 30 '23 at 12:19
  • @Spätzle One $S$ too many in the last expression, but yeah, my thoughts exactly. – Lukas Lohse Oct 30 '23 at 15:58
  • Yes @Spätzle, I understand that $E[y | x_0 = 0] \in [a \pm t_{1-\alpha/2; n-2}\cdot c]$, but based on this, why is $c$ the standard error, and not $t_{1-\alpha/2; n-2} \cdot c$ instead? – Basj Nov 04 '23 at 13:21
  • It's just the definition. https://en.wikipedia.org/wiki/Standard_error – Lukas Lohse Nov 04 '23 at 14:15
  • I think your confusion might come from the fact that people write both confidence intervals and standard errors as $estimate \pm ...$. That is confusing; you just kind of have to know which is being used. – Lukas Lohse Nov 04 '23 at 14:17
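
The claim that $c$ is the standard error of the intercept can also be checked directly. The following is a sketch under the usual assumptions, which the thread does not state explicitly: the $x_i$ are fixed and the errors are independent with common variance $\sigma^2$ (a symbol introduced here for illustration). Writing the intercept as $a = \bar{y} - b\bar{x}$ and using $\operatorname{Cov}(\bar{y}, b) = 0$:

$$\operatorname{Var}(a) = \operatorname{Var}(\bar{y}) + \bar{x}^2 \operatorname{Var}(b) = \frac{\sigma^2}{n} + \frac{\bar{x}^2 \, \sigma^2}{\sum_i (x_i - \bar{x})^2} = \sigma^2 \left( \frac{1}{n} + \frac{(\bar{x})^2}{\sum_i (x_i - \bar{x})^2} \right)$$

Replacing the unknown $\sigma$ by its estimate $S$ and taking the square root gives exactly $c$. The standard error is this estimated standard deviation of the estimator $a$; multiplying it by $t_{1-\alpha/2; n-2}$ turns it into the half-width of a confidence interval, which is why the quantile does not appear in $c$ itself.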