I can see from here that the prediction interval for a new response $Y$ in simple linear regression is

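$$\hat y_h \pm t_{1-\alpha/2,\,n-2}\,\sqrt{\mathrm{MSE}\left(1+\frac{1}{n}+\frac{(x_h-\bar x)^2}{\sum_i (x_i-\bar x)^2}\right)}$$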

However, I've read here that

[image: a prediction-interval formula written in terms of $\hat y$ only, with no explicit term in $x$]

Apparently, this last formula requires no calculation involving x. I would like to know whether I can use it to calculate prediction intervals and whether it relies on some assumption to be valid.

Patrick
  • "yhat" refers to the value predicted by the model when the explanatory variable is set to "x". – whuber Aug 02 '22 at 13:23
  • I know. What I don't know is how to calculate the expression involving x that appears in the first formula. – Patrick Aug 02 '22 at 19:52
  • Am I correct to say that the second formula assumes that the "x" part (as well as the 1/n) is negligible? – Patrick Aug 02 '22 at 20:02
  • No. It's just a sloppy formula. At the very least it should write "$\hat y(x)$" instead of "$\hat y$" ("yhat"). – whuber Aug 02 '22 at 21:10
  • I understand, but how can I calculate the "x" part, the sum of squared deviations of x from mean(x)? – Patrick Aug 03 '22 at 12:59
  • ?? The formula is explicit. – whuber Aug 03 '22 at 13:27
  • Sorry if it was not clear in my post, but my understanding of the "sloppy" formula is that we don't have the x's, only the y's (the output of the model: ground truth and estimate). So my question was: without the x's, how can one calculate the "x" part? – Patrick Aug 03 '22 at 14:48
  • You can't. Indeed, that statistic (the variance of the x's) is usually not included in publications. – whuber Aug 03 '22 at 15:53
  • I don't get it. Are you saying that the first formula shown above for a prediction interval includes a part involving x, but that we usually leave it out because it's negligible? I cannot find a source that leaves it out (e.g., https://stats.stackexchange.com/questions/16493/difference-between-confidence-intervals-and-prediction-intervals). – Patrick Aug 04 '22 at 13:31
  • I am not saying anything like that. The first formula is clear and correct. The second formula is a vague shorthand intended to be equivalent to the first. In order to apply either, you need the equivalent of the prediction point $x_h,$ the mean of the data values $\bar x,$ the sum of squared deviations $\sum_i(x_i-\bar x)^2,$ the data count $n,$ the MSE, and the confidence $1-\alpha.$ None of those can be determined from the others. – whuber Aug 04 '22 at 13:44
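
The ingredients listed in the last comment can be put together in a short sketch. This is only an illustration of the standard simple-linear-regression prediction interval, not code from the thread: the function name `prediction_interval`, the toy data, and the choice to pass the Student t critical value in by hand (so that only the standard library is needed) are all assumptions made for the example.

```python
import math
from statistics import fmean


def prediction_interval(x, y, x_h, t_crit):
    """Prediction interval for a new response at x_h (simple linear regression).

    t_crit is the 1 - alpha/2 quantile of Student's t with n - 2 degrees of
    freedom; it is supplied by the caller (for example, from a t table).
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)

    # Sum of squared deviations of the x's: the "x part" of the formula.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares slope and intercept.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # MSE: residual sum of squares divided by n - 2.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    y_hat = intercept + slope * x_h
    half_width = t_crit * math.sqrt(
        mse * (1 + 1 / n + (x_h - x_bar) ** 2 / sxx)
    )
    return y_hat - half_width, y_hat + half_width


# Hypothetical toy data (n = 5, so 3 degrees of freedom).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_975_df3 = 3.182  # 97.5% quantile of Student's t with 3 df, from a t table

print(prediction_interval(x, y, 3.0, t_975_df3))  # at the mean of x
print(prediction_interval(x, y, 6.0, t_975_df3))  # farther from the mean of x
```

The half-width depends on `x_h`, `x_bar`, and `sxx`, so the interval widens as the prediction point moves away from the mean of the x's. Given only observed and fitted y values, those three quantities are unavailable and the function cannot be evaluated, which is the point made in the comments.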
