
I have trained a linear regression model on some data. Now I have new data, and I need a way to calculate a CI for each $Y_{new}$ sample. That is, I now have n samples of the features, $X_{new}$, and the corresponding n samples of the labels, $Y_{new}$.

I was able to calculate CIs for $E[Y_{new}|X_{new}]$ at each point of $Y_{new}$ (n confidence intervals in total) using the percentile bootstrap. In each bootstrap iteration, I resample the training data, fit a linear regression model to the resampled data, predict $Y_{new}$ from $X_{new}$, and store the predictions in an array. Finally, I take percentiles of the stored predictions to obtain the confidence intervals.
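The percentile-bootstrap procedure described above (resample the training pairs, refit, predict at $X_{new}$, take percentiles) could look roughly like this. It is a minimal sketch, assuming ordinary least squares with an intercept fitted via NumPy; the function name `bootstrap_mean_ci` and the defaults `n_boot=2000` and `alpha=0.05` are hypothetical choices, not the asker's actual code.

```python
import numpy as np


def bootstrap_mean_ci(X, y, X_new, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for E[Y | X = x] at each row of X_new.

    Hypothetical helper: assumes OLS with an intercept and resamples
    (x_i, y_i) training pairs with replacement.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    n = X.shape[0]

    # Design matrices with an explicit intercept column.
    A = np.column_stack([np.ones(n), X])
    A_new = np.column_stack([np.ones(X_new.shape[0]), X_new])

    preds = np.empty((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of row indices
        beta_b, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        preds[b] = A_new @ beta_b  # fitted mean at each new point

    # Percentiles of the bootstrap predictions give the interval bounds.
    lo = np.percentile(preds, 100 * alpha / 2, axis=0)
    hi = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lo, hi
```

Each interval only reflects uncertainty in the fitted coefficients, which is why it targets the mean $E[Y_{new}|X_{new}]$ rather than an individual $Y_{new}$.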

What about $Y_{new}$ itself, using the bootstrap? Just an explanation would be really helpful; there is no need for an implementation in code. Thank you!
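For $Y_{new}$ itself, one common bootstrap variant (offered here as a sketch, not necessarily the exact method in the thread linked in the comments) adds a resampled training residual to each bootstrap prediction. The percentiles then reflect both the uncertainty in the fitted coefficients and the noise of a single new observation, so the result is a prediction interval rather than a confidence interval. The function name `bootstrap_prediction_interval` and the residual inflation factor $\sqrt{n/(n-p)}$ are illustrative assumptions.

```python
import numpy as np


def bootstrap_prediction_interval(X, y, X_new, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap prediction interval for a new Y at each row of X_new.

    Hypothetical helper: OLS with an intercept, pairs bootstrap for the
    coefficients, plus one resampled residual per new point for the noise.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    n = X.shape[0]
    m = X_new.shape[0]

    A = np.column_stack([np.ones(n), X])
    A_new = np.column_stack([np.ones(m), X_new])
    p = A.shape[1]

    # Residuals from the full-data fit. OLS residuals are slightly too small on
    # average, so they are inflated by sqrt(n / (n - p)) (an assumed adjustment).
    beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = (y - A @ beta_hat) * np.sqrt(n / (n - p))
    resid = resid - resid.mean()  # already ~0 with an intercept; centering is harmless

    sims = np.empty((n_boot, m))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        beta_b, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        # Simulated new observation = bootstrap mean + one draw of noise.
        sims[b] = A_new @ beta_b + rng.choice(resid, size=m, replace=True)

    lo = np.percentile(sims, 100 * alpha / 2, axis=0)
    hi = np.percentile(sims, 100 * (1 - alpha / 2), axis=0)
    return lo, hi
```

Compared with an interval for $E[Y_{new}|X_{new}]$ on the same data, these intervals should be much wider, because the noise term does not shrink as the training set grows.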

  • Because a CI is presumably a "confidence interval" and confidence intervals target parameters of distributions, yet each "new ... sample" is a random variable, it's unclear what you really want to compute. Could you describe your objective? – whuber Jan 11 '23 at 14:45
  • @whuber Sorry for not explaining myself properly. I think I found what I need here: https://stats.stackexchange.com/questions/226565/bootstrap-prediction-interval – Programming Noob Jan 11 '23 at 15:13
  • @whuber I recently had a similar confusion with somebody (might even have been you). The nomenclature is really unfortunate because the term "confidence interval" does not in any way indicate that it would only apply to parameter estimates and not to predictions... – Eike P. Jan 11 '23 at 15:55
  • @Eike I have seen such confusion about the meaning of a CI on the Web, but never in a textbook: they are all clear that a CI targets a parameter and not another random variable. https://stats.stackexchange.com/questions/16493 is a good thread to look at. – whuber Jan 11 '23 at 20:29
