
As the question states, what is the distribution of the predicted $y$ values in linear regression? I'm sure this question has been answered somewhere before but I can't seem to find it for some reason.

Edit: forgot to mention that I'm making standard homoskedastic normal residual assumptions.

mrepic1123
    Welcome to Cross Validated! Do you mean the conditional distribution or the pooled (marginal) distribution of all predictions combined together? Also, you’ve tagged this with [tag:normal-distribution]. Why? The normal distribution could come up in this discussion but does not have to. – Dave Oct 04 '23 at 11:40
  • Edited post - forgot to mention assumptions. I'm looking for the conditional distribution given $X$, if that's what you mean. – mrepic1123 Oct 04 '23 at 12:14
  • By linear regression, do you mean a maximum likelihood model where you assume that the conditional distribution of $y$ given $X$ is Normal? – Firebug Oct 04 '23 at 12:37
  • @Firebug, maximum likelihood is a type of estimator, not a model. – Richard Hardy Oct 05 '23 at 12:16
  • @RichardHardy by 'maximum likelihood model' I refer to models that are estimated using maximum likelihood. This is quite commonplace as far as I'm aware. – Firebug Oct 05 '23 at 12:35
  • @Firebug, this is fine as a shorthand used by experts who know the context, but taken literally it can be confusing for beginners – that's all. – Richard Hardy Oct 05 '23 at 12:42

2 Answers


Assume a traditional linear regression and that the data follow the model, i.e., given $X$ the response $Y \mid X$ is distributed as $\mathcal{N}(X \beta, \sigma^2 I)$ for some true values of $\beta$ and $\sigma$. In this case, the least squares estimator (which is also the MLE) for $\beta$, $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, is just a linear transformation of the multivariate normal vector $Y$ and thus has the distribution $\mathcal{N}(\beta, (X^\top X)^{-1} \sigma^2)$. (See this answer)

Then the prediction at a new point $X_j$ is given by $\hat Y_{j} = X_j (X^\top X)^{-1} X^\top Y$, which again is a linear transformation of a multivariate normal, so

$$\hat Y_{j} \mid X_j \sim \mathcal{N}\left(X_j\beta,\; X_j(X^\top X)^{-1} X_j^\top \sigma^2\right)$$

This treats the predictors $X$ and $X_j$ as deterministic quantities and neglects any uncertainty about the values they will take. If you instead assume some distribution for $X$ and $X_j$, then the distributions of $\hat{\beta}$ and of $\hat Y_{j} \mid X_j$ will also change.
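A quick Monte Carlo check of this sampling distribution (a sketch in Python/NumPy; the design matrix, true $\beta$, $\sigma$, and the new point `x_new` are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# True model: Y = X beta + eps, eps ~ N(0, sigma^2 I), with X held fixed
n = 50
beta = np.array([1.0, 2.0])
sigma = 0.5
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
x_new = np.array([1.0, 0.3])  # the new point X_j

# Theoretical mean and variance of the prediction hat(Y)_j = x_new @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
mean_theory = x_new @ beta
var_theory = sigma**2 * (x_new @ XtX_inv @ x_new)

# Simulate many datasets from the model, refit by least squares, predict at x_new
preds = []
for _ in range(20_000):
    y = X @ beta + rng.normal(0, sigma, n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    preds.append(x_new @ beta_hat)
preds = np.array(preds)

print(preds.mean(), mean_theory)  # empirical mean should be close to x_new @ beta
print(preds.var(), var_theory)    # empirical variance should be close to theory
```

With 20,000 replications the empirical mean and variance of the predictions match $X_j\beta$ and $X_j(X^\top X)^{-1}X_j^\top\sigma^2$ to within simulation noise.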

user9794
    While correct, this perspective does not seem to be practically relevant. In regression applications, we do not have the true values $\beta$ but only the estimates $\hat\beta$. That makes the distribution of the predictions much more involved. Stephan Kolassa's answer goes in that direction. – Richard Hardy Oct 05 '23 at 06:44
  • In my context I do have the true values $\beta$ so this was the answer I was looking for. – mrepic1123 Oct 05 '23 at 15:30
  • @mrepic1123, if you know $\beta$ and use $X_j \beta$ instead of $X_j \hat{\beta}$, then the predictions are just deterministic and not normally distributed. – user9794 Oct 05 '23 at 16:38

The precise answer depends on whether your errors are normally distributed (and homoskedastic, and independent). If so, future observations follow a t distribution; see section 3.5, "Prediction of a future value", in Faraway (2002), although I would have called the intervals "prediction intervals" rather than "confidence intervals".

If your errors are non-normal, you can often still justify a t or normal distribution by appealing to asymptotics, but I don't have a reference at hand.
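As a concrete illustration of the prediction-interval calculation Faraway describes (a sketch in Python/NumPy on simulated data; the sample size, true coefficients, and query point are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from a simple linear model with N(0, 1) errors
n = 40
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta = np.array([2.0, 0.5])
y = X @ beta + rng.normal(0, 1.0, n)

# OLS fit and unbiased estimate of sigma^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
p = X.shape[1]
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)

# 95% prediction interval for a FUTURE observation at x_new:
# note the "1 +" inside the square root, which accounts for the new
# observation's own error on top of the estimation uncertainty
x_new = np.array([1.0, 5.0])
XtX_inv = np.linalg.inv(X.T @ X)
se_pred = np.sqrt(s2 * (1.0 + x_new @ XtX_inv @ x_new))
t_crit = 2.024  # t quantile at 0.975 with n - p = 38 df (e.g. scipy.stats.t.ppf)
y_hat = x_new @ beta_hat
lo, hi = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
print(lo, y_hat, hi)
```

Dropping the "1 +" term gives the narrower confidence interval for the mean response $X_j\beta$ instead of the prediction interval for a new observation.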

Stephan Kolassa
  • The predictions are t-distributed? – Dave Oct 04 '23 at 11:44
  • Edited post - forgot to mention assumptions. – mrepic1123 Oct 04 '23 at 12:15
  • This does not give t-distributed predictions: set.seed(2023); N <- 1000; x <- rbinom(N, 1, 0.5); y <- 7*x + rnorm(N); plot(density(y)). The predictions are bimodal. I think you are addressing a different question. – Dave Oct 04 '23 at 14:02
  • As I read the question it is just about the sampling distribution of $\hat{Y}_j$ in relation to the true parameters. The more relevant/interesting deviation $Y_j - \hat{Y}_j$ follows a t-distribution when assuming homoskedastic residuals with linear regression, so in this sense one could say the predictions follow a t-distribution centered on the hypothetical "actual" value of $Y_j$ (which is a random variable). – user9794 Oct 04 '23 at 18:12
  • @Dave: prediction intervals in standard linear regression, as asked in the question, usually assume a t distribution, conditional on predictors, which I assumed was intended. I don't quite understand what your example should be telling us, to be honest. Yes, unconditional predictions may be bimodal. But how often are we interested in unconditional predictions - especially if we don't know anything about the distribution of the predictors? – Stephan Kolassa Oct 04 '23 at 20:37
  • Reading this answer and taking it literally, I share Dave's confusion. At the same time I also understand Stephan's explanation in the comment. For clarity, why not make the conditioning explicit in the answer? – Richard Hardy Oct 05 '23 at 06:49
  • The distribution of estimates and predictions is Gaussian. But for the computation of confidence intervals or prediction intervals we use a t-distribution. The question asks for the former. – Sextus Empiricus Oct 05 '23 at 09:09