
As the question states, what is the distribution of the predicted $y$ values in linear regression? I'm sure this question has been answered somewhere before but I can't seem to find it for some reason.

Edit: forgot to mention that I'm making standard homoskedastic normal residual assumptions.

mrepic1123
    Welcome to Cross Validated! Do you mean the conditional distribution or the pooled (marginal) distribution of all predictions combined together? Also, you’ve tagged this with [tag:normal-distribution]. Why? The normal distribution could come up in this discussion but does not have to. – Dave Oct 04 '23 at 11:40
  • Edited post - forgot to mention assumptions. I'm looking for the conditional distribution given $X$, if that's what you mean. – mrepic1123 Oct 04 '23 at 12:14
  • By linear regression, do you mean a maximum likelihood model where you assume that the conditional distribution of $y$ given $X$ is Normal? – Firebug Oct 04 '23 at 12:37
  • @Firebug, maximum likelihood is a type of estimator, not a model. – Richard Hardy Oct 05 '23 at 12:16
  • @RichardHardy by 'maximum likelihood model' I refer to models that are estimated using maximum likelihood. This is quite commonplace as far as I'm aware. – Firebug Oct 05 '23 at 12:35
  • @Firebug, this is fine as a shorthand used by experts who know the context, but taken literally it can be confusing for beginners – that's all. – Richard Hardy Oct 05 '23 at 12:42

2 Answers


Assume a traditional linear regression and that the data follow the model, i.e., given $X$ the response $Y \mid X$ is distributed as $\mathcal{N}(X \beta, \sigma^2 I)$ for some true values of $\beta$ and $\sigma$. In this case, the least squares estimator (which is also the MLE) for $\beta$, $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, is just a linear transformation of the multivariate normal vector $Y$ and thus has the distribution $\mathcal{N}(\beta, (X^\top X)^{-1} \sigma^2)$. (See this answer)

Then the prediction at a new point $X_j$ is given by $\hat Y_{j} = X_j (X^\top X)^{-1} X^\top Y$, which again is a linear transformation of a multivariate normal, so

$$\hat Y_{j} \mid X_j \sim \mathcal{N}\left(X_j\beta,\; X_j(X^\top X)^{-1} X_j^\top \sigma^2\right)$$

This treats the predictors $X$ and $X_j$ as deterministic quantities and neglects any uncertainty about the values they will take. If you instead assume some distribution for $X$ and $X_j$, then the distributions of $\hat{\beta}$ and of $\hat Y_{j} \mid X_j$ will also change.
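A quick Monte Carlo check of this sampling distribution (a sketch in Python/NumPy; the design matrix, true $\beta$, $\sigma$, and the new point `x_new` are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# True model: Y = X beta + eps, eps ~ N(0, sigma^2 I), with X held fixed
n = 50
beta = np.array([1.0, 2.0])
sigma = 0.5
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
x_new = np.array([1.0, 0.3])  # the new point X_j

# Theoretical mean and variance of the prediction hat(Y)_j = x_new @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
mean_theory = x_new @ beta
var_theory = sigma**2 * (x_new @ XtX_inv @ x_new)

# Simulate many datasets from the model, refit by least squares, predict at x_new
preds = []
for _ in range(20_000):
    y = X @ beta + rng.normal(0, sigma, n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    preds.append(x_new @ beta_hat)
preds = np.array(preds)

print(preds.mean(), mean_theory)  # empirical mean should be close to x_new @ beta
print(preds.var(), var_theory)    # empirical variance should be close to theory
```

With 20,000 replications the empirical mean and variance of the predictions match $X_j\beta$ and $X_j(X^\top X)^{-1}X_j^\top\sigma^2$ to within simulation noise.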

user9794
    While correct, this perspective does not seem to be practically relevant. In regression applications, we do not have the true values $\beta$ but only the estimates $\hat\beta$. That makes the distribution of the predictions much more involved. Stephan Kolassa's answer goes in that direction. – Richard Hardy Oct 05 '23 at 06:44
  • In my context I do have the true values $\beta$ so this was the answer I was looking for. – mrepic1123 Oct 05 '23 at 15:30
  • @mrepic1123, if you know $\beta$ and use $X_j \beta$ instead of $X_j \hat{\beta}$, then the predictions are just deterministic and not normally distributed. – user9794 Oct 05 '23 at 16:38

The precise answer depends on whether your errors are normally distributed (and homoskedastic, and independent). If so, future observations follow a t distribution; see section 3.5, "Prediction of a future value", in Faraway (2002), although I would have called the intervals "prediction intervals" rather than "confidence intervals".

If your errors are non-normal, you can often still justify a t or normal distribution by appealing to asymptotics, but I don't have a reference at hand.
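As a concrete illustration of the prediction-interval calculation Faraway describes (a sketch in Python/NumPy on simulated data; the sample size, true coefficients, and query point are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from a simple linear model with N(0, 1) errors
n = 40
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta = np.array([2.0, 0.5])
y = X @ beta + rng.normal(0, 1.0, n)

# OLS fit and unbiased estimate of sigma^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
p = X.shape[1]
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)

# 95% prediction interval for a FUTURE observation at x_new:
# note the "1 +" inside the square root, which accounts for the new
# observation's own error on top of the estimation uncertainty
x_new = np.array([1.0, 5.0])
XtX_inv = np.linalg.inv(X.T @ X)
se_pred = np.sqrt(s2 * (1.0 + x_new @ XtX_inv @ x_new))
t_crit = 2.024  # t quantile at 0.975 with n - p = 38 df (e.g. scipy.stats.t.ppf)
y_hat = x_new @ beta_hat
lo, hi = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
print(lo, y_hat, hi)
```

Dropping the "1 +" term gives the narrower confidence interval for the mean response $X_j\beta$ instead of the prediction interval for a new observation.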

Stephan Kolassa
  • The predictions are t-distributed? – Dave Oct 04 '23 at 11:44
  • Edited post - forgot to mention assumptions. – mrepic1123 Oct 04 '23 at 12:15
  • This does not give t-distributed predictions: set.seed(2023); N <- 1000; x <- rbinom(N, 1, 0.5); y <- 7*x + rnorm(N); plot(density(y)). The predictions are bimodal. I think you are addressing a different question. – Dave Oct 04 '23 at 14:02
  • As I read the question it is just about the sampling distribution of $\hat{Y}_j$ in relation to the true parameters. The more relevant/interesting deviation $Y_j - \hat{Y}_j$ follows a t-distribution when assuming homoskedastic residuals with linear regression, so in this sense one could say the predictions follow a t-distribution centered on the hypothetical "actual" value of $Y_j$ (which is a random variable). – user9794 Oct 04 '23 at 18:12
  • @Dave: prediction intervals in standard linear regression, as asked in the question, usually assume a t distribution, conditional on predictors, which I assumed was intended. I don't quite understand what your example should be telling us, to be honest. Yes, unconditional predictions may be bimodal. But how often are we interested in unconditional predictions - especially if we don't know anything about the distribution of the predictors? – Stephan Kolassa Oct 04 '23 at 20:37
  • Reading this answer and taking it literally, I share Dave's confusion. At the same time I also understand Stephan's explanation in the comment. For clarity, why not make the conditioning explicit in the answer? – Richard Hardy Oct 05 '23 at 06:49
  • The distribution of estimates and predictions is Gaussian. But for the computation of confidence intervals or prediction intervals we use a t-distribution. The question asks for the former. – Sextus Empiricus Oct 05 '23 at 09:09