As the question states, what is the distribution of the predicted $y$ values in linear regression? I'm sure this question has been answered somewhere before but I can't seem to find it for some reason.
Edit: forgot to mention that I'm making standard homoskedastic normal residual assumptions.
- Welcome to Cross Validated! Do you mean the conditional distribution or the pooled (marginal) distribution of all predictions combined together? Also, you’ve tagged this with [tag:normal-distribution]. Why? The normal distribution could come up in this discussion but does not have to. – Dave Oct 04 '23 at 11:40
- Edited post - forgot to mention assumptions. I'm looking for the conditional distribution given $X$, if that's what you mean. – mrepic1123 Oct 04 '23 at 12:14
- By linear regression, do you mean a maximum likelihood model where you assume that the conditional distribution of $y$ given $X$ is Normal? – Firebug Oct 04 '23 at 12:37
- @Firebug, maximum likelihood is a type of estimator, not a model. – Richard Hardy Oct 05 '23 at 12:16
- @RichardHardy by 'maximum likelihood model' I refer to models that are estimated using maximum likelihood. This is quite commonplace as far as I'm aware. – Firebug Oct 05 '23 at 12:35
- @Firebug, this sounds fine as a shorthand used by experts who know the context, but taken literally it can be confusing for beginners – that's all. – Richard Hardy Oct 05 '23 at 12:42
2 Answers
Assume a traditional linear regression and that the data follow the model, i.e., given $X$ the corresponding value $Y \mid X$ is distributed as $\mathcal{N}(X \beta, \sigma^2 I)$ for some true values of $\beta$ and $\sigma$. In this case, the least squares estimator (and the MLE) for $\beta$, $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, is just a linear transformation of the multivariate normal random vector $Y$ and thus has the distribution $\mathcal{N}(\beta, (X^\top X)^{-1} \sigma^2)$. (See this answer.)
Then the prediction for a new $X_j$ is given by $\hat Y_{j} = X_j \hat\beta = X_j (X^\top X)^{-1} X^\top Y$, which again is a linear transformation of a multivariate normal, so
$$\hat Y_{j} \mid X_j \sim \mathcal{N}\!\left(X_j\beta,\; X_j(X^\top X)^{-1} X_j^\top \sigma^2\right)$$
This neglects any uncertainty about the predictors $X$ and $X_j$ and what values they will take, and treats them as deterministic quantities. If you assume some distribution for $X$ and $X_j$, then the distribution of $\hat{\beta}$, and thus of $\hat Y_{j} \mid X_j$, will also change.
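As a quick sanity check of the formula above (a sketch of mine, not part of the original answer), the small R simulation below draws repeated samples $Y$ for a fixed design, recomputes $\hat Y_j$ each time, and compares the simulated mean and variance with $X_j\beta$ and $X_j(X^\top X)^{-1} X_j^\top \sigma^2$. The sample size, design, $\beta$, $\sigma$, and the new point $X_j$ are arbitrary illustrative choices.

```r
# Minimal R sketch (illustrative values only): sampling distribution of the
# predicted value at a fixed new point x0, compared with the formula above.
set.seed(1)
n     <- 50
beta  <- c(1, 2)                 # assumed "true" coefficients
sigma <- 1                       # assumed "true" error sd
X     <- cbind(1, runif(n))      # fixed design matrix
x0    <- c(1, 0.5)               # new predictor row X_j

yhat0 <- replicate(5000, {
  y <- drop(X %*% beta) + rnorm(n, sd = sigma)
  sum(x0 * lm.fit(X, y)$coefficients)       # predicted value at x0
})

m <- sum(x0 * beta)                                      # X_j beta
v <- drop(sigma^2 * t(x0) %*% solve(t(X) %*% X) %*% x0)  # X_j (X'X)^{-1} X_j' sigma^2
c(sim_mean = mean(yhat0), theory_mean = m)   # means agree
c(sim_var  = var(yhat0),  theory_var  = v)   # variances agree
qqnorm(yhat0); qqline(yhat0)                 # straight line: normal sampling distribution
```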
- While correct, this perspective does not seem to be practically relevant. In regression applications, we do not have the true values $\beta$ but only the estimates $\hat\beta$. That makes the distribution of the predictions much more involved. Stephan Kolassa's answer goes in that direction. – Richard Hardy Oct 05 '23 at 06:44
- In my context I do have the true values $\beta$ so this was the answer I was looking for. – mrepic1123 Oct 05 '23 at 15:30
- @mrepic1123, if you know $\beta$ and use $X_j \beta$ instead of $X_j \hat{\beta}$, then the predictions are just deterministic and not normally distributed. – user9794 Oct 05 '23 at 16:38
The precise answer will depend on whether your errors are normally distributed (and homoskedastic, and independent). If so, future observations follow a t distribution; see section 3.5, "prediction of a future value", in Faraway (2002), although I would have called the intervals "prediction intervals" rather than "confidence intervals".
If your errors are non-normal, you can often assume a t or normal distribution by arguing asymptotics, but I don't have a reference at hand.
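To make this concrete, here is a minimal R sketch on simulated data (the coefficients and the new point are arbitrary choices of mine, not taken from the answer or from Faraway): `predict()` with `interval = "prediction"` gives the t-based interval, and the same interval can be reproduced by hand from the residual variance plus the variance of the fitted value at the new point.

```r
# Minimal R sketch (simulated data, arbitrary parameter values): the
# prediction interval for a future observation is based on the t distribution.
set.seed(1)
n   <- 30
x   <- runif(n)
y   <- 1 + 2 * x + rnorm(n)          # assumed data-generating model
fit <- lm(y ~ x)
new <- data.frame(x = 0.5)

# Built-in t-based prediction interval
predict(fit, new, interval = "prediction", level = 0.95)

# The same interval by hand: the standard error combines the residual
# variance with the variance of the fitted value at the new point.
X0      <- c(1, 0.5)                         # model-matrix row for x = 0.5
s2      <- summary(fit)$sigma^2              # estimated residual variance
var_fit <- drop(t(X0) %*% vcov(fit) %*% X0)  # variance of the fitted value
fit0    <- sum(X0 * coef(fit))
fit0 + c(-1, 1) * qt(0.975, df = fit$df.residual) * sqrt(s2 + var_fit)
```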
- This does not give t-distributed predictions: `set.seed(2023); N <- 1000; x <- rbinom(N, 1, 0.5); y <- 7*x + rnorm(N); plot(density(y))`. The predictions are bimodal. I think you are addressing a different question. – Dave Oct 04 '23 at 14:02
- As I read the question it is just about the sampling distribution of $\hat{Y}_j$ in relation to the true parameters. The more relevant/interesting deviation $Y_j - \hat{Y}_j$ follows a t-distribution when assuming homoskedastic residuals with linear regression, so in this sense one could say the predictions follow a t-distribution centered on the hypothetical "actual" value of $Y_j$ (which is a random variable). – user9794 Oct 04 '23 at 18:12
- @Dave: prediction intervals in standard linear regression, as asked in the question, usually assume a t distribution, conditional on predictors, which I assumed was intended. I don't quite understand what your example should be telling us, to be honest. Yes, unconditional predictions may be bimodal. But how often are we interested in unconditional predictions - especially if we don't know anything about the distribution of the predictors? – Stephan Kolassa Oct 04 '23 at 20:37
- Reading this answer and taking it literally, I share Dave's confusion. At the same time I also understand Stephan's explanation in the comment. For clarity, why not make the conditioning explicit in the answer? – Richard Hardy Oct 05 '23 at 06:49
- The distribution of estimates and predictions is Gaussian. But for the computation of confidence intervals or prediction intervals we use a t-distribution. The question asks for the former. – Sextus Empiricus Oct 05 '23 at 09:09