Once again, here I am. Given the multiple linear regression model \begin{align*} \textbf{Y} = \textbf{X}\beta + \epsilon \end{align*}
where $\epsilon\sim\mathcal{N}(\textbf{0},\sigma^{2}\textbf{I})$ and $\mu = \textbf{X}\beta$, why do we need to determine the distribution of $\textbf{Y}$? If we apply the least-squares method to obtain $\hat{\beta}$, we get the explicit relation \begin{align*} Y_{i} = \hat{\beta}_{0} + \hat{\beta}_{1}x_{i1} + \ldots + \hat{\beta}_{p-1}x_{i,p-1} + e_{i} \end{align*}
from which we can obtain the value of the response variable $Y$ in terms of the explanatory variables (here $e_{i}$ denotes the $i$-th residual, as opposed to the unobservable error $\epsilon_{i}$). My second question is: how do we interpret each component of $\textbf{Y} = (Y_{1},Y_{2},\ldots,Y_{n})$? Does each $Y_{i}$ represent the outcome from a different sample? Or, if they all belong to the same sample, why do they have different means?
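To make the second question concrete, here is a minimal simulation sketch (all numbers hypothetical: $n=100$, $p=3$, and the chosen $\beta$ and $\sigma$ are arbitrary). It draws one sample of size $n$; every $Y_{i}$ is realized exactly once, yet each has its own mean $\mu_{i} = \mathbf{x}_{i}^{\top}\beta$ because the rows of $\textbf{X}$ differ.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

n, p = 100, 3                        # n observations, p coefficients (incl. intercept)
beta = np.array([1.0, 2.0, -0.5])    # hypothetical true coefficients
sigma = 1.0                          # hypothetical error standard deviation

# Design matrix: a column of ones plus p-1 explanatory variables
X = np.column_stack([np.ones(n), rng.uniform(0, 1, size=(n, p - 1))])

# One sample of size n: each Y_i is drawn once, but each has its own
# mean mu_i = x_i' beta because the covariate rows differ
mu = X @ beta
Y = mu + rng.normal(0.0, sigma, size=n)

# Least-squares estimate: beta_hat = (X'X)^{-1} X'Y
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)  # should be close to beta for large n
```

In this sketch the $Y_{i}$ all come from the same sample (one joint draw of the vector $\textbf{Y}$), and their means differ only through the covariates, which is exactly the sense in which one sample can carry $n$ different means.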
"Linear regression makes no assumptions on the distribution of the marginal outcome. However, there is an assumption on the distribution of the elements of Y." This is backwards - the variable Y can have any unspecified marginal distribution, but the conditional distribution given $\textbf{X}\beta$ (i.e., the distribution of the error term) must be Gaussian. You correctly represent it later in your post.
– Joey F. May 16 '19 at 15:32
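Assuming the model stated in the question, the distinction this comment draws can be written out explicitly: \begin{align*} \textbf{Y}\mid\textbf{X} \sim \mathcal{N}(\textbf{X}\beta,\ \sigma^{2}\textbf{I}), \end{align*} that is, the Gaussian assumption bears on the conditional distribution of $\textbf{Y}$ given $\textbf{X}$ (equivalently, on $\epsilon$), while the marginal distribution of $\textbf{Y}$, obtained by averaging over the distribution of the covariates, is left unspecified.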