
In linear regression, let $\epsilon_i$ be the $i$th error term. Is the formula for $\epsilon_i$

  1. $\epsilon_i = Y_i - E(Y_i)$

or

  2. $\epsilon_i = Y_i - E(Y_i \mid X_i = x_i)$?

I have seen both definitions.

Penn State University uses Definition #1[1].

Iain Pardoe uses Definition #2 in Section 2.1 of the 3rd edition of his "Applied Regression Modeling".


Which one is correct?

[1]: Link to PennState


2 Answers


Both notations are fine; it just depends on whether you are thinking of $X_i$ as random (jointly distributed with $Y_i$) or as fixed.

If you are thinking about the joint distribution of $X_i$ and $Y_i$, then technically $E[Y_i] = E[X_i] \beta$, and so you would want to use the notation $E[Y_i \mid X_i = x_i] = x_i \beta$.

However, in linear regression we typically do not think of $X_i$ as random but rather as having a fixed value. If $X_i$ is fixed, then $E[Y_i] = X_i\beta$, so Definition #1 is correct in the case that we don't consider $X_i$ to be random.
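A minimal NumPy sketch of the two views (the coefficient $\beta = 2$, the distribution of $X_i$, and the sample size are arbitrary choices for illustration, not anything from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0
n = 1_000_000

# Random-X view: X_i and Y_i are jointly distributed.
X = rng.normal(loc=3.0, scale=1.0, size=n)
Y = beta * X + rng.normal(scale=1.0, size=n)

# Marginally, E[Y_i] = E[X_i] * beta ...
print(Y.mean(), X.mean() * beta)      # both close to 6.0

# ... while conditionally, E[Y_i | X_i = x] = x * beta.
x0 = 1.5
near = np.abs(X - x0) < 0.01          # crude numerical conditioning on X ≈ x0
print(Y[near].mean(), x0 * beta)      # both close to 3.0
```

Under the fixed-design view there is nothing to marginalize over, so $E(Y_i)$ and $E(Y_i \mid X_i = x_i)$ denote the same quantity and the two definitions of $\epsilon_i$ coincide.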

Cliff AB

This has been discussed time and again here. As noted in Cliff AB's answer, there are two sets of assumptions about the regressors, which can be seen in the case of the simple linear regression model:

$\bullet$ When the independent variable is controlled by the experimenter and hence nonstochastic:

$$\begin{align}\mathbb E[Y]&= \beta_0+\beta_1X,\\\mathbb V[Y]&=\sigma^2.\end{align}$$

$\bullet$ When both the dependent and independent variables are stochastic, that is, $X$ and $Y$ are jointly distributed:

If the joint distribution is bivariate normal, then

$$\begin{align}\mathbb E[Y\mid X=x]&=\beta_0+\beta_1x,\\\mathbb V[Y\mid X]&=\sigma^2_{y\cdot x}\\&=\sigma^2_y(1-\rho^2).\end{align}$$
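A small simulation can sanity-check the last identity; this is a sketch under assumed parameter values ($\sigma_x = 1$, $\sigma_y = 2$, $\rho = 0.6$), chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_y, rho = 2.0, 0.6                       # illustrative values, not from the answer
cov = [[1.0, rho * sigma_y],                  # Var(X) = 1, Cov(X, Y) = rho * sigma_x * sigma_y
       [rho * sigma_y, sigma_y ** 2]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000).T

# For a bivariate normal pair, Var(Y | X = x) = sigma_y^2 * (1 - rho^2) for every x.
near = np.abs(X - 0.5) < 0.01                 # crude numerical conditioning on X ≈ 0.5
print(Y[near].var(), sigma_y ** 2 * (1 - rho ** 2))   # both close to 2.56
```

Note that the conditional variance does not depend on the conditioning value $x$; that constancy is a special feature of the bivariate normal case.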


Reference:

$\rm[I]$ Linear Models and Generalizations: Least Squares and Alternatives, C. R. Rao, H. Toutenburg, Shalabh, Christian Heumann, Springer-Verlag, $2008$, secs. $2.1$, $2.15$.

User1865345