
How would one prove that the expected value of the residuals from OLS regression is zero? I will make two cases. In the first case I treat $X_i$ as random and in the second case I treat it as non-random.

First case. We know that $\hat{u}_i = y_i - \hat{y}_i$. Taking the expectation, $E[\hat{u}_i] = E[y_i] - E[\hat{y}_i]$. Now, we know from the solution of the OLS minimisation problem that $\bar{y} = \bar{\hat{y}}$, because $\bar{\hat{u}} = 0$ when an intercept is included. Taking probability limits, $\operatorname{plim} \bar{y} = \operatorname{plim} \bar{\hat{y}}$, and by the law of large numbers this gives $E[y_i] = E[\hat{y}_i]$. Hence, $E[\hat{u}_i] = 0$. Is this proof correct? Besides, how would one interpret $E[\hat{u}_i]$? $\hat{u}_i$ comes from a given sample, whereas expectation is a population concept. If we take the expectation of a residual, what does it represent: the long-run mean of a residual term over repeated samples, or a population mean?
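To make the repeated-sampling interpretation concrete, here is a minimal Monte Carlo sketch (my own illustration, assuming a simple model $y = 1 + 2x + u$ with one random regressor and i.i.d. normal errors; the coefficients and seed are arbitrary): redraw $(x, u)$ many times, refit OLS each time, and track the residual of one fixed observation. Its average across replications approximates $E[\hat{u}_1]$ and should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 20_000
first_resid = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)                    # regressor redrawn each sample (random X)
    u = rng.normal(size=n)                    # errors with E[u] = 0
    y = 1.0 + 2.0 * x + u                     # assumed model: intercept 1, slope 2
    X = np.column_stack([np.ones(n), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    first_resid[r] = y[0] - X[0] @ beta_hat   # residual of observation i = 1

# Monte Carlo estimate of E[u_hat_1]; should be near 0
print(first_resid.mean())
```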

Second case. This is easy. $E[\hat{u}] = E[My] = E[Mu] = ME[u] = 0$, because $My = MX\beta + Mu$ with $MX = 0$, because $M$ depends only on the non-random $X$ and hence can be taken out of the expectation operator, and because $E[u] = 0$. Here $M = I - P$ is the residual-maker matrix, where $P = X(X'X)^{-1}X'$ is the projection matrix. But my question is not about this case where $X$ is non-random, but about the first case above, where it is random.
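As a quick numerical check of the algebra above (a sketch with made-up dimensions and coefficients, not part of the proof), one can verify that $MX = 0$ and hence $My = Mu$ for a fixed design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))              # treated as fixed (non-random) here
beta = np.array([1.0, -0.5, 2.0])        # arbitrary true coefficients
u = rng.normal(size=n)                   # errors with E[u] = 0
y = X @ beta + u

P = X @ np.linalg.solve(X.T @ X, X.T)    # projection matrix P = X(X'X)^{-1}X'
M = np.eye(n) - P                        # residual-maker matrix M = I - P

print(np.allclose(M @ X, 0))             # MX = 0
print(np.allclose(M @ y, M @ u))         # My = MXbeta + Mu = Mu
```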

Snoopy
  • I'm not so sure the expectations of the residuals are always zero: it ought to depend on the model. Are you talking about a standard linear multiple regression problem? As far as expectation goes, it is also a concept about random variables, and that evidently is the sense in which it is intended here. – whuber Oct 01 '18 at 19:31
  • It is the standard linear regression model. The residual is a random variable right? It depends on the random variables $y_i$ and $x_i$. It differs from one sample to another, so there is a sampling distribution for the residuals. And in fact, the interpretation of the expected value of the residuals is a question I posed. – Snoopy Oct 01 '18 at 19:47
  • In the standard model, everything is conditional on the $x_i,$ so they are not considered random variables for these expectations. – whuber Oct 01 '18 at 20:59
  • I do not understand this. $y_i$ is a random variable. Hence the residuals are random variables. – Snoopy Oct 01 '18 at 21:32
  • You have already given a good demonstration in your first comment! The Normal equations for OLS regression show how you can explicitly express the residuals as linear combinations of the $y_i.$ The coefficients are functions of the $x_i,$ but since you treat those as constants when conditioning on the $x_i$ you don't even care what functions those happen to be. A linear combination of random variables is a random variable. – whuber Oct 01 '18 at 21:37
  • Using vector notation $y=(y_1,\ldots,y_n)$ and $\hat u=(\hat u_1,\ldots, \hat u_n),$ denoting the coefficient estimates (according to the Normal equations) $$\hat\beta = (X^\prime X)^{-} X^\prime y,$$ and writing $\mathbb{I}_n$ for the $n\times n$ identity matrix, by doing nothing more than plugging everything in we obtain $$\hat u = y - \hat y = y-X\hat\beta = [\mathbb{I}_n - X (X^\prime X)^{-} X^\prime] y,$$ which explicitly is a linear transformation of $y.$ See https://stats.stackexchange.com/search?q=%22hat+matrix%22+residual+score%3A1 for additional information. – whuber Oct 01 '18 at 22:05
  • What does all this have to do with my question? I know what $\hat{u}$ is equal to. – Snoopy Oct 01 '18 at 22:26
  • Are you sure you are not interested in computing the conditional expectation of the model residual given the predictor variables? – Isabella Ghement Oct 02 '18 at 00:28
  • 1) The mean of the residuals from OLS regression is equal to zero by construction if there is an intercept term. Given this, the expected value is zero as well - no further proof needed. 2) Why are you finding probability limits and using the Law of Large Numbers? Those are asymptotic results, and will tell you nothing about finite-sample expected values... (illustrated in the sketch after this thread) – jbowman Oct 02 '18 at 01:21
  • @Matthew: I am not an expert but can handle it. – Snoopy Oct 02 '18 at 09:35
  • @Isabella: yes, I treat the predictor as random, which was not explicit before but is now explicit in my last edit of the question. – Snoopy Oct 02 '18 at 09:35
  • @Matthew. Thanks for the comment. The stated sample mean is an approximation to the stated population mean. So this is a method of moments case. But why is this related to my question? – Snoopy Oct 02 '18 at 20:47
  • @jbowman. You mean $\frac{1}{n}\sum_{i}E[\hat{u}_i] = E[0]$ and $nE[\hat{u}_i] = 0$? Fine. But I still want to know if my proof is correct too. Expectation is a population concept, so I want to use probability limits in my proof, where I also do not condition on $x_i$. And I am seeking an interpretation of $E[\hat{u}_i] = 0$: is it that in repeated sampling, or sampling in the long run, the mean of a residual term is 0? – Snoopy Oct 02 '18 at 20:56
  • @Snoopy: linear regression output splits into an explained part plus an unexplained part. The residuals are simply the unexplained part; they have zero expectation by construction. Maybe my explanation can help you. – markowitz Dec 04 '20 at 14:35
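To illustrate jbowman's point in the thread above, a short sketch (my own example, with an assumed model and arbitrary seed): with an intercept column, the first normal equation forces the residuals to sum to exactly zero in every sample, while a regression through the origin has no such constraint.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # assumed model: intercept 1, slope 2

# With an intercept: sum of residuals is 0 by construction.
X1 = np.column_stack([np.ones(n), x])
r1 = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
print(r1.sum())                          # ~0 up to floating-point error

# Without an intercept: no such constraint; the sum is generally nonzero.
X0 = x.reshape(-1, 1)
r0 = y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]
print(r0.sum())                          # typically noticeably nonzero
```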