How would one prove that the expected value of the residuals from OLS
regression is zero?
In the linear regression framework, many problems can emerge from the so-called error term. However, here you are speaking unambiguously about residuals in the OLS context.
The expected value of the residuals is then zero by construction. The algebra demands it: the result comes from the first-order conditions of the OLS optimization problem (provided the regression includes an intercept, or more generally a constant in the column space of the regressors). The usual assumptions about the error term play no role.
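The first-order conditions mentioned above can be made explicit (a sketch, assuming $\boldsymbol{b}$ is the OLS estimate and $\boldsymbol{X}$ contains a column of ones for the intercept):
$$\begin{equation} \begin{aligned}
\min_{\boldsymbol{b}} \; (\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{b})'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{b})
\;\;\Rightarrow\;\; \boldsymbol{X}'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{b}) = \boldsymbol{X}'\mathbf{r} = \mathbf{0}
\end{aligned} \end{equation}$$
Since one column of $\boldsymbol{X}$ is $\mathbf{1}$, the corresponding row of $\boldsymbol{X}'\mathbf{r} = \mathbf{0}$ reads $\mathbf{1}'\mathbf{r} = 0$.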
Following Ben's notation, we can write
$$\begin{equation} \begin{aligned}
\mathbf{1}' \mathbf{r} &= \mathbf{1}'(\mathbf{Y} - \hat{\mathbf{Y}})
= \mathbf{1}' \mathbf{Y} - \mathbf{1}' \hat{\mathbf{Y}} = 0
\end{aligned} \end{equation}$$
Therefore not only is the expected value zero, but the sum of the residuals is exactly zero as well, always ($\mathbf{1}$ is a vector of ones).
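A minimal numerical check of this fact (a sketch with simulated data; the error distribution below is deliberately not mean-zero, precisely because the result is algebraic and does not rest on assumptions about the errors):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Error term deliberately NOT mean-zero: the sum of residuals is
# zero by algebra, not because of assumptions about the errors.
eps = rng.exponential(scale=2.0, size=n)
y = 1.0 + 3.0 * x + eps

X = np.column_stack([np.ones(n), x])       # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates
r = y - X @ b                              # residuals

print(r.sum())  # zero up to floating-point error
```

Dropping the column of ones from `X` breaks this: without an intercept the first-order conditions no longer include $\mathbf{1}'\mathbf{r} = 0$.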
The problem with Ben's explanation is in the second line
$$\begin{equation} \begin{aligned}
\mathbf{r} &= (\mathbf{I}-\mathbf{h}) \boldsymbol{Y} \\[6pt]
&= (\mathbf{I}-\mathbf{h}) (\boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\[6pt]
\end{aligned} \end{equation}$$
From the decomposition of $\boldsymbol{Y}$, the assumption about the error term $\mathbb{E}(\boldsymbol{\varepsilon}|\boldsymbol{X}) = \mathbf{0}$ seems necessary, but it is not. It is important to note that the hat matrix ($\mathbf{h}$) acts through the OLS estimates $\boldsymbol{b}$: $\mathbf{h} \boldsymbol{Y}= \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$
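The hat-matrix identity $\mathbf{h}\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$ can also be verified numerically (a sketch with simulated data; note that `y` here is arbitrary, since the identity is purely algebraic):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
y = rng.normal(size=n)                                 # any y works: the identity is algebraic

h = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix h = X (X'X)^{-1} X'
b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates b = (X'X)^{-1} X'y

y_hat = h @ y                 # fitted values via the hat matrix
r = (np.eye(n) - h) @ y       # residuals via the annihilator matrix I - h

print(np.allclose(y_hat, X @ b))  # hY equals Xb
print(np.allclose(r, y - y_hat))  # (I - h)Y equals Y - Yhat
```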
Finally, we can verify that, by construction,
$\mathbb{E} [\boldsymbol{Y} | \boldsymbol{X} ]= \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$
and $\boldsymbol{Y} = \hat{\mathbf{Y}} + \mathbf{r}$,
therefore $\mathbb{E} [\mathbf{r} | \boldsymbol{X} ]= \mathbb{E} [\boldsymbol{Y} | \boldsymbol{X}] - \mathbb{E} [\hat{\mathbf{Y}} | \boldsymbol{X}] = \boldsymbol{0}$,
again by construction.
As a sidenote:

> Expectation is a population concept. If we take the expectation of a residual, what would this represent? The sample means of a residual term in the long run or population?
Expectation is a general concept; you can apply it even to a single observation, for example a single coin toss. The proper application depends on the context and the question.
Your context is linear regression estimated with OLS, in which the residuals are a well-defined object. It is important to note that residuals are an estimation quantity: you compute them. Errors are a different thing; they are outside the researcher's control and unobservable, which is why you have to make assumptions about them.
Something like "population residuals" is an ambiguous object. Residuals are entirely in your hands, always. You can think about them in a scheme where the number of observations goes to infinity or covers the entire population, but nothing changes in the above algebra and its implications; they depend on the so-called geometry of OLS.
That said, residuals remain interpretable as random variables and you can compute their expectation. Without loss of generality, you can think of the expectation of residuals (or of estimators, etc.) as conditional on the regressors ($\mathbb{E}(\mathbf{r}|\boldsymbol{X})$), so it does not matter whether the regressors are stochastic or not.