Consider the linear model $$ Y_i={\underbrace{X_i}_{K\times 1}}^\top\beta+U_i $$ and assume

(0) There is no intercept in the model

(1) $E(X_i U_i)=0_K$ [orthogonality]

(2) $E(X_i X_i^\top)$ has rank $K$

(3) We have an i.i.d. sample $\{Y_i, X_i\}_{i=1}^n$

Then, the OLS estimator $$ \hat{\beta}=({\underbrace{X}_{n\times K}}^\top X)^{-1} X^\top \underbrace{Y}_{n\times 1} $$ is consistent.

The sketch of the proof: by the law of large numbers and the continuous mapping theorem, we have $\operatorname{plim}_{n\rightarrow \infty} \left(\frac{1}{n}X^\top X\right)^{-1}=E(X_i X_i^\top)^{-1}$ and $\operatorname{plim}_{n\rightarrow \infty}\frac{1}{n} X^\top \underbrace{(X\beta+U)}_{Y} =E(X_i X_i^\top)\beta+ E(X_i U_i)$. Combining the two expressions, $$ \operatorname{plim}_{n\rightarrow \infty} \hat{\beta}=E(X_i X_i^\top)^{-1}E(X_i X_i^\top)\beta + E(X_i X_i^\top)^{-1}\underbrace{E(X_i U_i)}_{0}= \beta. $$
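Written out, the decomposition behind this sketch is $$ \hat{\beta}=(X^\top X)^{-1}X^\top(X\beta+U)=\beta+\left(\tfrac{1}{n}X^\top X\right)^{-1}\tfrac{1}{n}X^\top U \;\xrightarrow{p}\; \beta+E(X_i X_i^\top)^{-1}E(X_i U_i), $$ so the asymptotic bias is governed entirely by $E(X_i U_i)$: since $E(X_i X_i^\top)^{-1}$ has full rank by (2), the bias term vanishes if and only if orthogonality holds.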

Question:

(a) Observe that if $E(U_i)=0$, then (1) is equivalent to $cov(X_i, U_i)=0$, since $cov(X_i, U_i)=E(X_i U_i)-E(X_i)E(U_i)$. Hence, orthogonality is equivalent to zero covariance in that case. However, if $E(U_i) \neq 0$, then (1) can hold while $cov(X_i, U_i)\neq 0$, so orthogonality and zero covariance come apart. Still, the consistency proof goes through. Hence, orthogonality is sufficient for consistency and does not require $E(U_i)=0$. Is this correct?

(b) Let's think about the reverse: is orthogonality necessary for consistency? That is, suppose $cov(X_i, U_i)=0$ but $E(X_i U_i)\neq 0$, which requires $E(X_i)E(U_i)\neq 0$ (with $E(U_i)=0$ the two conditions would coincide). Is $\hat{\beta}$ inconsistent? Can you show it in the multidimensional case ($K>1$)?

Note: I have read several questions on orthogonality versus zero covariance (for example, here), but they have not cleared my doubt, as they look too generic.


A Matlab simulation which shows consistency with no intercept, $E(U_i)\neq 0$, $E(X_i)\neq 0$, $cov(X_i, U_i)\neq 0$, and $E(X_i U_i)=0$:

clear
rng default

J = 10^4;                      % Monte Carlo replications
r = 10^7;                      % sample size per replication
beta = 2.5; k = -4/5; h = 2;   % k, h chosen so that E(X*U) = 0
beta_OLS_temp = zeros(J,1);

for j = 1:J
    X = unifrnd(-1,2,r,1);     % E(X) = 1/2, nonzero
    U = k*(X.^2) + h;          % E(U) = k*E(X^2) + h = 1.2, nonzero
    Y = X*beta + U;
    beta_OLS_temp(j) = (X.'*X)\(X.'*Y);   % OLS without intercept
end

beta_OLS = mean(beta_OLS_temp);   % close to beta = 2.5
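The moment claims behind this DGP can be checked analytically. For $X_i\sim U(-1,2)$ we have $E(X_i)=\tfrac{1}{2}$, $E(X_i^2)=1$ and $E(X_i^3)=\tfrac{5}{4}$, so with $U_i=kX_i^2+h$, $k=-\tfrac{4}{5}$, $h=2$: $$ E(U_i)=kE(X_i^2)+h=1.2\neq 0,\qquad E(X_i U_i)=kE(X_i^3)+hE(X_i)=-\tfrac{4}{5}\cdot\tfrac{5}{4}+2\cdot\tfrac{1}{2}=0, $$ and hence $cov(X_i,U_i)=E(X_iU_i)-E(X_i)E(U_i)=-0.6\neq 0$.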

  • Hi: If $E(U_{i}) \neq 0$, then the model has an error term with a non-zero mean. Intuitively, that would make the $\beta$ estimate not consistent and your equations bear that intuition out. – mlofton Jul 28 '22 at 19:31
  • Why? Please can you show it formally? I never use $E(U_i)=0$ in my proof. This is indeed the point I make in my question! – Star Jul 28 '22 at 19:35
  • Hi TEX: You use $E(X_{i}U_{i}) = 0$ in your proof and I assume that that is based on the orthogonality assumption which is based on $E(U_i) = 0$. Is it not? If it is, then, if you don't have $E(U_i) = 0$, you won't have orthogonality, so that last term in your last equation won't be 0? – mlofton Jul 29 '22 at 03:38
  • Why is the orthogonality assumption based on $E(U_i)=0$? Could you explain? It does not seem so if I read Hayashi p.112 https://www.google.com/url?sa=t&source=web&rct=j&url=https://docs.google.com/viewer%3Fa%3Dv%26pid%3Dsites%26srcid%3DZGVmYXVsdGRvbWFpbnxlY29ub21ldHJpY3NpdGFtfGd4OjYyMTU3YjczNWIwZTRkZjI&ved=2ahUKEwiz5dbhn535AhWBiFwKHYVNC3wQFnoECAoQAQ&usg=AOvVaw00r65D3VNuPApiZO7Ci2_C – Star Jul 29 '22 at 04:18
  • Also read Hayashi p.109 – Star Jul 29 '22 at 04:23
  • Okay. I'll check those pages out when I have time, but pages 7-9 of Hayashi seem to clarify it without the use of asymptotics. Note that $E(X U) = E(X E(U | X))$. So, the conditional mean of U given X needs to be zero in order for orthogonality to hold. Hayashi calls this the exogeneity assumption. But I'll check out 109-112 (hopefully today). Thanks. – mlofton Jul 29 '22 at 16:22
  • I took a look and he's making assumptions about the error term that are related to martingale difference sequences. In order to possibly say anything useful, I'll need to read the whole chapter carefully. Hopefully that will happen over the next week. In the interim, hopefully someone else can say something about it. Just to re-iterate, in the standard OLS case, if a non-zero mean is assumed for the error term and there is no intercept, then $\hat{\beta}$ is not consistent. If there is an intercept, $\hat{\beta}$ is consistent because the intercept soaks up the non-zero mean of the error term. – mlofton Jul 30 '22 at 02:09
  • Note that, in the above, by standard OLS, I mean the case where the error term is assumed to be normally distributed and no assumptions are made about MDS. – mlofton Jul 30 '22 at 02:11

1 Answer


Orthogonality is indeed, as the derivation demonstrates, sufficient for consistency.

If $X_i$ does not contain a constant, however, orthogonality no longer automatically implies that $E(U_i)=0$ (see the comments below for details).

If we then do have $E(U_i)\neq0$, whether the OLS estimator is consistent depends on other features of the DGP. The OP offers an example where consistency obtains.

On the other hand, take, for example, $X_i$ and $U_i$ to be independent. Then, $$E(X_iU_i)=E(X_i)E(U_i)\neq0,$$ which implies inconsistency when $E(U_i)\neq0$, unless the DGP is such that $E(X_i)=0$, which would restore orthogonality.

More specifically, by the OP's derivation, OLS in a simple regression without a constant converges in probability to $$ \beta+\frac{E(X_i)E(U_i)}{E(X_i^2)}. $$ Here is a numerical illustration. Note that for $X_i\sim N(\mu,1)$, $E(X_i^2)=1+\mu^2$ (see https://en.wikipedia.org/wiki/Noncentral_chi-squared_distribution).

Hence, the above plim becomes $$ \beta+\frac{\mu E(U_i)}{1+\mu^2} $$

n <- 5000

# Simple regression without constant; E(x) = 3, E(u) = 2, x and u independent
ols.wocst.nonzeromean <- function(n){
  x <- rnorm(n, 3)
  u <- rnorm(n, 2)
  y <- 4*x + u          # true beta = 4
  coef(lm(y ~ x - 1))   # no intercept
}

mc <- replicate(10000, ols.wocst.nonzeromean(n))
summary(mc)   # plim: 4 + 2*3/(1+9) = 4.6

>  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 4.582   4.597   4.600   4.600   4.604   4.623
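As to the multidimensional case in the OP's question (b), here is a minimal sketch along the same lines (this two-regressor DGP is an illustration I am adding, not part of the original thread): with two independent normal regressors, no constant, and $U_i$ independent of $X_i$ with $E(U_i)=2$, we get $Cov(X_i,U_i)=0$ but $E(X_iU_i)=E(X_i)E(U_i)\neq0$, and by the OP's derivation OLS converges to $\beta+E(X_iX_i^\top)^{-1}E(X_i)E(U_i)$.

set.seed(1)
n <- 10^6
x1 <- rnorm(n, 3)            # E(x1) = 3
x2 <- rnorm(n, 1)            # E(x2) = 1
u  <- rnorm(n, 2)            # E(u) = 2, independent of (x1, x2)
y  <- 4*x1 - 2*x2 + u        # true beta = (4, -2)
coef(lm(y ~ x1 + x2 - 1))    # approx. (4 + 6/11, -2 + 2/11) = (4.545, -1.818)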

  • Thanks. Can you conclude your answer by addressing my (a) and (b) questions? Thanks – Star Jul 30 '22 at 15:11
  • My above answer indeed only addresses (a), arguing that a nonzero-mean error term and a regression without constant generally do not produce consistency, as it will violate $E(X_iU_i)=0$. As to (b), your derivation shows that consistency obtains only if $E(X_iU_i)=0$. There does not seem to be anything special to show in the multidimensional case. – Christoph Hanck Jul 30 '22 at 15:32
  • It is just that when $X_i$ does contain a constant (say, $X_i=(1, D_i)^\top$), then (we are then in a multidimensional case!) $E(X_iU_i)=0$ also implies $E(1\cdot U_i)=E(U_i)=0$. Also, $E(X_iU_i)=0$ then implies $E(D_iU_i)=0$ and then also $Cov(D_i,U_i)=E(D_iU_i)-E(D_i)E(U_i)=0-0\cdot E(D_i)=0$. – Christoph Hanck Jul 30 '22 at 15:32
  • I am not sure the example in your answer makes sense as you are simulating a DGP where $E(U_i X_i)$ is different from zero. You are just showing that if $E(U_i X_i)\neq 0$ we have inconsistency. I don't understand what you want to prove. – Star Jul 30 '22 at 15:56
  • Maybe I misunderstand your question. My point indeed precisely is that it is not clear how to get $E(X_iU_i)=0$ when $E(U_i)\neq0$. So clearly, orthogonality is sufficient for consistency, but my example shows that such orthogonality does not generally obtain for $E(U_i)\neq0$, so that your claim that we do not need $E(U_i)=0$ when there is no constant requires care. – Christoph Hanck Jul 30 '22 at 16:13
  • But you are right that an edit was necessary – Christoph Hanck Jul 30 '22 at 16:25
  • I have added to my question a Matlab simulation where $E(X_iU_i)=0$, $E(U_i)\neq 0$, $cov(X_i, U_i)\neq 0$. Yet, we have consistency. – Star Jul 30 '22 at 16:30
  • I surely did not mean to claim that inconsistency then is impossible (see also my point on $E(X_i)=0$), just that orthogonality then no longer implies that $E(U_i)=0$, and when the latter is not the case, whether or not we have consistency depends on other features of the DGP - which is also what you illustrate, afaics (I do not read MATLAB too well unfortunately - maybe you can write down the DGP?). I am sorry if that was not your question. I further added to my answer to clarify in this regard. – Christoph Hanck Jul 30 '22 at 18:18
  • Christoph: TEX is interested in the case when $E(\epsilon_{i}|X)$ doesn't equal zero. As you showed, except for very special cases, it won't be the case that $\hat{\beta}$ is consistent if $E(\epsilon | X) \neq 0$ and there is no intercept. OTOH, Hayashi shows (page 7) that, in the case where there is an intercept, it will soak up the non-zero mean of the error term (basically transform it to zero), so, in that case, one does obtain consistency. But I agree with you that, theoretically speaking, a non-zero mean of the error term (and no intercept) generally implies a lack of consistency. – mlofton Jul 30 '22 at 18:52
  • Note that, later on, in chapter 2, Hayashi covers the case where an MDS is assumed. So, it's no longer the straightforward case that just assumes that $\epsilon$ is normally distributed. I still have to read that chapter to understand what that means, but it's always good to have other people's input also. TEX: Another good book (for MDS) that I unfortunately have in storage and not at my fingertips is White's "Asymptotic Theory for Econometricians". That will say something about the MDS cases. It's a terse but clear and good book. – mlofton Jul 30 '22 at 18:59