
If I have a linear model $$ Y = X_1 \beta_1 + X_2\beta_2 + e$$ where $X_1$ is endogenous to $e$ but $X_2$ is not, then simply performing OLS will yield an unbiased estimate for $\beta_2$ but not $\beta_1$.

As a whole, this seems intuitive, but I am having trouble proving that the estimator is indeed unbiased for the $\beta_2$ component. Even with an independence assumption I am unable to proceed.

Some code for simulation - I call $U$ the error, $V$ the exogenous variable whose regression coefficient is estimated correctly, and $Z$ the endogenous variable:

covMat <- matrix(data = c(2, 0, 1.9, 0, 2, -1.5, 1.9, -1.5, 3), nrow = 3, ncol = 3)

gen_vars <- function(covMat, k = 3) {
  varRaw  <- rnorm(n = k)
  varTrue <- chol(covMat) %*% varRaw
  return(t(varTrue))
}

k <- dim(covMat)[1]
n <- 10000
varMat <- numeric()
for (i in 1:n) {
  var <- gen_vars(covMat, k = k)
  varMat <- rbind(varMat, var)
}

U <- varMat[, 1]
V <- varMat[, 2]
Z <- varMat[, 3]

betaV <- 1.6
betaZ <- -3

Y <- betaV * V + betaZ * Z + U

X <- varMat[, 2:3]

print(solve(t(X) %*% X) %*% t(X) %*% Y)

  • "(Not) Endogenous to $e$" is to mean $Cov(X_1,e)\neq0$ but $Cov(X_2,e)=0$? And "simply performing OLS" means regressing $y$ on both $X_1$ and $X_2$? – Christoph Hanck Oct 09 '23 at 14:26
  • Yep! And in fact we may even assume $Cov(X_1, e) \neq 0$ but $X_2 \perp e$, if that helps. By "simply performing OLS" I mean that we concatenate $X = (X_1 ; X_2)$ and consider $\hat{\beta} = (X^\top X)^{-1} X^\top Y$; the first component would be a biased estimate of $\beta_1$, but the second component would be an unbiased estimate of $\beta_2$. I've confirmed this with many simulations and found some "confirmations" on this site, but no proof. :/ – Tommy Tang Oct 09 '23 at 18:06
  • It would be interesting to see your simulation to find out where ours differ! – Christoph Hanck Oct 10 '23 at 08:31
  • Yes! I'd love any feedback. I am editing my question to include the code for simulation. In fact if there are additional assumptions my simulation inadvertently uses, I would love to know and consider if they are suitable for my purposes. This should have been a small portion of my research and it has given me a headache for a week now! – Tommy Tang Oct 11 '23 at 00:47
  • Thanks. Unlikely to be the issue but I cannot run your code as mu is missing. Also, ones seems not to be used (again unlikely to be relevant). – Christoph Hanck Oct 11 '23 at 04:04
  • One conjecture at this point: your regression does not include a constant but maybe mu generates nonzero means, so that this creates additional bias? – Christoph Hanck Oct 11 '23 at 04:09
  • Oh sorry, I deleted those now (I was deleting parts of the code to reduce character count). If I include a nonzero "mu", it is biased unless I include a ones column and also estimate a "beta_0" for the intercept term. This is also biased but for betaV it is still unbiased – Tommy Tang Oct 11 '23 at 15:50
  • Have you tried playing around with the numbers? Maybe the correlations and coefficient values are such that biases somehow offset each other. – Christoph Hanck Oct 11 '23 at 19:16
  • Do you mean playing with the covMat values? I've tried a variety of values, including making some negative or all positive... I am always able to estimate beta_V unbiasedly. It's just a little mind-blowing to me that this is the case, but not so for beta_Z, so even if I'm wrong ultimately, there is something funky going on that is worth noting...

    I wonder if perhaps the Cholesky decomposition function in R does something? That seems to be the only difference between our methods. Even if this is what is causing it, this is okay for my purposes and I just want to figure out why it's happening.

    – Tommy Tang Oct 11 '23 at 19:36
  • Indeed, that would also be the only place where I could plausibly look next. And indeed, when I compute cov(varMat), that does seem rather far away from covMat, which, at the only quick glance I can take right now, seems "wrong", no? – Christoph Hanck Oct 12 '23 at 03:52
  • Please try varTrue <- t(chol(covMat)) %*% varRaw – Christoph Hanck Oct 12 '23 at 05:37
  • whoa, yeah, no longer unbiased. Guess that was my mistake. Any idea why not taking the transpose allowed unbiased estimates? A little much for a coincidence – Tommy Tang Oct 12 '23 at 13:34
  • My best hunch is that using the "wrong" triangularization for the Cholesky factor leads to an unintended covariance pattern that happens to produce unbiasedness. E.g., your covariance matrix amounted to chol(covMat)%*%t(chol(covMat)), which has a very small variance for X[,3] of just 0.07, relative to around three for the other variables - that is maybe too little signal to generate bias. – Christoph Hanck Oct 12 '23 at 13:36
  • It should be possible to use your numbers to work out the plim of your regression along the same lines as the derivations in my answer and see where that goes. – Christoph Hanck Oct 12 '23 at 13:38
  • I did that and indeed, with the "wrong" covariance matrix, the plim of the estimator is the true value. As to exactly why that works I still struggle with some intermediate steps and am hence unsure as to why that is and how general that finding is (like, would it work for any $K$ etc.?). – Christoph Hanck Oct 12 '23 at 17:59

1 Answer


My answer would be that you cannot prove it because the claim is, as I understand it, not correct in general. The edit at the bottom provides an analysis of why we nevertheless see unbiasedness in your particular simulation and derives some more general lessons from it.

Consider a "true" DGP (with a "structural" error, i.e. one that is at least uncorrelated with the $x_j$) $$ y=x_1\beta_1+x_2\beta_2+x_3\beta_3+u $$ As we regress on $x_1$ and $x_2$ only, the plim of our regression coefficients will be, letting $x=(x_1\quad x_2\quad x_3)$ and (for "short" regression) $x_s=(x_1\quad x_2)$, the linear projection coefficients $$ \gamma=(E(x_sx_s'))^{-1}E(x_sy) $$ Take, e.g., a covariance matrix $$ \Sigma=E(xx')=\begin{pmatrix}1& 0.5& 0.25\\ 0.5& 1& 0\\ 0.25& 0& 1\end{pmatrix} $$ This should be consistent with your setting in which $x_1$ and $x_3$ are correlated, but $x_2$ are not. They are of course, and that is the point of this answer, "indirectly" correlated via $x_1$.

Now, given our numbers,
$$ (E(x_sx_s'))^{-1}=\begin{pmatrix}4/3& -2/3\\-2/3& 4/3\end{pmatrix} $$
and
$$ E(x_sy)=E\begin{pmatrix}x_1(x_1\beta_1+x_2\beta_2+x_3\beta_3+u)\\x_2(x_1\beta_1+x_2\beta_2+x_3\beta_3+u) \end{pmatrix}=\begin{pmatrix}\beta_1+0.5\beta_2+0.25\beta_3\\0.5\beta_1+\beta_2\end{pmatrix}, $$
where the second equality uses uncorrelatedness with the "structural" error $u$. Hence, all in all,
$$ \gamma=\frac{1}{3}\begin{pmatrix}4(\beta_1+0.5\beta_2+0.25\beta_3)-2(0.5\beta_1+\beta_2)\\-2(\beta_1+0.5\beta_2+0.25\beta_3)+4(0.5\beta_1+\beta_2)\end{pmatrix}. $$

Hence, there is in general no reason to suppose that $\gamma_2=\beta_2$.
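To make this concrete, with the coefficient values used in the illustration below ($\beta_1=1$, $\beta_2=2$, $\beta_3=1$), we get $E(x_sy)=(2.25,\;2.5)'$ and hence
$$ \gamma=\frac{1}{3}\begin{pmatrix}4\cdot 2.25-2\cdot 2.5\\-2\cdot 2.25+4\cdot 2.5\end{pmatrix}=\begin{pmatrix}4/3\\11/6\end{pmatrix}\approx\begin{pmatrix}1.33\\1.83\end{pmatrix}, $$
so neither coefficient, not even the one on $x_2$, which is uncorrelated with $x_3$, recovers its true value.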

Illustration:

library(mvtnorm)
n <- 50000
CovMat <- matrix(c(1, 0.5, 0.25, 0.5, 1, 0, 0.25, 0, 1), ncol=3)
X <- rmvnorm(n, sigma = CovMat)
u <- rnorm(n)
beta <- 1:2
e <- X[,3] + u              # i.e. \beta_3=1
y <- X[,1]*beta[1] + X[,2]*beta[2] + e
summary(lm(y~X[,1]+X[,2]-1))  # i.e. the coefficients are not close to the true values 1 and 2, but close to gamma

(gamma <- c(solve(CovMat[1:2,1:2]) %*% c(CovMat[1,1]*beta[1] + CovMat[1,2]*beta[2] + CovMat[1,3]*1,
                                         CovMat[1,2]*beta[1] + CovMat[2,2]*beta[2])))

solve(crossprod(X[,1:2])/n) # to illustrate the inverse

crossprod(X[,1:2], y)/n # to illustrate E(x_sy)


EDIT: An analysis of what went "wrong" for the OP. A somewhat subtle story with, at least to me, a surprising result. I cannot quite make sense of whether there is something "deeper" to it that "had to" happen or if it was just a coincidence.

The OP generated $K=2$ correlated regressors and an error term via chol(covMat) (unlike the $K=3$ regressors in my example above, where one regressor goes into the error term). Now, chol returns an upper triangular matrix $L'$. Hence, when writing $L'm$ for $m\sim N(0,I)$, the covariance of $L'm$ will be $L'L$, and not $LL'$, the Cholesky decomposition of covMat.
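A quick numerical check of this point (a small sketch, using the covMat from the question; the ordering is $(U, V, Z)$):

covMat <- matrix(c(2, 0, 1.9, 0, 2, -1.5, 1.9, -1.5, 3), nrow = 3)
R <- chol(covMat)       # upper triangular factor, i.e. L' above, with covMat = t(R) %*% R = LL'
round(t(R) %*% R, 3)    # reproduces covMat
round(R %*% t(R), 3)    # = L'L, the covariance the draws actually have; note the (3,3) element of 0.07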

Write the elements of covMat (using the zeros employed by the OP, which I think matter for the result to go through) as
$$ \begin{pmatrix} \sigma_{UU} & \sigma_{UV} & \sigma_{UZ}\\ \sigma_{UV} & \sigma_{VV} & \sigma_{VZ}\\ \sigma_{UZ} & \sigma_{VZ} & \sigma_{ZZ} \end{pmatrix}=\begin{pmatrix} \sigma_{UU} & 0 & \sigma_{UZ}\\ 0 & \sigma_{VV} & \sigma_{VZ}\\ \sigma_{UZ} & \sigma_{VZ} & \sigma_{ZZ} \end{pmatrix}, $$
with "southeastern" block $\Sigma$ for the two regressors (the ordering of the variables is of course arbitrary).

Using the formula for a $3\times 3$ Cholesky decomposition and exploiting the zeros,
$$ L=\begin{pmatrix} \sqrt{\sigma_{UU}} & 0 & 0\\ 0 & \sqrt{\sigma_{VV}} & 0\\ \sigma_{UZ}/\sqrt{\sigma_{UU}} & \sigma_{VZ}/\sqrt{\sigma_{VV}} & \sqrt{\sigma_{ZZ}-\sigma_{UZ}^2/\sigma_{UU}-\sigma_{VZ}^2/\sigma_{VV}}=:L_{33} \end{pmatrix}, $$
so that the "wrong" Cholesky construction yields a new covariance matrix of the variables (symmetric, with the elements above the diagonal omitted) of
$$ L'L=\begin{pmatrix} \sigma_{UU}+\frac{\sigma_{UZ}^2}{\sigma_{UU}} & & \\ \frac{\sigma_{VZ}\sigma_{UZ}}{\sqrt{\sigma_{VV}\sigma_{UU}}} & \sigma_{VV}+\frac{\sigma_{VZ}^2}{\sigma_{VV}} & \\ L_{33}\frac{\sigma_{UZ}}{\sqrt{\sigma_{UU}}} & L_{33}\frac{\sigma_{VZ}}{\sqrt{\sigma_{VV}}} & L_{33}^2 \end{pmatrix}\equiv \begin{pmatrix} \theta_{U}&&\\ \theta_{UV}&\theta_{V}&\\ \theta_{UZ}&\theta_{VZ}&\theta_{Z} \end{pmatrix}. $$

Now, by an argument analogous to the one above, for $x_s = (V\; Z)$ we have (for a general covariance matrix $\Theta$, so not only this one) that we may express the plim of the OLS estimator with two regressors in terms of the following (co-)variances:
$$ (E(x_sx_s'))^{-1}=\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}\begin{pmatrix} \theta_{Z}&-\theta_{VZ}\\ -\theta_{VZ}&\theta_{V} \end{pmatrix} $$
and
$$ E(x_sy)=\begin{pmatrix} \beta_V\theta_V+\beta_Z\theta_{VZ}+\theta_{UV}\\ \beta_V\theta_{VZ}+\beta_Z\theta_{Z}+\theta_{UZ} \end{pmatrix}. $$
That is, we now have, for the first coefficient estimator, that
$$ \begin{eqnarray*} \text{plim}\,\hat\beta_V&=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\theta_Z(\beta_V\theta_V+\beta_Z\theta_{VZ}+\theta_{UV})-\theta_{VZ}(\beta_V\theta_{VZ}+\beta_Z\theta_{Z}+\theta_{UZ})]\\ &=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\beta_V(\theta_Z\theta_V-\theta_{VZ}^2)+\theta_{Z}\theta_{UV}-\theta_{VZ}\theta_{UZ}]\\ &=&\beta_V+\frac{\theta_{Z}\theta_{UV}-\theta_{VZ}\theta_{UZ}}{\theta_Z\theta_V-\theta_{VZ}^2}. \end{eqnarray*} $$
That is, the estimator is consistent precisely when
$$ \theta_{Z}\theta_{UV}=\theta_{VZ}\theta_{UZ}. $$
That will be the case in the "canonical" situation in which the error $U$ is uncorrelated with both regressors ($\theta_{UV}=\theta_{UZ}=0$), but it can also occur when these error covariances scale suitably with the variance of the endogenous regressor ($\theta_{Z}$) and the covariance of the two regressors ($\theta_{VZ}$).

(The result suggests that the equality cannot be satisfied when only one error correlation is zero but the regressors are correlated, i.e., when $\theta_{VZ}\neq0$.)

For example, we would get consistency on the first regressor if we modify my initial example to

library(mvtnorm)
n <- 50000
CovMat <- matrix(c(1, 0.5, 0.25, 0.5, 1, 0.5, 0.25, 0.5, 1), ncol=3)
X <- rmvnorm(n, sigma = CovMat)
u <- rnorm(n)
beta <- 1:2
e <- X[,3] + u              # i.e. \beta_3=1
y <- X[,1]*beta[1] + X[,2]*beta[2] + e
CovMat[2,2]*CovMat[1,3]==CovMat[1,2]*CovMat[2,3] # TRUE, i.e. validity condition satisfied
summary(lm(y~X[,1]+X[,2]-1))  # i.e. the coefficient on the first regressor is now correct

Now, this turns out to be precisely the case here, with the "wrong" Cholesky decomposition:
$$ \theta_{Z}\theta_{UV}=L_{33}^2\frac{\sigma_{VZ}\sigma_{UZ}}{\sqrt{\sigma_{VV}\sigma_{UU}}}=L_{33}\frac{\sigma_{VZ}}{\sqrt{\sigma_{VV}}}\cdot L_{33}\frac{\sigma_{UZ}}{\sqrt{\sigma_{UU}}}=\theta_{VZ}\theta_{UZ}. $$
Hence,
$$ \text{plim}\,\hat\beta_V=\beta_V. $$
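The condition can also be verified numerically with the question's numbers (a small sketch; the ordering $(U, V, Z)$ follows the question's covMat):

covMat <- matrix(c(2, 0, 1.9, 0, 2, -1.5, 1.9, -1.5, 3), nrow = 3)
Theta  <- chol(covMat) %*% t(chol(covMat))   # covariance of the draws, i.e. L'L
all.equal(Theta[3, 3] * Theta[1, 2],         # theta_Z * theta_UV
          Theta[2, 3] * Theta[1, 3])         # theta_VZ * theta_UZ; returns TRUE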


Let us generalize the insights from this analysis a little. Consider a model
$$ y=X\beta+e $$
in which $X$ and $e$ are potentially correlated. We may then write the OLS estimator, upon plugging in the model and simplifying, as
$$ \hat\beta=\beta+(X'X/n)^{-1}\frac{X'e}{n}, $$
which, by an LLN, will tend to an expression of the type
$$ \text{plim}\,\hat\beta=\beta+\Sigma^{-1}\sigma_{Xe}, $$
where $\Sigma$ now denotes the covariance matrix of the regressors and $\sigma_{Xe}$ the vector of covariances between the regressors and the error. The plim of the estimator of coefficient $j$ is therefore $\beta_j$ plus the $j$th element of $\Sigma^{-1}\sigma_{Xe}$. Under exogeneity ($\sigma_{Xe}=0$), the estimator will be consistent.

However, it is, as we have seen, possible that the $j$th element of $\Sigma^{-1}\sigma_{Xe}$, which is the inner product of the $j$th row of the inverse regressor covariance matrix $\Sigma^{-1}$ with the error covariances $\sigma_{Xe}$, equals zero for suitable (if, I suppose, rare) patterns of $\Sigma$ and $\sigma_{Xe}$.
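For the covariance implied by the question's construction, this bias vector can be computed directly (a small sketch; row/column 1 is $U$, the regressors $V$ and $Z$ are rows/columns 2 and 3):

covMat <- matrix(c(2, 0, 1.9, 0, 2, -1.5, 1.9, -1.5, 3), nrow = 3)
Theta  <- chol(covMat) %*% t(chol(covMat))       # covariance of the draws
round(solve(Theta[2:3, 2:3], Theta[2:3, 1]), 6)  # Sigma^{-1} sigma_Xe: the first element (for beta_V) is zero, the second (for beta_Z) is not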

What this analysis also seems to reveal is that, for full-rank $X$, it is not possible for all coefficients to be estimated consistently if $\sigma_{Xe}$ is not a zero vector: if that were the case, we would have $\Sigma^{-1}\sigma_{Xe}=0$ for $\sigma_{Xe}\neq0$, which would imply that the columns of $\Sigma^{-1}$ are not linearly independent, so that it has reduced rank, and so would $\Sigma$.

For example, in our above bivariate example, the plim for $\hat\beta_Z$ is, along the same lines,
$$ \begin{eqnarray*} \text{plim}\,\hat\beta_Z&=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\theta_{V}(\beta_V\theta_{VZ}+\beta_Z\theta_{Z}+\theta_{UZ})-\theta_{VZ}(\beta_V\theta_V+\beta_Z\theta_{VZ}+\theta_{UV})]\\ &=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\beta_Z(\theta_{V}\theta_{Z}-\theta_{VZ}^2)+\theta_{V}\theta_{UZ}-\theta_{VZ}\theta_{UV}]\\ &=&\beta_Z+\frac{\theta_{V}\theta_{UZ}-\theta_{VZ}\theta_{UV}}{\theta_Z\theta_V-\theta_{VZ}^2}, \end{eqnarray*} $$
so that the condition for consistency of $\hat\beta_Z$ despite error correlation becomes
$$ \theta_{V}\theta_{UZ}=\theta_{VZ}\theta_{UV}. $$
If we normalize the variances to one for simplicity, this would entail that
$$ \theta_{VZ}=\frac{\theta_{UZ}}{\theta_{UV}}, $$
whereas the condition for $\hat\beta_V$ would entail
$$ \theta_{VZ}=\frac{\theta_{UV}}{\theta_{UZ}}, $$
which can only be true simultaneously if the error correlations are the same ($\theta_{UZ}=\theta_{UV}$), call this common value $\theta_{e}$, or have opposite signs ($\theta_{UZ}=-\theta_{UV}$). In the former case (the latter is analogous), $\theta_{VZ}=1$, so that
$$ \Theta=\begin{pmatrix} 1&\theta_{e}&\theta_{e}\\ \theta_{e}&1&1\\ \theta_{e}&1&1 \end{pmatrix}, $$
and the southeastern block corresponding to the regressors clearly has reduced rank.

More generally, if we substitute
$$ \theta_{VZ}=\frac{\theta_{Z}\theta_{UV}}{\theta_{UZ}} $$
into
$$ \theta_{V}=\frac{\theta_{VZ}\theta_{UV}}{\theta_{UZ}}, $$
we obtain
$$ \theta_{V}=\frac{\theta_{Z}\theta_{UV}^2}{\theta_{UZ}^2} $$
and hence a southeastern block of $\Theta$ of
$$ \theta_{Z}\begin{pmatrix} \frac{\theta_{UV}^2}{\theta_{UZ}^2}&\frac{\theta_{UV}}{\theta_{UZ}}\\ \frac{\theta_{UV}}{\theta_{UZ}}&1 \end{pmatrix}, $$
which again has rank 1 only.
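Finally, a quick numerical illustration of this rank argument (a small sketch with arbitrarily chosen values for $\theta_Z$, $\theta_{UV}$ and $\theta_{UZ}$):

theta_Z  <- 2; theta_UV <- 0.3; theta_UZ <- 0.6    # arbitrary values for illustration
theta_VZ <- theta_Z * theta_UV / theta_UZ          # imposes the consistency condition for beta_V
theta_V  <- theta_VZ * theta_UV / theta_UZ         # imposes the consistency condition for beta_Z
Sigma_s  <- matrix(c(theta_V, theta_VZ, theta_VZ, theta_Z), 2, 2)
det(Sigma_s)                                       # 0: the regressor block is singular, i.e. has rank 1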