
I am having a bit of trouble understanding endogeneity.

If I have a regression specification where the "true" model looks like this:

$y = \beta_1 x + \beta_2 z + \epsilon $

And we lack data on $z$, so we run:

$y = \beta_1 x + \epsilon $
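A quick simulation sketch of what dropping $z$ does. It assumes, purely for illustration, $\beta_1 = 2$, $\beta_2 = 3$, standard normal $z$ and noise terms, $x = \rho z + \delta$, and an intercept in the fitted regression. `ols_slope` and `simulate` are hypothetical helpers written in plain Python. When $\rho \neq 0$ the slope on $x$ converges to $\beta_1 + \beta_2 \operatorname{cov}(x,z)/\operatorname{var}(x)$ rather than $\beta_1$. When $\rho = 0$ it stays centered on $\beta_1$.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(rho, n=100_000, beta1=2.0, beta2=3.0, seed=0):
    """Draw data from y = beta1*x + beta2*z + eps, where x = rho*z + delta."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [rho * zi + rng.gauss(0, 1) for zi in z]
    y = [beta1 * xi + beta2 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, y

# x correlated with z: dropping z biases the slope on x
x, y = simulate(rho=0.8)
slope_omitted_correlated = ols_slope(x, y)

# x independent of z: dropping z adds noise to the error term but no bias
x, y = simulate(rho=0.0)
slope_omitted_independent = ols_slope(x, y)

# Roughly 2 + 3 * 0.8 / (0.8**2 + 1), about 3.46, and roughly 2.0
print(round(slope_omitted_correlated, 2), round(slope_omitted_independent, 2))
```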

What confuses me is that this is only a problem when $x$ and $z$ are correlated. So suppose $x$ is a function of $z$, $x = f(z) + \delta$, where the error term $\delta$ has mean zero and is uncorrelated with everything else, and assume that we do have data on $z$. Then, would running the regression:

$y = \beta_1 x + \beta_2 z + \epsilon $

mean that I no longer have an endogeneity problem, and that the only remaining problem is multicollinearity if $x$ and $z$ are highly correlated? Further, could I simply run the two-stage regression:

$\hat x = \gamma_0 + \gamma_1 z$

$y = \beta_1 \hat x + \beta_2 z + \epsilon $

without worrying about it? $\hat x$ and $z$ would be correlated with each other, but that should not be an issue.

The idea I want to explore is that one variable ($z$) affects both the dependent variable ($y$) and the independent variable ($x$), and I want to capture both effects. My thinking is that the two-stage approach outlined above is incorrect, that the IV approach is appropriate here, and so I should look for an instrument that is uncorrelated with $x$ but correlated with $z$ in the first stage. Then, if I know the effect $z$ has on $x$ and the effect $x$ has on $y$, that gives one pathway. Further, by instrumenting $x$, I can find the effect $x$ has on $y$ apart from $z$. Is this thinking correct?
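A sketch of the case where $z$ is observed, assuming a linear $f(z) = 0.8z$, standard normal draws, and illustrative coefficients $\beta_1 = 2$, $\beta_2 = 3$. Regressing $y$ on both $x$ and $z$ (with an intercept) recovers both coefficients, so what remains is only a precision question (multicollinearity), not bias. It also shows why the proposed two-stage version breaks down: the first-stage fitted $\hat x$ is an exact linear function of $z$, so their correlation is $\pm 1$, and in a second stage with an intercept the columns $1$, $\hat x$, $z$ are perfectly collinear. `centered_cross` and `ols_two` are hypothetical helpers written for this example.

```python
import random

def centered_cross(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def ols_two(y, x1, x2):
    """Slopes from regressing y on x1 and x2 with an intercept (2x2 normal equations)."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero when x1 and x2 are perfectly collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)
n = 100_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]  # assumed f(z) = 0.8 * z
y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Observing z and including it recovers both coefficients (no omitted-variable bias)
b1, b2 = ols_two(y, x, z)
print(round(b1, 2), round(b2, 2))  # both close to the true 2.0 and 3.0

# Proposed two-stage version: first stage x on z, fitted values x_hat
g1 = centered_cross(z, x) / centered_cross(z, z)
g0 = sum(x) / n - g1 * sum(z) / n
x_hat = [g0 + g1 * zi for zi in z]

# x_hat is a linear function of z, so calling ols_two(y, x_hat, z) would divide by ~0
corr = centered_cross(x_hat, z) / (centered_cross(x_hat, x_hat) * centered_cross(z, z)) ** 0.5
print(corr)  # 1.0 up to floating-point rounding
```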

mirrror
  • It appears to me that you're confusing endogeneity and omitted-variable bias. – Durden Jun 20 '23 at 18:05
  • I suppose I'm having trouble understanding this because I was taught that omitted variable bias is one of the common sources of endogeneity (error term correlated with independent variables). But having posted this now, I think I conflated the two. – mirrror Jun 20 '23 at 18:23
  • You're right in that unobserved confounders can be a source of endogeneity. And you also intuited correctly that omission of $z$ would not bias $\beta_{1}$ if $\operatorname{corr}(x,z)=0$. But what you confused was the solution to this problem: if you had $z$, as you posed, you would simply plug it into your original regression of $y$. The two-stage approach is for when you have another so-called instrumental variable $w$ which (under certain conditions) allows you to orthogonalize $x$ and $z$, such that the OVB disappears. – Durden Jun 20 '23 at 18:41
  • Thanks, Durden; the second link is very helpful. I have some causal links flying around ($x$ is a function of $z$, and $y$ depends on $z$ both through $x$ and not through $x$) that made me think it shouldn't be possible to just plug $z$ in alongside $x$. Just my muddled understanding. – mirrror Jun 20 '23 at 18:52
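A sketch of the instrumental-variable route from Durden's comment, treating $z$ as unobserved. It assumes a hypothetical instrument $w$ that shifts $x$, is independent of $z$, and affects $y$ only through $x$ (the usual relevance and exclusion conditions), with illustrative coefficients $\beta_1 = 2$, $\beta_2 = 3$ and standard normal draws. Two-stage least squares is done by hand: regress $x$ on $w$, then regress $y$ on the fitted $\hat x$. The resulting slope equals $\operatorname{cov}(w,y)/\operatorname{cov}(w,x)$. Standard errors from a manual second stage are not the correct 2SLS standard errors, so this is only for point estimates.

```python
import random

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(2)
n = 100_000
z = [rng.gauss(0, 1) for _ in range(n)]  # confounder, treated as unobserved
w = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical instrument, independent of z
x = [0.8 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Naive OLS: z sits in the error term and is correlated with x, so this is biased
ols = slope(x, y)

# Two-stage least squares by hand: first stage x on w, second stage y on fitted x
g1 = slope(w, x)
g0 = sum(x) / n - g1 * sum(w) / n
x_hat = [g0 + g1 * wi for wi in w]
iv = slope(x_hat, y)

# OLS near 2 + 2.4 / 2.64 (about 2.91); 2SLS near the true 2.0
print(round(ols, 2), round(iv, 2))
```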
