
First of all, many thanks to those who have kindly answered my previous questions; I am learning a lot!

Suppose I have a model

y = x1 + x2 + x3

x1 is endogenous and has instrument Z1 that is relevant and exogenous conditional upon X2 and X3.

So

First Stage:

X1 = Z1 + X2 + X3

Reduced Form:

Y = Z1 + X2 + X3

Structural Equation:

Y = (X1 = Z1) + X2 + X3

My questions are:

Is it convention to plot the coefficient on Z1 against X1 to illustrate relevance?

Is it also convention to plot the coefficient on Z1 in the reduced form against Y?

Additionally, I am confused about the relationship in the structural equation. I understand the role of the coefficient on Z1 in the first stage for X1. But in the second stage we replace X1 not with Z1 times its first-stage coefficient but with the fitted value of X1, which also involves the coefficients on X2 and X3. I don't see why, and this clouds my understanding of whom the instrument affects and by how much.

user52072

1 Answer

People do not commonly plot the coefficient on $Z_1$ against $X_1$, because the coefficient is just a constant and such a plot will not tell you much. If you want to show an association between your instrument and the endogenous variable, consider the first-stage F-statistic (which, with one endogenous variable and one instrument, is just the square of the t-statistic on the instrument in the first stage) or the pairwise correlation between the instrument and the endogenous variable. For visual inspection you could do a scatter plot of the instrument against the endogenous variable.
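As a rough illustration of these relevance checks, here is a minimal sketch on simulated data. The data-generating process, the coefficient values, and the helper `ols` are all assumptions made up for this example; the standard errors are the plain homoskedastic ones.

```python
import numpy as np

def ols(y, X):
    """Hypothetical helper: OLS coefficients, homoskedastic SEs, residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), resid

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 2000
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
z1 = 0.5 * x2 + rng.normal(size=n)      # instrument, correlated with x2
w = rng.normal(size=n)                  # omitted variable
x1 = 0.8 * z1 + 0.3 * x2 - 0.2 * x3 + w + rng.normal(size=n)

const = np.ones(n)
X_full = np.column_stack([const, z1, x2, x3])   # first stage
X_restr = np.column_stack([const, x2, x3])      # first stage without z1

# t-statistic on the instrument in the first stage.
beta, se, resid_full = ols(x1, X_full)
t_stat = beta[1] / se[1]

# F-statistic for excluding the single instrument.
_, _, resid_restr = ols(x1, X_restr)
rss_full = resid_full @ resid_full
rss_restr = resid_restr @ resid_restr
f_stat = (rss_restr - rss_full) / (rss_full / (n - X_full.shape[1]))

# Pairwise correlation between instrument and endogenous variable.
corr = np.corrcoef(z1, x1)[0, 1]

print(f"t = {t_stat:.2f}, t^2 = {t_stat**2:.2f}, F = {f_stat:.2f}, corr = {corr:.2f}")
```

With a single excluded instrument the F-statistic and the squared t-statistic coincide, so either can be reported. With robust or clustered standard errors the numbers would differ from this homoskedastic version.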

For the second part of your question, I have written a similar answer here, so with your permission I will copy it.

If you have an endogenous variable $D$ (which in your case would be $X_1$; I just name it differently), an outcome $Y$, an instrument $Z$ and other exogenous variables $X$, the problem with the regression $$Y_i = \alpha + \delta D_i + X'_i \beta + \epsilon_i$$ is that $\text{Cov}(D_i,\epsilon_i) \neq 0$, which may be due to omitted variable bias, measurement error, simultaneity, or a mix of those. For example, let's say that $\epsilon_i = e_i + W_i$, where $e_i$ is a stochastic error and $W_i$ is an omitted variable with $\text{Cov}(D_i,W_i)\neq 0$, which is the source of the endogeneity problem.

Say you then estimate your first stage with the instrument, $$D_i = a + \pi Z_i + X'_i \gamma + \eta_i$$ where $\eta_i = u_i + W_i$ and $u_i$ again is a stochastic error term. Remember that a regression splits the variation in the dependent variable into an explained and an unexplained part, i.e. $$\text{total sum of squares} = \text{explained sum of squares} + \text{residual sum of squares}$$ When you predict $\widehat{D}_i = \widehat{a} + \widehat{\pi} Z_i + X'_i \widehat{\gamma}$, this separates the "good" variation in $D_i$ (the variation explained by variables that you know are exogenous, namely $Z$ and the $X$) from the "bad" variation in $\eta_i$. The $X$ have to be in the first stage because they are also in the second stage: if they were left out, the first-stage residual would be correlated with the $X$ and the second-stage estimates would generally be inconsistent.

In the second stage, $$Y_i = \alpha + \delta \widehat{D}_i + X'_i \beta + \epsilon_i$$ you still have $W_i$ in the error term, but this time it is not a problem because $\text{Cov}(\widehat{D}_i,W_i) = 0$. This solves your endogeneity problem and allows for consistent estimation of $\delta$.
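The two stages described above can be sketched by hand on simulated data. Everything in this example is an assumption for illustration: the data-generating process, the true value `delta_true = 2.0`, and the helper `fit`. It is not meant as a replacement for a proper IV routine.

```python
import numpy as np

def fit(y, X):
    """Hypothetical helper: OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)                  # exogenous covariate X
z = 0.5 * x + rng.normal(size=n)        # instrument Z, unrelated to W
w = rng.normal(size=n)                  # omitted variable W
d = 1.0 * z + 0.5 * x + w + rng.normal(size=n)            # endogenous D
delta_true = 2.0
y = 1.0 + delta_true * d + 0.7 * x + w + rng.normal(size=n)

const = np.ones(n)

# Naive OLS of Y on D and X: W sits in the error and is correlated with D.
delta_ols = fit(y, np.column_stack([const, d, x]))[1]

# First stage: regress D on the instrument and the exogenous covariate.
first_stage = np.column_stack([const, z, x])
d_hat = first_stage @ fit(d, first_stage)

# Second stage: replace D by its fitted value.
delta_2sls = fit(y, np.column_stack([const, d_hat, x]))[1]

# D is correlated with W, the fitted value is (approximately) not.
cov_d_w = np.cov(d, w)[0, 1]
cov_dhat_w = np.cov(d_hat, w)[0, 1]

print(f"OLS: {delta_ols:.3f}, 2SLS: {delta_2sls:.3f}, true: {delta_true}")
print(f"Cov(D, W) = {cov_d_w:.3f}, Cov(D_hat, W) = {cov_dhat_w:.3f}")
```

The point estimate from this manual second stage matches 2SLS, but the standard errors it would produce are not valid because they ignore that $\widehat{D}_i$ is itself estimated; dedicated IV software corrects for this.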

Andy
  • Thanks. My instrument Z1 is continuous but only exogenous conditional upon X2 and X3. Could I multiply Z1 by its first-stage coefficient and scatter plot that against X1 to show the visual relationship between the exogenous part of the instrument and the endogenous variable? Also, with a continuous instrument, how would I interpret the coefficient on Z1 in the first stage in terms of the population on whom the instrument has an impact? My confusion is that X2 and X3 also affect the treatment X1, so how is this to be interpreted? – user52072 Jul 15 '14 at 15:13
  • You could run a regression of Z1 on X2 and X3 and then plot the residuals of this regression against X1. The residuals from this regression are the variation in Z1 after partialling out the effect of X2 and X3. The interpretation of the first stage coefficients depends on the endogenous variable: is it continuous or binary, for instance? – Andy Jul 15 '14 at 17:15
  • If Z1 is valid only conditional on X2 and X3, this doesn't really change anything. This is the usual conditional independence assumption. For a continuous endogenous variable you interpret the first stage coefficients like in any other OLS regression (because that's what it is at the end of the day). – Andy Jul 15 '14 at 23:11
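The partialling-out step suggested in the comments can be sketched as follows. The simulated data, the coefficient values, and the helper `fit` are assumptions for illustration. The pairs `(z1_resid, x1)` are what one would pass to a scatter plot.

```python
import numpy as np

def fit(y, X):
    """Hypothetical helper: OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(2)
n = 1000
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
z1 = 0.6 * x2 - 0.4 * x3 + rng.normal(size=n)
x1 = 0.8 * z1 + 0.3 * x2 + 0.5 * x3 + rng.normal(size=n)

const = np.ones(n)
controls = np.column_stack([const, x2, x3])

# Residuals of Z1 after regressing it on X2 and X3 (and a constant):
# the variation in Z1 that X2 and X3 do not explain.
z1_resid = z1 - controls @ fit(z1, controls)

# Slope of X1 on the residualized instrument (what the scatter would show).
slope_resid = (z1_resid @ x1) / (z1_resid @ z1_resid)

# Coefficient on Z1 in the full first stage, for comparison.
pi_first_stage = fit(x1, np.column_stack([const, z1, x2, x3]))[1]

print(f"slope on residualized Z1: {slope_resid:.4f}")
print(f"first-stage coefficient:  {pi_first_stage:.4f}")
```

The two numbers agree, so the slope visible in a scatter plot of X1 against the residualized instrument is the first-stage coefficient on Z1 holding X2 and X3 fixed, which is how it would be read for a continuous endogenous variable.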