Suppose that my instrument z is sufficiently correlated with the endogenous independent variable x under consideration (z->x). I also know that my dependent variable y is a predictor of z (y->z), but reverse causality does not hold. Would this make my instrument z invalid?
1 Answer
Yes, the instrument would be invalid. In most cases, the unexplained variation in y also becomes part of your instrument z. In other words, z will be correlated with the error term of the equation for y. Even if z is 'conceptually' correlated with your endogenous regressor x, it is also correlated with the error term, and that violates the exogeneity condition a valid instrument must satisfy.
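To see why in the simplest case, here is a short sketch with a single regressor and a linear feedback from y to z. The coefficient $\gamma$ and the noise term $v$ are notation assumed here for illustration; they are not part of the question.

$$
y = b_1 x + e, \qquad z = \gamma y + v, \qquad \mathrm{Cov}(v, e) = 0
$$

$$
\mathrm{Cov}(z, e) = \gamma\,\mathrm{Cov}(y, e) = \gamma\left(b_1\,\mathrm{Cov}(x, e) + \mathrm{Var}(e)\right)
$$

$$
\mathrm{plim}\;\hat{b}_{1,\mathrm{IV}} = \frac{\mathrm{Cov}(z, y)}{\mathrm{Cov}(z, x)} = b_1 + \frac{\mathrm{Cov}(z, e)}{\mathrm{Cov}(z, x)}
$$

Unless $\gamma = 0$ or the two terms in the bracket happen to cancel, $\mathrm{Cov}(z, e) \neq 0$, so the second term does not vanish and the IV estimate stays biased no matter how large the sample is.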
To make things clear, a little Monte Carlo simulation:
I set up an equation: $y = b_1x_1 + b_2x_2 + e$, where $b_1=1$ and $b_2=1$.
I then create two instruments for $x_1$, namely $z_1$ and $z_{1b}$. Here $z_{1}$ is a proper instrument that is correlated with $x_1$ but not with the error, whereas $z_{1b}$ is the bad one that is 'caused' by y.
The sample size is 10,000, and the result of the IV estimation with the second (bad) instrument is:
Model Formula: y ~ x1 + x2
Instruments: ~z1b + x2
               Estimate     Std. Error   t value     Pr(>|t|)
(Intercept)   -0.01939093   0.22379803    -0.08664   0.93096
x1             1.24151903   0.02195892    56.53826   < 2e-16 ***
x2             1.00262791   0.00368895   271.79220   < 2e-16 ***
Note how the estimated coefficient for x1 is 1.24, far from its true value of 1.0 in the data-generating model.
Using the proper instrument $z_1$, by contrast, works:
Model Formula: y ~ x1 + x2
Instruments: ~z1 + x2
               Estimate      Std. Error    t value     Pr(>|t|)
(Intercept)    0.183036941   0.222873494     0.82126   0.41152
x1             1.002586104   0.008785850   114.11373   < 2e-16 ***
x2             1.035735289   0.002412813   429.26455   < 2e-16 ***
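As a rough cross-check, the large-sample values of both estimators can be worked out from the variances and covariances implied by the data-generating code at the end of this answer, where all $e_i$ are independent standard normals. This is a back-of-the-envelope sketch, not part of the original simulation output.

$$
\mathrm{Var}(x_1) = 2900, \quad \mathrm{Var}(x_2) = 11601, \quad \mathrm{Cov}(x_1, x_2) = 1600
$$

$$
\mathrm{Cov}(x_1, y) = 4900, \quad \mathrm{Cov}(x_2, y) = 13601, \quad \mathrm{Var}(y) = 19826
$$

With the bad instrument, $\mathrm{Cov}(z_{1b}, \cdot) = \mathrm{Cov}(y, \cdot)$, so the two IV moment conditions are

$$
19826 = 4900\,b_1 + 13601\,b_2, \qquad 13601 = 1600\,b_1 + 11601\,b_2 \;\;\Rightarrow\;\; b_1 \approx 1.283, \quad b_2 \approx 0.995
$$

With the proper instrument, $\mathrm{Cov}(z_1, x_1) = \mathrm{Cov}(z_1, y) = 450$ and $\mathrm{Cov}(z_1, x_2) = 0$, so

$$
450 = 450\,b_1, \qquad 13601 = 1600\,b_1 + 11601\,b_2 \;\;\Rightarrow\;\; b_1 = 1, \quad b_2 \approx 1.034
$$

The reported 1.24 lies within roughly two standard errors of 1.283, so the bias seen above is not a small-sample accident. The x2 coefficient sitting slightly above 1 under the proper instrument reflects that x2 shares the component e1 with the omitted x3, so x2 is not perfectly exogenous in this setup either.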
Hope that helps! Below is the R code to try at home. (Note that your results may differ because the data are random.)
library(sem)
set.seed(12344321)
# A large sample of normal errors to be used for creating variables
e1 <- rnorm(10000)
e2 <- rnorm(10000)
e3 <- rnorm(10000)
e4 <- rnorm(10000)
e5 <- rnorm(10000)
e6 <- rnorm(10000)
e7 <- rnorm(10000)
e8 <- rnorm(10000)
e9 <- rnorm(10000)
e10 <- rnorm(10000)
e11 <- rnorm(10000)
e12 <- rnorm(10000)
e13 <- rnorm(10000)
e14 <- rnorm(10000)
e15 <- rnorm(10000)
x1<- e7*30 +e8*20 +e1*40
x2<- e1*40 + e2*100 + e10
x3<- e1*10 + 5*e9
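#the regression equation; x3 is left out of the models estimated below, so it
#ends up in the error term and makes x1 endogenous (x1 and x3 share e1)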
y <- 1.0*x1 + 1.0*x2 + 1.0*x3 + 20*e11
#a proper instrument
z1<- e7*15+ 10*e15
#the dubious instrument in consideration, it is 'caused' by y.
z1b <- y + 300*e14
#What are the correlations?
cor(x1, x2)
cor(y, x1)
cor(y, x2)
cor(x1, z1)
cor(x2, z1)
cor(y, z1)
cor(y, z1b)
cor(x1, z1b)
cor(x2, z1b)
summary(reg_1<-lm(y~x1+x2))
summary(tsls(y~x1+x2,~z1+x2))
summary(tsls(y~x1+x2,~z1b+x2))
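Because the data are simulated, the error term of the estimated equation is known, so the validity of each instrument can be checked directly. The following is a minimal, self-contained sketch of that check. It regenerates only the variables that are actually used, with an arbitrary seed of its own, so the numbers will not match the output above exactly. The helper `iv()` and the names `u`, `cor_good`, `cor_bad`, `b_good`, and `b_bad` are introduced here for illustration and are not part of the original code.

```r
# Check each instrument against the true structural error.
# This is only possible because the data are simulated.
set.seed(1)  # arbitrary seed (an assumption of this sketch)
n <- 10000
e1 <- rnorm(n); e2 <- rnorm(n); e7 <- rnorm(n); e8 <- rnorm(n); e9 <- rnorm(n)
e10 <- rnorm(n); e11 <- rnorm(n); e14 <- rnorm(n); e15 <- rnorm(n)

# Same data-generating process as in the code above
x1 <- 30 * e7 + 20 * e8 + 40 * e1
x2 <- 40 * e1 + 100 * e2 + e10
x3 <- 10 * e1 + 5 * e9
y  <- x1 + x2 + x3 + 20 * e11

z1  <- 15 * e7 + 10 * e15   # proper instrument
z1b <- y + 300 * e14        # instrument 'caused' by y

# Error of the estimated equation y = b1*x1 + b2*x2 + u with b1 = b2 = 1
u <- y - x1 - x2

cor_good <- cor(z1, u)    # should be close to zero
cor_bad  <- cor(z1b, u)   # should be clearly positive

# Just-identified IV computed by hand: solve (W'X) b = W'y,
# where W holds the instruments and X the regressors
iv <- function(z) {
  X <- cbind(1, x1, x2)
  W <- cbind(1, z, x2)
  drop(solve(crossprod(W, X), crossprod(W, y)))
}
b_good <- iv(z1)    # second element is the x1 coefficient
b_bad  <- iv(z1b)

print(c(cor_good = cor_good, cor_bad = cor_bad))
print(rbind(good = b_good, bad = b_bad))
```

The correlation of z1b with u is what drives the x1 coefficient away from 1, while z1 is essentially uncorrelated with u. In real data u is unobserved, so this check is available only in a simulation.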