Suppose that my instrument z is sufficiently correlated with the endogenous independent variable x under consideration (z->x). Now, I know that my dependent variable y is also a predictor of z (y->z), but reverse causality does not hold. Would this make my instrument z invalid?

CodeTrek
  • 609

1 Answer

Yes, the instrument would be invalid. In most cases, the unexplained variation in y also becomes part of your instrument z. In other words, z will be correlated with the error term of the equation for y. Even if z is 'conceptually' correlated with your endogenous regressor x, it is also correlated with the error term, which violates the exogeneity condition and is what makes an instrument invalid.
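
A short derivation may make this concrete. It is only a sketch for the simplest case of one regressor and one instrument. The form $z = y + v$ and the symbol $v$ (the part of $z$ unrelated to $x$ and $e$) are assumptions made for illustration, not something stated in the question.

$$
y = b\,x + e, \qquad z = y + v, \qquad \operatorname{Cov}(v, x) = \operatorname{Cov}(v, e) = 0
$$

$$
\operatorname{Cov}(z, e) = \operatorname{Cov}(b\,x + e + v,\ e) = b\,\operatorname{Cov}(x, e) + \operatorname{Var}(e)
$$

$$
\hat b_{IV} \ \xrightarrow{p}\ \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)} = b + \frac{\operatorname{Cov}(z, e)}{\operatorname{Cov}(z, x)}
$$

Since $\operatorname{Var}(e) > 0$, $\operatorname{Cov}(z, e)$ is nonzero except in the knife-edge case $b\,\operatorname{Cov}(x, e) = -\operatorname{Var}(e)$, so under these assumptions the IV estimate does not converge to $b$.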

To make things clear, a little Monte Carlo simulation:

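I set up an equation: $y = b_1x_1 + b_2x_2 + e$ where $b_1=1$ and $b_2=1$. (In the code, $e$ includes an omitted variable $x_3$ that shares a common component with $x_1$, which is what makes $x_1$ endogenous.)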

I now create two instruments for $x_1$, $z_1$ and $z_{1b}$. $z_1$ is a proper instrument that is correlated with $x_1$ but not with the error term, whereas $z_{1b}$ is the bad one that is 'caused' by y.

The sample size is 10,000, and the result of the IV estimation with the second (bad) instrument is:

Formula: y ~ x1 + x2
Instruments: ~z1b + x2

               Estimate  Std. Error   t value Pr(>|t|)    
(Intercept) -0.01939093  0.22379803  -0.08664  0.93096    
x1           1.24151903  0.02195892  56.53826  < 2e-16 ***
x2           1.00262791  0.00368895 271.79220  < 2e-16 ***

Note how the estimated coefficient for x1 is 1.24, far from the true value of 1.0 in the population model.

Whereas using the proper instrument $z_1$ works:

Model Formula: y ~ x1 + x2
Instruments: ~z1 + x2
               Estimate  Std. Error   t value Pr(>|t|)    
(Intercept) 0.183036941 0.222873494   0.82126  0.41152    
x1          1.002586104 0.008785850 114.11373  < 2e-16 ***
x2          1.035735289 0.002412813 429.26455  < 2e-16 ***

Hope that helps! Below is the R code to try at home. (Note: results may differ slightly due to the randomness of the data.)

library(sem)  # provides tsls() for two-stage least squares
set.seed(12344321)

# A large sample of normal errors to be used for creating variables.
# Not all of them are used below.
e1 <- rnorm(10000)
e2 <- rnorm(10000)
e3 <- rnorm(10000)
e4 <- rnorm(10000)
e5 <- rnorm(10000)
e6 <- rnorm(10000)
e7 <- rnorm(10000)
e8 <- rnorm(10000)
e9 <- rnorm(10000)
e10 <- rnorm(10000)
e11 <- rnorm(10000)
e12 <- rnorm(10000)
e13 <- rnorm(10000)
e14 <- rnorm(10000)
e15 <- rnorm(10000)

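# Regressors: x1, x2 and x3 all share the component e1
x1 <- e7*30 + e8*20 + e1*40
x2 <- e1*40 + e2*100 + e10
x3 <- e1*10 + 5*e9

# The true regression equation; x3 is left out of the fitted models below,
# so it ends up in the error term and makes x1 endogenous
y <- 1.0*x1 + 1.0*x2 + 1.0*x3 + 20*e11

# A proper instrument: related to x1 through e7, unrelated to the error
z1 <- e7*15 + 10*e15

# The dubious instrument under consideration; it is 'caused' by y
z1b <- y + 300*e14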

#What are the correlations?
cor(x1, x2)
cor(y, x1)
cor(y, x2)
cor(x1, z1)
cor(x2, z1)
cor(y, z1)
cor(y, z1b)
cor(x1, z1b)
cor(x2, z1b)

# OLS for comparison, then 2SLS with the proper and with the bad instrument
summary(reg_1 <- lm(y ~ x1 + x2))
summary(tsls(y ~ x1 + x2, ~ z1 + x2))
summary(tsls(y ~ x1 + x2, ~ z1b + x2))
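
Because this is a simulation, the error term of the fitted model is known, so the exogeneity condition can be checked directly (something that is not possible with real data). The following is a minimal, self-contained sketch in base R. It rebuilds data with the same structure as above but with fresh draws and a different seed, so the numbers will not match the tables. The names `u`, `cor_good` and `cor_bad` are introduced here for illustration only.

```r
# Self-contained sketch: same data-generating structure as the simulation
# above, but with fresh random draws (numbers will not match the tables).
set.seed(1)
n <- 10000
e1 <- rnorm(n); e2 <- rnorm(n); e7 <- rnorm(n); e8 <- rnorm(n)
e9 <- rnorm(n); e10 <- rnorm(n); e11 <- rnorm(n)
e14 <- rnorm(n); e15 <- rnorm(n)

x1 <- 30 * e7 + 20 * e8 + 40 * e1
x2 <- 40 * e1 + 100 * e2 + e10
x3 <- 10 * e1 + 5 * e9          # omitted from the fitted model
y  <- x1 + x2 + x3 + 20 * e11

z1  <- 15 * e7 + 10 * e15       # proper instrument
z1b <- y + 300 * e14            # instrument 'caused' by y

# Error term of the fitted model y ~ x1 + x2 (known only in a simulation)
u <- x3 + 20 * e11

cor_good <- cor(z1, u)    # expected to be close to zero
cor_bad  <- cor(z1b, u)   # expected to be clearly different from zero
print(c(good = cor_good, bad = cor_bad))
```

The proper instrument should show a correlation with `u` near zero, while the instrument built from y should not, which is the violation described at the top of this answer.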
Majte
  • 2,204