The key assumption, that the IV is independent of the response variable Y, cannot be tested empirically and can only be argued theoretically. Is this true? Why?
And why is this a problem when we use multiple instrumental variables (e.g. genetic variants)?
In a regression like $$Y_i = \alpha + \beta X_i + \eta_i$$ where $X_i$ is the endogenous variable such that $Cov(X_i,\eta_i)\neq 0$, a "good" instrument $Z_i$ must satisfy two conditions: relevance, $Cov(Z_i, X_i) \neq 0$, and exogeneity (the exclusion restriction), $Cov(Z_i, \eta_i) = 0$.
The stronger assumption of independence between the instrument and the structural error is invoked in a particular type of model, namely the linear constant effects model. Independence also implies that the instrument and the error are uncorrelated, though the converse is not true. You can visualize the idea of the exclusion restriction in the graph below: $$\begin{matrix} Z & \rightarrow & X & \rightarrow & Y \newline & & \uparrow & \nearrow & \newline & & \eta & \end{matrix}$$
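To make the two conditions concrete, here is a minimal simulation sketch under an assumed data-generating process that satisfies both of them (the coefficients and sample size are arbitrary choices for illustration): OLS of $Y$ on $X$ is biased by the confounder $\eta$, while the simple IV (Wald) estimator $Cov(Z,Y)/Cov(Z,X)$ recovers $\beta$.

```python
# Toy simulation: hidden confounder eta drives both X and Y, Z enters Y only through X.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 2.0                                   # true causal effect of X on Y

eta = rng.normal(size=n)                     # unobserved confounder / structural error
Z = rng.normal(size=n)                       # instrument: independent of eta
X = 0.8 * Z + eta + rng.normal(size=n)       # relevance: Cov(Z, X) != 0; endogeeneity omitted: Cov(X, eta) != 0
Y = 1.0 + beta * X + eta                     # exclusion: Z affects Y only through X

beta_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)   # biased upward by the confounder
beta_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]   # Wald / IV estimator

print(f"OLS estimate: {beta_ols:.3f}")   # roughly 2.4, biased
print(f"IV  estimate: {beta_iv:.3f}")    # close to the true beta = 2
```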
The fundamental problem with testing the exclusion restriction is that it involves the structural error $\eta$, which is never observable. This is why you cannot formally test this restriction, neither with one nor with a thousand instruments. To motivate the exclusion restriction we therefore often need to rely on a good theoretical foundation for the relationship under investigation.
Having said that, you might not want to use 1000 instruments, because what matters is the quality of the instruments and not the quantity. There are two distinct problems: one relates to the inconsistency of instrumental variables methods under many instruments, and the other, often related, is the problem of weak instruments. See for example this lecture on the topic.
First, as others have said, the assumption as you have stated is not correct.
The standard IV model is given by the causal diagram $Z \rightarrow X \rightarrow Y$, with a bidirected arc $X \leftrightarrow Y$ representing unobserved common causes of $X$ and $Y$.
And the key assumptions here are that $Z$ has no effect on $Y$ except through $X$ (exclusion restriction) and that there are no common causes between $Z$ and $Y$ (independence restriction, or unconfoundedness of $Z$).
Since we are saying that $Z$ does not affect $Y$ other than through $X$, it would seem reasonable to think that we could check whether $Z$ is independent of $Y$ conditional on $X$ to test that assumption. However, conditioning on $X$ opens the colliding path $Z \rightarrow X \leftrightarrow Y$, which creates a spurious association between $Z$ and $Y$. That is, even though there is no direct effect of $Z$ on $Y$, we would still see that $Z$ is associated with $Y$ conditional on $X$.
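A short simulation sketch of this point, using an assumed data-generating process in which the exclusion restriction holds by construction: regressing $Y$ on both $X$ and $Z$ (i.e., looking at $Z$ conditional on $X$) still produces a clearly nonzero coefficient on $Z$, because conditioning on the collider $X$ opens the path $Z \rightarrow X \leftrightarrow Y$.

```python
# Valid instrument + hidden confounder: Z has no direct effect on Y,
# yet Z is associated with Y once we condition on X.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

eta = rng.normal(size=n)                     # unobserved confounder of X and Y
Z = rng.normal(size=n)                       # valid instrument (exclusion holds by construction)
X = Z + eta + rng.normal(size=n)
Y = 2.0 * X + eta                            # no direct Z -> Y effect

# Conditioning on X (adding it as a regressor) opens the collider path Z -> X <-> Y.
design = np.column_stack([np.ones(n), X, Z])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(f"coefficient on Z given X: {coefs[2]:.3f}")   # about -0.5 here, despite exclusion holding
```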
That being said, it is not completely true that the exclusion restriction assumption is not testable. Although there are no conditional independences implied by the model, if the variables are discrete, the IV model does have testable implications, in the form of inequalities. These are usually called "instrumental inequalities". To learn more, I suggest the original paper by Pearl and a recent review by Swanson and colleagues.
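For the all-binary case, Pearl's instrumental inequality requires that $\sum_y \max_z P(Y=y, X=x \mid Z=z) \le 1$ for every value $x$; an observed violation falsifies the IV model. Below is a small sketch of that check, assuming binary $Z$, $X$, $Y$ and that we already have (estimates of) the conditional joint distribution $P(Y, X \mid Z)$; the example distribution is hypothetical.

```python
import numpy as np

def instrumental_inequality_holds(p_yx_given_z):
    """p_yx_given_z[z, x, y] = P(Y=y, X=x | Z=z); all variables binary."""
    # For each x: take the max over z of P(y, x | z), then sum over y.
    bounds = p_yx_given_z.max(axis=0).sum(axis=1)   # one bound per value of x
    return bool(np.all(bounds <= 1 + 1e-12)), bounds

# A hypothetical distribution that violates the inequality
# (hence is incompatible with any IV model):
p = np.zeros((2, 2, 2))
p[0, 0, 0], p[0, 1, 1] = 0.9, 0.1     # P(y, x | Z = 0)
p[1, 0, 1], p[1, 1, 0] = 0.9, 0.1     # P(y, x | Z = 1)

ok, bounds = instrumental_inequality_holds(p)
print(ok, bounds)    # False; the bound for x = 0 is 1.8 > 1
```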
The assumption is not correct as you stated it. The correct version is: the instrument I is independent of the outcome Y given the covariates X. This is called the exclusion restriction. If you ignore the covariates, then there should be a dependence of Y on I (otherwise either the link I -> X or the link X -> Y is missing).
[Removed the rest of this answer - jabberwocky is correct]
[I second Rob's clarification about revising the independence statement, but I disagree with his statements about testing the exclusion restriction.]
The exclusion restriction cannot be tested. Some tests are possible if the researcher imposes additional assumptions, but as a general rule the exclusion restriction cannot be tested. The statements below are intended to be general statements.
The first stage of your IV regression is testable; this is sometimes called the inclusion restriction. Does your instrument (I) affect your treatment (T)? That's testable with an F-test, and it is used to judge whether you have a strong or a weak instrument.
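A minimal sketch of that first-stage check on simulated data (the data-generating process and the statsmodels call are my own illustration, not part of the answer): regress T on I and inspect the F statistic; a commonly quoted rule of thumb is F > 10 for the instrument not to be considered weak.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5_000
I = rng.normal(size=n)                      # instrument
T = 0.3 * I + rng.normal(size=n)            # treatment, partly driven by the instrument

first_stage = sm.OLS(T, sm.add_constant(I)).fit()
print(f"first-stage F statistic: {first_stage.fvalue:.1f}")   # large F => strong instrument
print(f"p-value: {first_stage.f_pvalue:.2e}")
```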
But you cannot test the exclusion restriction, that is, you cannot test whether the only path from I to Y runs through T (I->T->Y and not I->Y and not I->e->Y, where e is your error term). You cannot test the exclusion restriction for the same reason you are looking for an instrument in the first place: the relationship between T and Y is confounded by some error or unobservable factors. Therefore, any test of conditional independence between I and Y controlling for T would be confounded by the same error or unobservable factors.
So how do you make the argument for an instrument? Arguing that there is a plausible causal pathway from your instrument (I) to your outcome (Y) that runs only through T requires what David Freedman calls "shoe leather": intimate knowledge of the subject matter to develop meticulous research designs and eliminate rival explanations. That is, by using IV regression you are proposing a natural experiment. The natural experiment doesn't rely on statistical tests but rather on the assertion that you've found some as-if random process that eliminates confounding.
Reference: “Statistical Models and Shoe Leather,” David Freedman, 1991.
Several other answers have already done a good job explaining the underlying causal assumptions of the method (I especially like Carlos's answer). As has been pointed out, there are some observable implications of the satisfaction or non-satisfaction of the underlying instrumental assumption, so it is not entirely true that it cannot be tested. Moreover, since the assumption asserts the absence of other causal pathways, it can at least be falsified empirically by separate research that establishes evidence of a "back-door" causal pathway.
In any case, one important point to make about causal analysis hinging on instrumental variables is that the analysis is extremely sensitive to the IV assumption. The assumption itself is often dubious, and failure of that assumption is usually catastrophic for the analysis. My experience of seeing researchers use this method makes me extremely sceptical of its utility. Perhaps others have had different experiences, but in many years of seeing this technique used by economic researchers:
Every application I have seen has involved possible (even plausible) "back-door" causal pathways that would falsify the IV assumption;
Arguments for the IV assumption have almost always been based on theoretical arguments about why there should not be any causal pathway that would falsify it (usually dubious temporal arguments), and those arguments have often been flimsy;
I have not seen any instance in which research has shown satisfying empirical evidence for the absence of "back-door" pathways that would falsify the assumption; and
Most research does not examine the sensitivity of the causal conclusion to failure of the IV assumption (and the conclusion is highly sensitive to failure of this assumption).
Causal analysis using instrumental variables is a clever idea in theory, but the method hinges heavily on strong causal assumptions that are usually false, and are hard or impossible to test in most applications. For all the above reasons, I tend to regard causal research hinging on the use of instrumental variables as being highly dubious. Controlled randomised experimentation remains by far the best method of causal research.
I don’t have the reputation to add a comment to Rob’s answer - the currently accepted answer.
However, describing the exclusion restriction as “Z is independent of Y given X” is not correct. Controlling for X induces spurious correlation between Z and Y as described in Carlos’s answer.
In the simple one variable toy example, the exclusion restriction says Z affects Y only through X, which is not analogous to saying Z is independent of Y when conditioning on X.
imo Rob’s answer should not be the accepted answer.
I’ll also add another explanation: commonly, students want to run $Y = bX + aZ$ to test exclusion, hoping that $a = 0$. However, because $X$ is correlated with the error term, the estimate of $b$ will be biased. Further, since $X$ and $Z$ are correlated (because the instrument has relevance), the estimated coefficient $a$ will also be biased.