
Consider the linear regression model $$ Y_i=X_i^\top \beta+U_i. $$

Suppose some regressors are not orthogonal to $U_i$, i.e., $E(X_i U_i)\neq 0$. Then the OLS estimator is not consistent (Hayashi, chapter 2). The usual way to proceed is to find instruments $Z_i$ that satisfy $E(Z_i U_i)=0$ and that explain some of the variation in $X_i$.

The answer to the question here suggests that, although rare and perhaps tricky, we might be able to construct instruments just by taking appropriate transformations of $X_i$. For instance, take $X_i$ scalar with $E(X_i U_i)\neq 0$. Consider the function $f(X_i)=X_i^k$ with $k$ even. Suppose $(X_i,U_i)$ are symmetric around zero. Then, even if $E(X_iU_i)\neq 0$, it holds that $$ E(X^k_i U_i)=0. $$ Hence, we could set $Z_i\equiv X^k_i$. However, I've never found such a discussion in any textbook, so I wonder whether there is something fundamental I'm missing here. Perhaps $Z_i$ so defined would be a very weak instrument? Could you help me understand?
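The moment condition can be checked in one line. This is only a sketch: it assumes that "symmetric around zero" means joint symmetry, $(X_i,U_i)\overset{d}{=}(-X_i,-U_i)$, and that the moments involved exist. Neither assumption is spelled out above.

$$ E(X_i^k U_i) = E\big((-X_i)^k(-U_i)\big) = -E(X_i^k U_i) \quad\Longrightarrow\quad E(X_i^k U_i)=0 \quad \text{for } k \text{ even}. $$

The first equality uses the symmetry of the joint distribution, the second uses $(-X_i)^k = X_i^k$ for even $k$.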

Star
  • 826

1 Answer


The method of instrumental variables chooses a variable $Z$ (satisfying certain conditions), regresses $X$ on $Z$ to obtain fitted values, let's call them $\hat X(Z)$, and then uses $\hat X(Z)$ in place of $X$ to estimate the effect of $X$ on $Y$.
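The two steps just described can be sketched in R on simulated data. Everything in this example is an assumption made for illustration: the data-generating process, the true coefficient of 2, and the variable names (`z`, `v`, `u`, `x_hat`, `beta_2sls`) do not come from the question.

```r
# Simulated illustration of manual two-stage least squares (2SLS).
# All numbers and names below are made up for this example.
set.seed(1)
n <- 10000
z <- rnorm(n)              # instrument: independent of the error u
v <- rnorm(n)              # unobserved component shared by x and u
u <- 0.8 * v + rnorm(n)    # structural error, correlated with x through v
x <- z + v                 # endogenous regressor
y <- 2 * x + u             # true effect of x on y is 2

# OLS is inconsistent here because Cov(x, u) != 0
beta_ols <- unname(coef(lm(y ~ x))["x"])

# Stage 1: regress x on z and keep the fitted values
x_hat <- fitted(lm(x ~ z))

# Stage 2: regress y on the stage-1 fitted values
beta_2sls <- unname(coef(lm(y ~ x_hat))["x_hat"])

print(c(ols = beta_ols, tsls = beta_2sls))
```

In this setup the OLS estimate is pulled away from 2, while the two-stage estimate should land close to it. Note that the standard errors reported by the second `lm` call are not valid 2SLS standard errors, so this manual version is useful for intuition only.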

You suggest $Z=X^k$ with $k$ even, e.g. $k=2$. That means we have to regress $X$ on $X^2$. This would be a very bad regression, because for each nonzero value of $X^2$ there are two possible values of $X$, namely $\pm\sqrt{X^2}$. As a result, the method of instrumental variables would not work well.
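A short calculation makes the problem precise. It is a sketch under the question's symmetry assumption (so that $X$ is symmetric around zero with finite moments), for the scalar model and $k$ even:

$$ \operatorname{Cov}(X, X^k) = E(X^{k+1}) - E(X)\,E(X^k) = 0 - 0\cdot E(X^k) = 0, $$

because $k+1$ is odd and odd moments of a distribution that is symmetric around zero vanish. The simple IV estimator is

$$ \hat\beta_{IV} = \frac{\sum_i Z_i Y_i}{\sum_i Z_i X_i}, \qquad \frac{1}{n}\sum_i Z_i X_i \;\xrightarrow{p}\; E(X^{k+1}) = 0, $$

so its denominator converges to zero. Under these assumptions $Z=X^k$ is uncorrelated with $X$, and the relevance condition $E(ZX)\neq 0$ fails.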

Also, for $Z$ to be usable as an instrumental variable, certain conditions have to be satisfied. In particular: $$ (Z \perp\kern-5pt\perp Y)_{G_{\bar X}}, $$ meaning that in the graph $G_{\bar X}$, which is the original graph $G$ with all the arrows going into $X$ removed, there must be no open (d-connecting) path between $Z$ and $Y$. But the original graph $G$ looks like this:

[Figure: the causal graph $G$]

and $G_{\bar X}$ then is:

[Figure: the graph $G_{\bar X}$, with the arrows into $X$ removed]

and the path $X^2 \leftarrow X \to Y$ is clearly an open path. Thus, this condition for $X^2$ being an instrumental variable is violated.


The regression of $X$ on $X^2$:

> x <- -10:10
> df <- data.frame(x = x, z = x^2)
> m <- lm(x ~ z, data = df)
> summary(m)

Call:
lm(formula = x ~ z, data = df)

Residuals:
   Min     1Q Median     3Q    Max
   -10     -5      0      5     10

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.697e-16  2.088e+00       0        1
z           -2.372e-17  4.250e-02       0        1

Residual standard error: 6.366 on 19 degrees of freedom
Multiple R-squared:  2.005e-32,  Adjusted R-squared:  -0.05263
F-statistic: 3.81e-31 on 1 and 19 DF,  p-value: 1

> plot(df$z, df$x, panel.first = grid())
> lines(df$z, m$fitted.values, col = 'red')

[Figure: scatter plot of $x$ against $z$, with the fitted regression line in red]

I think it is evident, without any additional metrics, that this regression is bad.
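For readers who do want numbers, the quality of this first-stage regression can be read off the fitted model. The sketch below reuses the same toy data; the names `r2`, `f_stat`, and `rho` are introduced here for illustration only.

```r
# Same toy data as in the regression of x on z = x^2
x <- -10:10
df <- data.frame(x = x, z = x^2)
m <- lm(x ~ z, data = df)
s <- summary(m)

r2 <- s$r.squared                        # coefficient of determination
f_stat <- unname(s$fstatistic["value"])  # first-stage F statistic
rho <- cor(df$x, df$z)                   # sample correlation of x and z

print(c(r_squared = r2, first_stage_F = f_stat, correlation = rho))
```

All three quantities are zero up to floating-point error, so $X^2$ explains none of the variation in $X$ in this sample. A commonly cited rule of thumb treats a first-stage F statistic below about 10 as a sign of a weak instrument, and this regression falls far short of that.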

frank
  • 10,797
  • About regressing $X$ on $X^2$, when you write "This in turn would result in the method of instrumental variables not working well.": formally, which property would be violated? – Star Jul 31 '22 at 10:07
  • Even if all the requirements for the method of instrumental variables are satisfied, of course, the quality of the result still depends on the quality of the regression $Z\to X$. – frank Jul 31 '22 at 10:28
  • How do you establish the quality of the regression Z on X? – Star Jul 31 '22 at 10:31
  • You have to find a regression of $X$ on $Z$ (i.e. $Z\to X$), not $Z$ on $X$. – frank Jul 31 '22 at 10:34
  • Yes, sorry. How do you establish the quality of that? – Star Jul 31 '22 at 10:45
  • There are many metrics, e.g. the coefficient of determination. But the main point here is that you later want to regress $Y$ on $\hat X(Z)$, and if $\hat X(Z)$ is a really bad regression, i.e. your input is already wrong, you cannot expect the result to be useful. Now, imagine your values for $X$ are in the interval [-10, 10]. If you decide to regress $X$ on $X^2$ ($X^2\to X$), you have to make decisions about the sign, e.g. you have to decide whether you assign to $X^2=100$ the value $10$ or $-10$. – frank Jul 31 '22 at 11:02
  • And whichever you choose, then, either for $X=-10$ or for $X=10$, you will be off by $20$. – frank Jul 31 '22 at 11:02
  • I'm sorry, but you keep using words like "bad" regression, "wrong" input, which I do not understand in practice. If I regress $X$ on $X^2$ what goes wrong in Stata (or R)? – Star Jul 31 '22 at 11:08
  • Also, in your path representation, why does the arrow go from $X$ to $X^2$? Couldn't it go from $X^2$ to $X$ just by thinking about the inverse function? Is this just a feature of the quadratic function? What if $f(X)$ is a one-to-one mapping? – Star Jul 31 '22 at 11:23