I've read that (ordinary) linear regression assumes that there are measurement errors in the dependent variable, but no measurement error in the independent variables -- and if I have measurement errors in the independent variables, then the assumptions of linear regression no longer apply and I have to do something more complicated.
My question: Why? Why do I have to worry about this? Why does linear regression fail when there are random errors in the independent variables?
In particular, what's wrong with the following reasoning? It seems to me that any linear model with random measurement errors in the independent variables is equivalent to one without measurement errors in the independent variables. Let me spell out the reasoning. Suppose the true model involves random measurement errors in both variables, i.e.,
$$Y_i = \alpha + \beta X_i \qquad y_i = Y_i + e_{y,i} \qquad x_i = X_i + e_{x,i}$$
where $X_i,Y_i$ are the true underlying values, $x_i,y_i$ are the observed values, and $e_{x,i},e_{y,i}$ are measurement errors. (Here $x$ is the independent variable and $y$ is the dependent variable.) Suppose $e_{x,i} \sim \mathcal{N}(0,\sigma_x^2)$ and $e_{y,i} \sim \mathcal{N}(0,\sigma_y^2)$, i.e., the $e_{x,i}$'s are iid normal and so are the $e_{y,i}$'s. That would be an example of a simple linear model, with measurement error in both the independent variable and the dependent variable.
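For concreteness, here is a quick simulation of that data-generating process (the parameter values $\alpha, \beta, \sigma_x, \sigma_y$ are arbitrary, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (arbitrary choices, not from any real data).
alpha, beta = 1.0, 2.0
sigma_x, sigma_y = 0.5, 0.5
n = 1000

X = rng.uniform(0.0, 10.0, size=n)        # true (latent) regressor values X_i
Y = alpha + beta * X                      # exact linear relationship Y_i = alpha + beta * X_i
x = X + rng.normal(0.0, sigma_x, size=n)  # observed x_i = X_i + e_{x,i}
y = Y + rng.normal(0.0, sigma_y, size=n)  # observed y_i = Y_i + e_{y,i}
```

We only ever get to see the pairs $(x_i, y_i)$; the true $(X_i, Y_i)$ are latent.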
It seems to me this model is equivalent to the following simpler model with errors only in the dependent variable:
$$y_i = \alpha + \beta x_i + e'_{y,i},$$
where $e'_{y,i} \sim \mathcal{N}(0, \beta^2 \sigma_x^2 + \sigma_y^2)$. Why? Because
$$y_i = \alpha + \beta (x_i-e_{x,i}) + e_{y,i} = \alpha + \beta x_i + (e_{y,i} - \beta e_{x,i})$$
and $e_{y,i} - \beta e_{x,i}$ has a normal distribution with mean $0$ and variance $\beta^2 \sigma_x^2 + \sigma_y^2$. Thus, it seems like even if the data are actually generated with measurement error in both variables, they could be equally well modelled by a linear model where we have errors only in the dependent variable, just with a somewhat larger variance in the errors. Thus, I don't see why applying ordinary linear regression will fail; it seems like the actual stochastic process (which doesn't satisfy the assumptions for OLS) is equivalent to one that does satisfy all the assumptions needed for ordinary linear regression.
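One way to probe this empirically is to simulate the process above and fit OLS to the observed $(x_i, y_i)$, then compare the estimated slope with the true $\beta$ (again, all parameter values here are arbitrary and only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (arbitrary choices).
alpha, beta = 1.0, 2.0
sigma_x, sigma_y = 1.0, 0.5
n = 100_000

X = rng.normal(0.0, 1.0, size=n)          # latent regressor values X_i
x = X + rng.normal(0.0, sigma_x, size=n)  # observed x_i with measurement error
y = alpha + beta * X + rng.normal(0.0, sigma_y, size=n)  # observed y_i

# Ordinary least squares on the observed pairs (x_i, y_i).
# np.polyfit with deg=1 returns (slope, intercept).
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
print(beta_hat, alpha_hat)  # compare beta_hat against the true beta
```

If my reasoning were right, the OLS slope estimate should converge to the true $\beta$ as $n$ grows.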
I suspect I've gone wrong somewhere in my reasoning. Can anyone explain where I went wrong, and why we need to worry if there are measurement errors in the independent variables? I've tried reading a bunch of references on errors-in-variables, but either (a) they get very technical very quickly, and I'm lacking intuition (e.g., theoretical results about identifiability), or (b) they focus on telling me how to solve the problem without first establishing that there is a problem that needs to be solved.