
I am currently reading Elwert and Winship's article "Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable."

However, I am quite perplexed by the definition of omitted variable bias given in the article (i.e., a variable "causing" both X and Y, which may account for the relationship between the two).

When I studied violations of the exogeneity assumption due to OVB in my econometrics class, there was never a strong focus on the direction of the relationship between X and C (the omitted variable).

In the above-mentioned article, this distinction appears to be crucial. Confounders "cause" both X and Y, whereas mediators lie on the causal path between them (thereby being a channel "caused" by X but not by Y).

However, it seems to me that in both cases the bias generated by omitting the confounder or the mediator when estimating the true coefficient linking X to Y would be the same.

So, I was wondering whether the concept of OVB in econometrics is slightly different from, and more general than, the one presented in the article, and may include both cases.

Any clarification is more than appreciated (and maybe also a reference)!

User1865345

1 Answer


Omitted variable bias is a very specific concept: it refers to the bias in the parameter $\beta$ when the variables $Z$ are omitted from a regression model of the following form:

$$Y = \alpha + X\beta + Z\gamma + \varepsilon$$

It is a purely statistical concept in that it has nothing to do with the causal relationships among the variables. As long as $X$ and $Z$ are correlated (and $\gamma \neq 0$), omitting $Z$ will bias the estimate of $\beta$.
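For scalar $X$ and $Z$, and assuming $\operatorname{Cov}(X, \varepsilon) = 0$ in the model above, the textbook omitted variable bias formula follows in one line. Regressing $Y$ on $X$ alone gives a slope whose probability limit is

$$
\operatorname{plim} \hat\beta_{\text{short}}
= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
= \frac{\operatorname{Cov}(X, \alpha + X\beta + Z\gamma + \varepsilon)}{\operatorname{Var}(X)}
= \beta + \gamma \, \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)},
$$

so the bias is $\gamma \operatorname{Cov}(X, Z) / \operatorname{Var}(X)$, which vanishes only when $\gamma = 0$ or $X$ and $Z$ are uncorrelated. Nothing in the formula refers to whether $Z$ causes $X$ or $X$ causes $Z$.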

However, $\beta$ does not always represent a useful causal parameter. For example, if $Z$ is a mediator, then $\beta$ represents the direct effect of $X$ on $Y$. The direct effect is not always the quantity of interest. If the total effect is of interest, and $E[X\varepsilon]=0$, then omitting $Z$ from the model yields a biased estimate of $\beta$ but an unbiased estimate of a different parameter, the total effect. On the other hand, if $Z$ is a confounder, then $\beta$ represents the total effect of $X$ on $Y$, and omitting $Z$ would yield a biased estimate of the total effect. So, while omitted variable bias does not concern the causal status of the variables, the choice of whether to omit $Z$ (or the consequences of being forced to omit it) depends on the causal status of the variables and the quantity of interest.
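A minimal simulation sketch of the mediator-versus-confounder contrast, assuming a hypothetical linear structural model with standard-normal disturbances and made-up parameter values ($\beta = 1$, $\gamma = 2$, and $\delta = 0.5$ for the link between $X$ and $Z$). The helper names `ols` and `simulate` are illustrative only. Regressing $Y$ on $X$ alone is the "short" regression; adding $Z$ gives the "long" regression.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...] via least squares."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate(structure, n=200_000, alpha=0.5, beta=1.0, gamma=2.0, delta=0.5, seed=0):
    """Simulate a hypothetical linear SEM where Z is a 'mediator' or a 'confounder'.

    Returns (short slope on X with Z omitted, long slope on X with Z included).
    """
    rng = np.random.default_rng(seed)
    if structure == "mediator":
        x = rng.normal(size=n)                # X is exogenous
        z = delta * x + rng.normal(size=n)    # X causes Z
    elif structure == "confounder":
        z = rng.normal(size=n)                # Z is exogenous
        x = delta * z + rng.normal(size=n)    # Z causes X
    else:
        raise ValueError(f"unknown structure: {structure!r}")
    # Same outcome equation in both cases: Y = alpha + X*beta + Z*gamma + eps
    y = alpha + beta * x + gamma * z + rng.normal(size=n)
    short_slope = ols(y, x)[1]       # Z omitted
    long_slope = ols(y, x, z)[1]     # Z included: targets beta
    return short_slope, long_slope


if __name__ == "__main__":
    for structure in ("mediator", "confounder"):
        short_slope, long_slope = simulate(structure)
        print(f"{structure}: short slope {short_slope:.3f}, long slope {long_slope:.3f}")
```

Under these assumptions the long regression should recover $\beta \approx 1.0$ in both cases. In the mediator case the short slope should be close to the total effect $\beta + \gamma\delta = 2.0$: biased for $\beta$, but exactly the target if the total effect is of interest. In the confounder case it should be close to $\beta + \gamma\delta/(1 + \delta^2) = 1.8$, which is biased for the total effect $\beta = 1.0$. The bias formula is the same in both cases; only the parameter of interest differs.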

More broadly, the model for $Y$ above doesn't even have to represent a valid causal model for omitted variable bias to occur. If it merely describes the relationship among the variables in their joint distribution (e.g., if $Y$ precedes $X$ and $Z$ but the relationship among them is accurately represented by this equation), then omitted variable bias still occurs when $X$ and $Z$ are correlated, even though $\beta$ does not correspond to a meaningful causal parameter (which it wouldn't if $Y$ precedes $X$).
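A small sketch of this point, assuming a hypothetical jointly Gaussian process in which $Y$ is generated first and $X$ and $Z$ are both generated from $Y$ (the coefficients 0.8 and 0.6 are made up). Because the variables are jointly normal, $E[Y \mid X, Z]$ is linear, so the equation above describes the data even though no coefficient in it is a causal effect of $X$ on $Y$. The in-sample OLS identity "short slope = long slope + $\hat\gamma$ × slope of $Z$ on $X$" still holds exactly.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...] via least squares."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 100_000

# Hypothetical process: Y is generated FIRST and then drives X and Z,
# so no coefficient below is a causal effect of X on Y.
y = rng.normal(size=n)
x = 0.8 * y + rng.normal(size=n)
z = 0.6 * y + rng.normal(size=n)   # X and Z are correlated only through Y

_, b_long, g_long = ols(y, x, z)   # Y on X and Z
_, b_short = ols(y, x)             # Y on X, Z omitted
_, d = ols(z, x)                   # auxiliary regression: Z on X

# In-sample OLS identity: short slope = long slope + g_long * d
print(f"long beta {b_long:.3f}, short beta {b_short:.3f}, g_long * d {g_long * d:.3f}")
```

With these made-up coefficients the population values work out to $\beta = 0.4$ and $\gamma = 0.3$ in the long projection and $0.8 / 1.64 \approx 0.488$ in the short one, so the gap of roughly 0.088 is exactly $\gamma \operatorname{Cov}(X, Z) / \operatorname{Var}(X)$: omitted variable bias with no causal interpretation attached.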

I think one thing missing in econometrics training is the link between the statistical model and the causal parameter of interest. It is often assumed that a coefficient in the structural model for an outcome is a causal parameter of interest, when in reality the interpretation of that coefficient depends on the causal relationships among the variables in the model, which the outcome equation alone does not capture. So, while omitted variable bias describes a real feature of regression, it doesn't map directly onto the more useful causal concepts of confounding and endogenous selection. In addition, these causal concepts are not tied to any specific model and can be discussed without functional form assumptions, whereas omitted variable bias usually refers to the specific bias incurred when employing a specific statistical model (e.g., linear regression).

Noah
  • +1. My answer here visually illustrates omitted variable bias with a contribution in the spirit of Anscombe. – Alexis Dec 25 '22 at 23:44
  • Thank you for the answer. I understand the point you are making, and it is interesting. However, the distinction between mediator and confounder seems a bit forced to me. Suppose I am estimating the returns to education without explicitly including labour market experience (LME). LME is indeed related to wages and also to education. LME is also something "determined" by education, not the other way around. However, I wouldn't describe LME as a mediator of education. Could it depend on whether I consider the direct effect of education to be the true effect I am looking for? – Lydia Palumbo Jan 15 '23 at 13:24
  • LME is a mediator between education and wages though. A mediator causes the outcome and is caused by the focal predictor, and that's exactly as you described LME. If you want the direct effect of education, you include LME in the model. If you want the total effect of education, you exclude LME from the model. (In both cases you would need to adjust for confounders of education and wages, of which LME is not one, keeping this simple.) – Noah Jan 15 '23 at 18:07
  • Thank you for the answer, Noah. Discussing this with you was really helpful. Now I understand the difference between the two positions perfectly. – Lydia Palumbo Jan 19 '23 at 15:48