5

Why do we need to include precision variables in a regression model (i.e., variables that are associated with the outcome but not with the predictor of interest)?

guestom
  • 51

3 Answers

5

Here is my contribution to Anscombe's Quartet.

The image below shows a regression line fitted to some imaginary data. The estimated regression line is $\hat{y}_{i} = 3 + 0.5x_{i}$, the p-value for the t-test of $H_{0}: \beta=0$ equals 0.002, and the $R^{2}$ for my regression model is 0.67 (just as for the four graphs in Anscombe, 1973).

Regression line of y on x

This regression line says that as $x$ increases by two units, $y$ increases by one unit. But there's something wrong: I have omitted another variable that also explains $y$. This second variable, $z$, is dichotomous and indicates which group each observation belongs to (the analysis above pooled the data from both groups). We can model it as $y_{i} = \beta_{0} + \beta_{x}x_{i} + \beta_{z}z_{i} + \varepsilon_{i}$, where $\varepsilon_{i}\sim \mathcal{N}\left(0,\sigma_{\varepsilon}\right)$.

Regression of y on x and z

Now we see that, as $x$ increases, $y$ actually decreases! This means our first model was not only incorrect, but badly so: whether one is purple or orange, $y$ has an inverse relation to $x$. In the first model we violated the assumption of no omitted variables, and this is why it is important to include covariates (what you label "precision variables") in our model.
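A small simulation can reproduce this kind of reversal. The sketch below does not use the data behind the figures; the group means, slopes, noise levels, and the helper `ols` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # observations per group (arbitrary choice)

# Hypothetical data: group z=1 has larger x and larger y on average,
# but within each group y falls as x rises (true slope -0.5).
z = np.repeat([0.0, 1.0], n)
x = rng.normal(loc=4.0 + 6.0 * z, scale=1.0)
y = 5.0 - 0.5 * x + 8.0 * z + rng.normal(scale=0.5, size=2 * n)


def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_short = ols([x], y)     # y ~ x      (z omitted)
b_long = ols([x, z], y)   # y ~ x + z  (z included)

print("slope of x with z omitted: ", round(b_short[1], 3))
print("slope of x with z included:", round(b_long[1], 3))
```

With these made-up settings the pooled slope should come out positive while the slope adjusted for `z` should be close to the within-group value of -0.5, which is the same pattern as in the two plots.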

Alexis
  • 29,850
5

A regression coefficient is often characterized as a partial regression coefficient, which means it shows the effect of a particular variable X on the outcome variable Y after the effects of the other variables, Z, are controlled for.

What happens when you omit Z and leave only X? Does the coefficient for X change?

If Z and X are orthogonal, which rarely happens outside experimental data, the coefficient for X will not change. In other situations, however, you cannot say that the coefficient for X measures only the effect of variation in X on Y.

In econometrics, the consequence of omitting an important variable is called omitted variable bias: the marginal effect of X on Y can no longer be estimated without bias.
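For the simplest case this bias can be written out explicitly. As a sketch, assume the true model is linear with a single omitted variable Z; the symbols $\tilde{\beta}_{X}$ (the coefficient from regressing Y on X alone) and $\delta$ (the slope from regressing Z on X) are notation introduced here, not part of the original answer:

$$
Y = \beta_{0} + \beta_{X}X + \beta_{Z}Z + \varepsilon
\quad\Longrightarrow\quad
\operatorname{E}\left[\tilde{\beta}_{X}\right] = \beta_{X} + \beta_{Z}\,\delta,
\qquad
\delta = \frac{\operatorname{Cov}(X,Z)}{\operatorname{Var}(X)} .
$$

The bias term $\beta_{Z}\,\delta$ vanishes when Z has no effect on Y ($\beta_{Z}=0$) or when Z is uncorrelated with X ($\delta=0$), which is the orthogonal case.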

The goal of modeling is to identify all relevant variables and to check that the residual variation behaves well.

Analyst
  • 2,655
2

Because they explain part of the outcome's variation that would otherwise end up in the residual, precision variables help to reduce the standard errors of the coefficients you are interested in. This shrinks their confidence intervals and makes it easier to detect significant effects of those variables.
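This can be checked with a short simulation. The sketch below assumes a predictor `x` drawn independently of a precision variable `z` that strongly affects `y`; the coefficients, sample size, and the helper `ols_with_se` are illustrative assumptions, and the standard errors use the usual homoskedastic OLS formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # sample size (arbitrary choice)

x = rng.normal(size=n)  # predictor of interest
z = rng.normal(size=n)  # precision variable, drawn independently of x
y = 1.0 + 0.3 * x + 2.0 * z + rng.normal(size=n)


def ols_with_se(columns, y):
    """OLS coefficients and classical standard errors (intercept included)."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)           # coefficient covariance matrix
    return coef, np.sqrt(np.diag(cov))


coef_short, se_short = ols_with_se([x], y)    # y ~ x
coef_long, se_long = ols_with_se([x, z], y)   # y ~ x + z

print("without z: slope of x =", round(coef_short[1], 3), " SE =", round(se_short[1], 3))
print("with z:    slope of x =", round(coef_long[1], 3), " SE =", round(se_long[1], 3))
```

Since `z` is independent of `x` here, both fits estimate the same slope for `x`, but the fit that includes `z` should report a noticeably smaller standard error because `z` absorbs much of the residual variance.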