I have been wondering about this: Do control variables need to have a correlation to both the independent variable and the dependent variable?

E.g. I want to check the effect of Education (independent variable) on Income (dependent variable) using a regression. Does it make sense to include control variables which obviously have a relationship to the dependent variable but not necessarily to the independent variable?

If not, why not? My friend argues that yes, they do need to have a correlation to both independent and dependent variable, otherwise they just go in the error term of the regression.

  • If you are interested in determining a causal relationship, then your friend is correct. You include controls to rid yourself of omitted variable bias. If a control is correlated with only one of the two variables (not both), it will just pile into the error term without muddling the causal estimate of your coefficient of interest. – 123 Apr 22 '17 at 22:09

1 Answer

Your friend says:

they do need to have a correlation to both independent and dependent variable, otherwise they just go in the error term of the regression.

but one reason (maybe the main one) for including control variables is to reduce the error variance (noise) in the model, which makes the estimates of the other coefficients more precise. Another reason for including them is that they are part of an interaction.
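To make the noise-reduction point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the variable `experience`, the coefficients, the sample size, and the small `ols` helper are all made up, and `experience` is generated independently of `education` so that it affects income only.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, se

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: experience is independent of education by construction.
education = rng.normal(size=n)
experience = rng.normal(size=n)
income = 1.0 + 0.5 * education + 2.0 * experience + rng.normal(size=n)

ones = np.ones(n)
b_short, se_short = ols(np.column_stack([ones, education]), income)
b_long, se_long = ols(np.column_stack([ones, education, experience]), income)

print("without control: coef =", b_short[1], "se =", se_short[1])
print("with control:    coef =", b_long[1], "se =", se_long[1])
```

In this setup both regressions should give an education coefficient close to the simulated 0.5, because the omitted control is unrelated to education. The standard error should be noticeably smaller when the control is included, since the control absorbs variance that would otherwise sit in the error term.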

Peter Flom
  • (+1) It's even more important once you go beyond standard linear regression. In binary regression or survival analysis, omitting any outcome-associated predictor from the model leads to bias in coefficient estimates of included predictors. See this page. – EdM Dec 29 '23 at 16:23
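The point about binary regression can be illustrated with a rough simulation. This is only a sketch under assumed values: the predictors `x` and `z`, their coefficients, and the hand-rolled `fit_logit` Newton-Raphson routine are all hypothetical, and `z` is generated independently of `x`.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson and return the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                       # Newton weights
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: z affects the outcome but is independent of x.
x = rng.normal(size=n)
z = rng.normal(size=n)
prob = 1.0 / (1.0 + np.exp(-(1.0 * x + 2.0 * z)))
y = (rng.random(n) < prob).astype(float)

ones = np.ones(n)
b_full = fit_logit(np.column_stack([ones, x, z]), y)
b_omit = fit_logit(np.column_stack([ones, x]), y)

print("coef on x with z included:", b_full[1])
print("coef on x with z omitted: ", b_omit[1])
```

With `z` included, the coefficient on `x` should land near the simulated value of 1.0. With `z` omitted, it should shrink toward zero even though `z` is uncorrelated with `x`, which is the behavior the comment describes for logistic models.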