
I often run regressions from a low-n dataset (~100 observations). Often the results are only significant with the inclusion of control variables. However, I often see journal articles where people (always with a massive number of observations) claim to have run their regression "with and without control variables".

Why do people often run a regression with and without control variables?

4 Answers


A little on terms first. By definition, a control variable is kept constant throughout the study, so it cannot enter a regression. You probably mean variables that should be statistically controlled for, such as covariates or blocking factors (as in a randomized block experimental design).

People run a regression or ANOVA with such variables not only to wash their effect out of the predictor variables, but mainly to check whether their own effect is significant. If it is, their inclusion in the model is fully warranted; if not, they may be better excluded from the model.

This is most important for a blocking factor. If you leave it in the model even though it is not significant, you risk missing the effect of the predictor variables due to the decrease in the Error term's df: a blocking factor decreases both the Error sum of squares and its df, so a competitive situation arises. The significance of the predictors may go down or up depending on which "wins", the fall of the Error sum of squares or the fall of its df. This may be why people sometimes prefer more concise models.
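This trade-off can be shown numerically. The sums of squares below are hypothetical, chosen only to illustrate that when a blocking factor absorbs little Error variance but costs several Error df, the predictor's F-ratio falls:

```python
# Hypothetical ANOVA sums of squares illustrating the "competition"
# between a smaller Error SS and fewer Error df when a blocking
# factor is added to the model.

def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic for an effect tested against the Error term."""
    return (ss_effect / df_effect) / (ss_error / df_error)

# Model WITHOUT the blocking factor: Error SS = 90 on 18 df.
f_without = f_ratio(ss_effect=40, df_effect=1, ss_error=90, df_error=18)

# Model WITH the blocking factor: it absorbs only a little Error SS
# (90 -> 85) but costs 5 Error df (18 -> 13), so the predictor's F drops.
f_with = f_ratio(ss_effect=40, df_effect=1, ss_error=85, df_error=13)

print(round(f_without, 2))  # 8.0
print(round(f_with, 2))     # 6.12
```

Had the blocking factor instead absorbed a large share of the Error SS, the same calculation would move in the other direction.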

Another reason may be that, for a sample as moderate as 100, including many IVs, even if they all seem important or significant, leads to overfitting.
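The overfitting point is easy to demonstrate with a small simulation (numpy only, wholly artificial data): with n = 100 and an outcome that is pure noise, R² still climbs steadily as more noise predictors are added.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
y = rng.normal(size=n)  # outcome unrelated to any predictor

def r_squared(X, y):
    """R^2 of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

r2 = {}
for k in (5, 20, 50):
    X = rng.normal(size=(n, k))  # pure-noise "independent variables"
    r2[k] = r_squared(X, y)
    print(k, round(r2[k], 2))
```

In expectation R² grows roughly as k/(n - 1) here even though every predictor is noise, which is why a model with 50 IVs on 100 observations can look impressive in-sample while predicting nothing out of sample.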

ttnphns

One more reason to include covariates is that they are important in the literature. If you can demonstrate that some covariate that has been found to have large effects in the past (either on its own or by affecting other parameters) does NOT have large effects in your study, then you have discovered something interesting.

Peter Flom

Typically, this means that there is a regression with an outcome and a treatment variable. Then there are other controls that could be added to the model: other covariates that may be important. The authors first run a simple model that includes only the treatment. Then they check the robustness of their findings to the inclusion of the other variables. In particular, they ask whether including the other covariates reduces or eliminates the impact estimated in the simple model.

Additionally, the inclusion of other covariates typically reduces standard errors. In this case, authors may find that the estimated impact is relatively similar between the simple model and the one that includes controls, but only in the latter is the estimate significant (usually, different from 0). The authors would then use the latter model to perform inference (hypothesis tests, confidence intervals) because of its smaller standard errors.
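This pattern is easy to reproduce in a simulation sketch (numpy only, invented coefficients): when the control is roughly independent of the treatment, the treatment estimate barely moves between the two models, but the control soaks up residual variance and shrinks the treatment's standard error.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)  # covariate with a strong effect on y
t = rng.normal(size=n)  # "treatment", drawn independently of x
y = 0.3 * t + 2.0 * x + rng.normal(size=n)

def ols(X, y):
    """OLS estimates and conventional standard errors (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))
    return beta, se

b_simple, se_simple = ols(t.reshape(-1, 1), y)       # treatment only
b_ctrl, se_ctrl = ols(np.column_stack([t, x]), y)    # with the control

# Similar point estimate for the treatment, but a much smaller SE
# once x is included, so only the controlled model may reach significance.
print(b_simple[1], se_simple[1])
print(b_ctrl[1], se_ctrl[1])
```

With n = 100 the treatment SE falls by roughly the ratio of the residual standard deviations of the two models, which is exactly the situation described above where only the controlled estimate is significantly different from 0.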

Charlie

In addition to the answers above, some covariate-selection techniques involve comparing models with and without a given variable. And if one wishes to illustrate the effect of adding a covariate, the crude (unadjusted) model is needed as a reference in the first place.

Fomite