
For a paper I'm investigating the effectiveness of monetary policy (quantitative easing) during the Covid-19 crisis. I want to compare the pre-pandemic period with the pandemic period. To do so, I simply split the sample so that I can compare the full, pre-pandemic, and pandemic periods. I use a simple OLS regression, and I based the sample split on a paper that does something similar.

However, my supervisor suggested using an interaction term with a dummy indicating the Covid-19 period, because I have only 16 observations for the Covid-19 period. I used both methods, and the results are actually quite similar: the coefficients, standard errors, and significance levels are almost identical, with only some minor differences.

The question: could anyone tell me which method is best, and why? And what would be a logical reason to split the sample instead of using an interaction term with a dummy?

  • This is really a FAQ, see for instance: https://stats.stackexchange.com/questions/373890/separate-models-vs-flags-in-the-same-model, https://stats.stackexchange.com/questions/17110/should-i-run-separate-regressions-for-every-community-or-can-community-simply-b, https://stats.stackexchange.com/questions/30035/is-it-acceptable-to-run-two-linear-models-on-the-same-data-set – kjetil b halvorsen Jul 12 '22 at 14:22
  • @kjetilbhalvorsen I agree, but all of these questions are about groups, not about time periods. So I wondered whether it is a logical thing to run separate models for periods, rather than for groups or categories. – Lennart de Jong Jul 12 '22 at 16:39

1 Answer


When you include both a dummy for the intercept and the interaction of that dummy with the regressor, the coefficients will be exactly the same as in the separate regressions. (The dummy coefficients must, as usual, be interpreted as changes relative to the baseline: in the illustration below, the maleTRUE coefficient is the change relative to the baseline intercept, which represents the female category.) This is because OLS is then free to fit both subsamples separately.

Try this illustration:

n <- 20
male <- sample(c(TRUE, FALSE), n, replace = TRUE)
female <- !male

x <- rnorm(n)
y <- rnorm(n)

summary(lm(y ~ x * male))
summary(lm(y[male] ~ x[male]))
summary(lm(y[female] ~ x[female]))

Selected output:

> summary(lm(y~x*male))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.22481    0.44860  -0.501    0.623
x           -0.99800    0.64818  -1.540    0.143
maleTRUE    -0.06542    0.59923  -0.109    0.914
x:maleTRUE   1.02459    0.84185   1.217    0.241

> summary(lm(y[male]~x[male]))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2902     0.4342  -0.668    0.518
x[male]       0.0266     0.5871   0.045    0.965

> summary(lm(y[female]~x[female]))

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.2248 0.3394 -0.662 0.5371
x[female] -0.9980 0.4904 -2.035 0.0975 .

Hence, the choice does not matter for the coefficients.

For the standard errors, you see that the choices are not equivalent. Depending on the coefficient, the differences may be driven by the differing interpretation of the intercept and dummy coefficients, but also by the fact that the joint regression uses a pooled error-variance estimator, whereas the separate regressions of course use separate error-variance estimators.
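To see the pooled-variance point concretely, one can compare the residual standard errors directly. The following is a sketch reusing the simulation setup above; the names `joint`, `sep_m`, and `sep_f`, and the seed, are my own additions:

```r
set.seed(1)  # arbitrary seed, for reproducibility
n <- 20
male <- sample(c(TRUE, FALSE), n, replace = TRUE)
x <- rnorm(n)
y <- rnorm(n)

joint <- lm(y ~ x * male)          # one pooled residual-variance estimate
sep_m <- lm(y[male] ~ x[male])     # each separate regression estimates
sep_f <- lm(y[!male] ~ x[!male])   # its own residual variance

sigma(joint)  # pooled residual standard error
sigma(sep_m)  # group-specific residual standard errors
sigma(sep_f)
```

The coefficients agree exactly across the two approaches (e.g. `coef(joint)[1]` equals `coef(sep_f)[1]`), but the residual standard errors, and hence the coefficient standard errors, generally do not.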

  • thank you for your reply! I understand what you are saying, but I wonder whether it actually matters. The standard errors do not vary much between the dummy and separate models. Is there any logical reason for using a split rather than a dummy? – Lennart de Jong Jul 12 '22 at 16:40
  • It typically would not matter, I would say. Having one big regression makes it easy to test whether the differences are significant, though. On the other hand, if the error variances differ strongly, separate regressions may be estimated more precisely. – Christoph Hanck Jul 13 '22 at 05:44
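The significance test the comment alludes to can be sketched as a Chow-type F-test: compare the restricted model (one common line for both groups) with the full interaction model. This reuses the simulation setup from the answer; the model names and seed are my own assumptions:

```r
set.seed(1)  # arbitrary seed, for reproducibility
n <- 20
male <- sample(c(TRUE, FALSE), n, replace = TRUE)
x <- rnorm(n)
y <- rnorm(n)

restricted <- lm(y ~ x)         # common intercept and slope for both groups
full       <- lm(y ~ x * male)  # separate intercept and slope per group

# F-test that the dummy and interaction coefficients are jointly zero,
# i.e. that the two groups share one regression line
anova(restricted, full)
```

Note that this joint test assumes a common error variance across the two groups; if that is doubtful, a heteroskedasticity-robust test would be preferable.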