Reading the literature on the subject, I haven't found a clear explanation of why the parallel trends assumption must hold. In fact, there have been recent papers on ways to relax this assumption (see Rambachan and Roth (2019), Bilinski and Hatfield (2019), Freyaldenhoven et al. (2019)).

To me, it seems like the parallel trends assumption is solving a problem that doesn't exist. The goal in DiD analysis is to estimate the average treatment effect on the treated, either as an absolute change or percentage change. Does it matter if the baseline means and trends for the outcomes differ for the treatment and control groups, if we're only interested in comparing the changes in those trends in the post-treatment time period?

For example, consider the following statistically significant linear trends for made-up monthly medical cost data (with unspecified but unequal intercepts):

$$ \begin{array}{c|ll} \text{Period} & \text{Control group} & \text{Treatment group} \\ \hline \text{Pre} & y_{ctrl} = \beta_0 + 10t & y_{trmt} = \beta_2 + 20t \\ \text{Post} & y_{ctrl} = \beta_1 + 15t & y_{trmt} = \beta_3 + 25t \\ \end{array} $$

From pre to post, the slope of the cost trend increases by 50% in the Control group (10 to 15) and by 25% in the Treatment group (20 to 25).

The baseline trends are not the same but we're still seeing a significantly lower pre/post trend change for the Treatment group.
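
As a quick check of the arithmetic, here is a minimal sketch that recomputes the slope changes from the made-up table above. The dictionary layout and the function name `slope_changes` are illustrative choices, not part of any DiD library.

```python
# Slopes (cost change per month) copied from the made-up table above.
slopes = {
    "control": {"pre": 10.0, "post": 15.0},
    "treatment": {"pre": 20.0, "post": 25.0},
}

def slope_changes(group):
    """Return (absolute change, percentage change) in the slope, pre to post."""
    pre = slopes[group]["pre"]
    post = slopes[group]["post"]
    absolute = post - pre
    percent = 100.0 * absolute / pre
    return absolute, percent

ctrl_abs, ctrl_pct = slope_changes("control")
trmt_abs, trmt_pct = slope_changes("treatment")

print(f"control:   {ctrl_abs:+.0f} per month ({ctrl_pct:.0f}%)")
print(f"treatment: {trmt_abs:+.0f} per month ({trmt_pct:.0f}%)")
```

On the absolute scale both slopes rise by 5 per month, while on the percentage scale the changes are 50% and 25%, so the comparison depends on which scale is used.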

RobertF
  • Imagine a scenario where the trends across groups were already moving apart before the treatment went into effect. Now say we continue to observe diverging group trends even as they moved into the post-period. Do you think we can make causal statements about the effect of the treatment in this setting? Remember, the control group is “approximating” what would have happened in the treatment group in the absence of any treatment/intervention. – Thomas Bilach Mar 02 '22 at 19:00
  • @ThomasBilach Ah good point - if we're focused on measuring the ATT then yes we want the control group trend to match the treatment group trend in the pre-treatment period. However if we're more interested in the ATE then the parallel trend assumption isn't necessary, correct? – RobertF Mar 02 '22 at 19:14
  • @ThomasBilach Laura Hatfield's page on DiD explains the parallel trend test for the pre-intervention period is (by itself) "neither necessary nor sufficient to establish validity of diff-in-diff (Kahn-Lang and Lang 2018)". In fact you can get by comparing aggregate measures (average or sum) of the pre and post-intervention outcomes in DiD analysis. https://diff.healthpolicydatascience.org/#parallel – RobertF Nov 09 '22 at 14:10

2 Answers

The contrast (i.e., estimand) of interest in diff-in-diff is $\color{red}{E[Y^1_{post}|A=1]} - \color{blue}{E[Y^0_{post}|A=1]}$, which relies on the unobserved quantity $\color{blue}{E[Y^0_{post}|A=1]}$. How can we get this quantity if it is unobserved?

The parallel trends assumption is a counterfactual assumption about $\color{blue}{E[Y^0_{post}|A=1]}$, the mean potential outcome in the post-period for the treated units had they instead received control. The assumption can be stated as follows:

$$\color{blue}{E[Y^0_{post}|A=1]}-\color{green}{E[Y^0_{pre}|A=1]} = \color{darkorange}{E[Y^0_{post}|A=0]}-\color{brown}{E[Y^0_{pre}|A=0]}$$

The quantity on the left is the trend in the potential outcomes under control (i.e., difference between outcomes post and pre) for the treated units, and the right side is the trend in the potential outcomes under control for the control units. The parallel trend assumption states that these two trends are equal (i.e., parallel if plotted). See the graph below, which colors the dots corresponding to the quantities they represent:

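[Figure: pre- and post-period outcome means for the treated and control groups, with dots colored to match the quantities above and a dotted line for the counterfactual trend of the treated units]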

The dotted line represents the counterfactual trend under control for the treated units. The solid lines represent the observed trends. The parallel trends assumption is that the dotted line is parallel with the bottom solid line.

The assumption is fundamentally untestable because there is no data for $\color{blue}{E[Y^0_{post}|A=1]}$; for the treated units in the post-period, we only observe their potential outcomes under treatment (i.e., $\color{red}{E[Y^1_{post}|A=1]} = \color{red}{E[Y_{post}|A=1]}$).

It is important to note that the terms on the right side are observed; they are simply the observed outcome means in the control group before and after treatment. We still don't have $\color{green}{E[Y^0_{pre}|A=1]}$; to get this, we need the assumption $$ \color{green}{E[Y^0_{pre}|A=1]} = \color{green}{E[Y^1_{pre}|A=1]} $$ That is, the pre-period outcomes don't depend on the treatment you end up receiving (because the future can't affect the past); this is often called the no-anticipation assumption. With it, this quantity is also observed; it's just the average outcome in the treated group in the pre-period.

So now, thanks to the parallel trends assumption, we can write \begin{align} \color{blue}{E[Y^0_{post}|A=1]} &= \color{green}{E[Y^0_{pre}|A=1]} + \color{darkorange}{E[Y^0_{post}|A=0]} - \color{brown}{E[Y^0_{pre}|A=0]} \\ &= \color{green}{E[Y^1_{pre}|A=1]} + \color{darkorange}{E[Y^0_{post}|A=0]} - \color{brown}{E[Y^0_{pre}|A=0]} \\ &= \color{green}{E[Y_{pre}|A=1]}+\color{darkorange}{E[Y_{post}|A=0]}-\color{brown}{E[Y_{pre}|A=0]} \end{align} where the last line is made up solely of observed quantities.

Finally, we can write the counterfactual estimand as \begin{align*} \color{red}{E[Y^1_{post}|A=1]} - \color{blue}{E[Y^0_{post}|A=1]} &= \color{red}{E[Y_{post}|A=1]} - \\ & \qquad (\color{green}{E[Y_{pre}|A=1]} + \color{darkorange}{E[Y_{post}|A=0]} - \color{brown}{E[Y_{pre}|A=0]}) \\ &= (\color{red}{E[Y_{post}|A=1]} - \color{green}{E[Y_{pre}|A=1]})- \\ & \qquad (\color{darkorange}{E[Y_{post}|A=0]}-\color{brown}{E[Y_{pre}|A=0]}) \end{align*}

which is precisely the diff-in-diff observed variables estimand. That is, to be able to write the counterfactual estimand as a contrast among observed quantities, we need the parallel trends assumption because it links the counterfactual quantities to the observed quantities. It is an essential assumption for diff-in-diff and the whole motivation behind the methodology. In theory it's a much more plausible assumption than strong ignorability or the exclusion restriction for instrumental variables, which is why diff-in-diff is such a powerful method.
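
The derivation above can be checked numerically with a minimal sketch. All numbers below are hypothetical and chosen so that parallel trends holds by construction; the names `y0`, `true_att`, and `did` are illustrative only.

```python
# Hypothetical mean potential outcomes under control, Y^0. Parallel trends
# holds by construction: both groups would change by +3 without treatment.
y0 = {
    ("treated", "pre"): 10.0,
    ("treated", "post"): 13.0,  # counterfactual, never observed in real data
    ("control", "pre"): 4.0,
    ("control", "post"): 7.0,
}
true_att = 2.0  # assumed effect of treatment on the treated

# Observed means: treated units reveal Y^1 in the post-period, Y^0 elsewhere.
observed = dict(y0)
observed[("treated", "post")] = y0[("treated", "post")] + true_att

def did(means):
    """Difference-in-differences of the four group-by-period means."""
    treated_change = means[("treated", "post")] - means[("treated", "pre")]
    control_change = means[("control", "post")] - means[("control", "pre")]
    return treated_change - control_change

did_estimate = did(observed)
# Two simpler contrasts, for comparison with the full DiD.
post_only = observed[("treated", "post")] - observed[("control", "post")]
pre_post_treated = observed[("treated", "post")] - observed[("treated", "pre")]

print(did_estimate, post_only, pre_post_treated)
```

With these numbers the DiD contrast returns the assumed effect of 2, while the post-only comparison (8) also absorbs the baseline gap between groups and the treated-only pre/post comparison (5) also absorbs the common time trend.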

Noah
  • Thank you Noah! – RobertF Mar 03 '22 at 16:58
  • Quick follow up question: Can I estimate *ATE* with DiD using the model $E(Y)=\beta_0+\beta_AA+\beta_PP+\beta_{AP}AP$, where $A$=Treatment, $P$=Time period (Pre=0, Post=1), then calculate $ATE=E(Y|A=1)-E(Y|A=0)=\beta_A+\beta_{AP}P$ without resorting to the parallel trends assumption? – RobertF Mar 06 '22 at 14:38
  • Or is this no longer DiD but just a run-of-the-mill regression? – RobertF Mar 06 '22 at 14:45
  • Comparing the post outcome means leaves you susceptible to bias due to confounding unless you have a randomized trial, and comparing post and pre in the treated group alone leaves you susceptible to maturation (i.e., changes in the outcome due to factors other than the treatment). Only the full DiD removes both biases/threats to validity. Only the ATT is identified; you cannot estimate the ATE using DiD. – Noah Mar 06 '22 at 18:04
  • It may no longer be DiD, but if you're comparing non-parallel trends across groups, and the control group post trend changes by x% (for external reasons unrelated to the study) and the treatment group post trend changes by (x + y)%, then isn't it reasonable to assume the treatment effect is y%? – RobertF Mar 16 '22 at 03:09
  • @RobertF Only under the parallel trends assumption! Without that assumption the treated group trend could simply also be due to non-treatment factors that happen to work differently on the treated group. For example, if the trend was due to covariates, and the distribution of covariates differed between the two groups, you could see different trends even if the treatment had zero effect. Remember the parallel trends assumption is about what would have happened to the treated group had it not been treated. – Noah Mar 16 '22 at 03:51
  • You're assuming that what would have happened had the treated group not received treatment is an x% change, so that you can attribute the additional y% change to just receiving treatment. So you are invoking the parallel trends assumption with the statement you made. It's entirely possible that in the absence of treatment, the treated group would have actually changed by (x + 4)%, meaning the actual effect of treatment is (y - 4)%. Without the parallel trends assumption, the treatment effect cannot be uniquely identified by the data. – Noah Mar 16 '22 at 03:54
  • Rereading your comments . . . this makes sense. If there are different pre-treatment outcome trends in the treatment and control groups, even after controlling for different covariate distributions, then we have a problem. The two groups are no longer comparable, something is different about the trmt group. What's interesting with DiD analysis is the trends have to be the same but it's OK if the intercepts for the trmt and control groups are different. Wouldn't that mean we're no longer comparing apples to apples? – RobertF Apr 12 '22 at 21:14
  • Pretreatment trends give a hint about the treated group's counterfactual trend, but remember the parallel trends assumption is fundamentally untestable. The beauty of DiD is that you don't need to compare apples to apples if the parallel trends assumption is met. It seems like you think the parallel trends assumption is not met, even after adjusting for covariates, which means you have a problem. Differing intercepts are okay because the parallel trends assumption allows you to subtract away the difference in the intercepts. – Noah Apr 12 '22 at 21:27
  • Well if the intercepts are different, then even if pre-treatment outcome trends are parallel there are still fundamental differences between the treatment & control groups, right? How can we be confident that the counterfactual treatment trend & the control group trend will continue to be parallel into the post-period if we have two different populations? Maybe after controlling for pre-treatment covariates this can be rectified. – RobertF Apr 12 '22 at 21:50
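
For the regression model $E(Y)=\beta_0+\beta_AA+\beta_PP+\beta_{AP}AP$ raised in the comments above, here is a minimal sketch of what each coefficient equals. It relies on the fact that a saturated two-by-two model reproduces the four cell means exactly; the cell means themselves are hypothetical.

```python
# Hypothetical observed cell means, keyed by (A, P):
# A = 1 for the treated group, P = 1 for the post-period.
cell_means = {(0, 0): 4.0, (0, 1): 7.0, (1, 0): 10.0, (1, 1): 15.0}

# In the saturated model the coefficients are contrasts of cell means.
beta_0 = cell_means[(0, 0)]
beta_A = cell_means[(1, 0)] - cell_means[(0, 0)]   # baseline group gap
beta_P = cell_means[(0, 1)] - cell_means[(0, 0)]   # control-group time change
beta_AP = (cell_means[(1, 1)] - cell_means[(1, 0)]) - (
    cell_means[(0, 1)] - cell_means[(0, 0)]
)                                                  # the DiD contrast

def group_gap(p):
    """E(Y | A=1, P=p) - E(Y | A=0, P=p) = beta_A + beta_AP * p."""
    return beta_A + beta_AP * p

print(beta_AP, group_gap(0), group_gap(1))
```

Here `group_gap(1)` is just the difference in post-period means between groups (8), which mixes the baseline gap `beta_A` (6) with the DiD term `beta_AP` (2). Only `beta_AP` is the diff-in-diff contrast, and reading it as the ATT still requires parallel trends.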

It is important to distinguish two parts:

  • The counterfactual parallel trend assumption (PTA) at the time of treatment, T+1
  • The testable parallel trend assumption before the treatment, t<T, which is available if you have more than one pre-treatment period.

What is necessary for making a causal claim about the diff-in-diff is that the counterfactual PTA at T+1 holds, but this is untestable by definition. So usually people test the testable PTA at t<T. The idea is that if the testable PTA at t<T holds, this gives more support to the wishful and unverifiable claim that the counterfactual PTA at T+1 holds.

So you are right, the testable PTA t<T is neither necessary nor sufficient for the counterfactual PTA T+1 to hold:

  • It is not necessary, since even without PTA at t<T, PTA could still hold at T+1
  • It is not sufficient, since PTA holding at t<T does not imply that PTA holds at T+1
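
The two bullets above can be illustrated with a minimal sketch using hypothetical mean potential outcomes under control at three time points (T-1, T, T+1). The function names and numbers are made up for illustration; the T+1 value for the treated group would never be observed in real data.

```python
def pre_trend_gap(y0_treated, y0_control):
    """Testable part: difference in changes from T-1 to T."""
    return (y0_treated[1] - y0_treated[0]) - (y0_control[1] - y0_control[0])

def did_bias(y0_treated, y0_control):
    """Untestable part: difference in Y^0 changes from T to T+1.

    This is the amount by which DiD misses the ATT.
    """
    return (y0_treated[2] - y0_treated[1]) - (y0_control[2] - y0_control[1])

# Each list holds the mean of Y^0 at T-1, T, T+1 (hypothetical values).
# Scenario A: parallel pre-trends, but the counterfactual trend diverges.
scenario_a = {"treated": [10.0, 12.0, 17.0], "control": [4.0, 6.0, 8.0]}
# Scenario B: non-parallel pre-trends, but the counterfactual trend is parallel.
scenario_b = {"treated": [10.0, 15.0, 17.0], "control": [4.0, 6.0, 8.0]}

for name, s in [("A", scenario_a), ("B", scenario_b)]:
    gap = pre_trend_gap(s["treated"], s["control"])
    bias = did_bias(s["treated"], s["control"])
    print(f"scenario {name}: pre-trend gap = {gap}, DiD bias = {bias}")
```

Scenario A passes a pre-trend check yet gives a biased DiD (not sufficient), while scenario B fails the check yet gives an unbiased DiD (not necessary).
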
Matifou
  • (+1) Great points to recognize! – Noah Feb 09 '23 at 16:32
  • Thank you Matifou. I've been suggesting to my team at work we avoid change scores and DiD and instead model E[Y1] = Y0 + A. Tennant et al. (2022) has persuaded me there are problems with DiD on a number of levels. https://stats.stackexchange.com/questions/601149/is-difference-in-differences-dead – RobertF Feb 09 '23 at 18:07