Using difference-in-differences as data preprocessing

Question

I am having difficulty wrapping my head around an idea. Can difference-in-differences be used to remove bias from the data? I found a mention here: "Difference in differences is a well-accepted method in quantitative research and we use it to correct pre-experiment bias between groups so as to produce reliable treatment effects estimation."

Let's assume I have two variables at user level, and I have information collected before and after the experiment. Can I use diff-in-diff as a preprocessing step and then test the effect caused by a treatment by using a regular t-test?

edit:

The idea is as follows. Before introducing the treatment, there are two groups, A and B (each with 1000 observations). The properties of the groups can change over time (an increasing or decreasing pattern). I introduce a new feature to group B (the test group in the experiment); A is left unchanged. After the experiment, group A becomes C (A -> C) and group B becomes D (B -> D). In the end I want to carry out a statistical test, but account for the fact that some of the change is due to the trend rather than the treatment.

As in the picture below:

edit2:

Looking at Dimitriy V. Masterov's answer here, I see how to get the DiD coefficient (or simply its value). He suggests that one possibility is to carry out a hypothesis test to see whether it is different from 0. But coming back to the original article I mentioned, I do not see how to use DiD as a pre-processing step and then later carry out a different kind of hypothesis test (bootstrap/delta + t-test).
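To make the question concrete, here is a minimal sketch of one possible reading of "DiD as preprocessing". It assumes the same users are observed before and after, so each user's outcome can be replaced by its change (post minus pre), and the two groups are then compared on those changes. The difference in mean changes is exactly (D - B) - (C - A). This is only an illustration under those assumptions, not necessarily what the cited article does. The data are simulated, and the function names, baselines, trend, and effect size are all made up. With 1000 users per group, a normal approximation stands in for the t distribution.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_group(n, baseline, trend, effect, rng):
    """Simulate (pre, post) outcomes for n users. All parameters are made up."""
    pre = [rng.gauss(baseline, 1.0) for _ in range(n)]
    # Post outcome = pre outcome + common trend + treatment effect + noise.
    post = [x + trend + effect + rng.gauss(0.0, 1.0) for x in pre]
    return pre, post


def did_test(pre_ctrl, post_ctrl, pre_treat, post_treat):
    """Difference per user first, then compare groups on the differences."""
    # "Preprocessing" step: per-user change, which removes each user's baseline.
    d_ctrl = [after - before for before, after in zip(pre_ctrl, post_ctrl)]
    d_treat = [after - before for before, after in zip(pre_treat, post_treat)]

    # Difference of mean changes = DiD estimate, (D - B) - (C - A).
    did = mean(d_treat) - mean(d_ctrl)

    # Welch-style standard error (no equal-variance assumption).
    se = math.sqrt(variance(d_treat) / len(d_treat)
                   + variance(d_ctrl) / len(d_ctrl))

    # Large-sample normal approximation to the two-sided t-test p-value.
    z = did / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return did, se, p_value


rng = random.Random(0)
# Group A -> C: control, shares the trend but gets no treatment.
pre_a, post_c = simulate_group(1000, baseline=10.0, trend=0.5, effect=0.0, rng=rng)
# Group B -> D: treated, different baseline (pre-experiment bias), same trend.
pre_b, post_d = simulate_group(1000, baseline=12.0, trend=0.5, effect=0.3, rng=rng)

did, se, p_value = did_test(pre_a, post_c, pre_b, post_d)
print(f"DiD estimate: {did:.3f}, SE: {se:.3f}, p-value: {p_value:.4f}")
```

In this sketch the pre-experiment gap between the groups (10 vs 12) and the shared trend both cancel out of the estimate, which only holds if the parallel-trend assumption is reasonable for the real data.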

Difference between before and after the experiment is one difference. Where is another difference? — user158565, Dec 10 '18 at 16:36
difference in differences = (a-b)-(c-d). Which ones are your a,b,c,d? — user158565, Dec 11 '18 at 15:52
If you are ready to assume that the outcomes in groups A and B would have had a parallel path in the absence of the treatment (parallel trend assumption), then the diff-in-diff estimator will indeed deliver an unbiased measure of the causal effect of the treatment (on the treated). What I don't completely understand in your question is why you talk about "data pre-processing"? The DiD estimate is already a scalar (as a difference between means), it is not defined at the individual level. Could you please clarify? — Roland, Dec 25 '18 at 16:10
@Roland I was looking at https://eng.uber.com/xp/ where they refer to this particular step (DiD) as data-preprocessing. I would like to understand the logic of their reasoning, so I can play around with testing something using similar methodology. — abu, Jan 07 '19 at 09:17


0 Answers