
I am having difficulty wrapping my head around an idea. Can difference-in-differences (DiD) be used to remove bias from the data? I found a mention here: "Difference in differences is a well-accepted method in quantitative research and we use it to correct pre-experiment bias between groups so as to produce reliable treatment effects estimation."

Let's assume I have two variables at user level, and I have information collected before and after the experiment. Can I use diff-in-diff as a preprocessing step and then test the effect caused by a treatment by using a regular t-test?

edit:

The idea is as follows. Before the treatment is introduced, there are two groups, A and B (each with 1000 observations). The properties of the groups can change over time (an increasing or decreasing pattern). I introduce a new feature to group B (the test group in the experiment), while A is left unchanged. After the experiment, group A becomes C and group B becomes D. In the end I want to carry out a statistical test, but account for the fact that some of the change is due to the trend rather than the treatment.

As in the picture below:

[image not available]
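To make the A, B, C, D setup concrete, here is a minimal sketch of the DiD value computed from the four group means. The function name `did_estimate` and all numbers are made up for illustration; nothing here comes from the article. It only shows that the change in the control group (A to C) is subtracted from the change in the test group (B to D).

```python
from statistics import mean

def did_estimate(a_before, b_before, c_after, d_after):
    """DiD from four samples: (D - B) - (C - A). Hypothetical helper."""
    control_change = mean(c_after) - mean(a_before)  # trend only
    treated_change = mean(d_after) - mean(b_before)  # trend + treatment
    return treated_change - control_change

# Made-up values, three observations per group to keep the arithmetic visible.
a = [10.0, 12.0, 11.0]  # control, before (mean 11)
b = [14.0, 15.0, 16.0]  # test, before (mean 15)
c = [12.0, 14.0, 13.0]  # control, after (mean 13)
d = [19.0, 20.0, 21.0]  # test, after (mean 20)

print(did_estimate(a, b, c, d))  # (20 - 15) - (13 - 11) = 3.0
```

The same number comes out if the differences are taken the other way round, as (A - B) - (C - D), because both expressions expand to A - B - C + D.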

edit2:

Looking at Dimitriy V. Masterov's answer here, I see how to get the DiD coefficient (or simply its value). He suggests that one possibility is to carry out a hypothesis test to see whether it differs from 0. But coming back to the original article I mentioned, I do not see how to use DiD as a pre-processing step and then carry out a different kind of hypothesis test afterwards (bootstrap/delta method + t-test).
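One possible reading of "pre-processing", which I have not confirmed against the article, is that the first difference is taken per user (after minus before) and the second difference is then left to an ordinary two-sample test on those per-user changes. The sketch below assumes the same users are observed before and after, uses simulated data with made-up parameters (`trend`, `effect`), and uses a large-sample normal approximation in place of the t distribution. The helper `welch_z_test` is hypothetical, not a library function.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_z_test(x, y):
    """Two-sample test of mean(x) - mean(y) with unequal variances,
    using a large-sample normal approximation for the p-value."""
    diff = mean(x) - mean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p

random.seed(0)
n = 1000                   # users per group, as in the question
trend, effect = 2.0, 1.0   # assumed values, for simulation only

# Control group: A (before) -> C (after), trend only.
control_pre = [random.gauss(10, 3) for _ in range(n)]
control_post = [x + trend + random.gauss(0, 1) for x in control_pre]

# Test group: B (before) -> D (after); starts at a different level (pre-experiment bias).
treated_pre = [random.gauss(12, 3) for _ in range(n)]
treated_post = [x + trend + effect + random.gauss(0, 1) for x in treated_pre]

# "Pre-processing": first difference, per user.
control_delta = [post - pre for pre, post in zip(control_pre, control_post)]
treated_delta = [post - pre for pre, post in zip(treated_pre, treated_post)]

# Second difference: compare the mean changes between the groups.
did, z, p = welch_z_test(treated_delta, control_delta)
print(f"DiD estimate: {did:.3f}, z: {z:.2f}, p-value: {p:.4f}")
```

With equal numbers of users before and after in each group, the difference in mean per-user changes equals the DiD of the four group means, so the test on the differenced data is a test of whether the DiD value differs from 0. This still relies on the parallel trend assumption.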

abu
  • Difference between before and after the experiment is one difference. Where is another difference? – user158565 Dec 10 '18 at 16:36
  • I don't follow the question, could you please explain? – abu Dec 11 '18 at 08:33
  • Difference in differences = (a-b)-(c-d). Which ones are your a, b, c, d? – user158565 Dec 11 '18 at 15:52
  • I updated the question accordingly. – abu Dec 13 '18 at 10:24
  • If you are ready to assume that the outcomes in groups A and B would have had a parallel path in the absence of the treatment (parallel trend assumption), then the diff-in-diff estimator will indeed deliver an unbiased measure of the causal effect of the treatment (on the treated). What I don't completely understand in your question is why you talk about "data pre-processing"? The DiD estimate is already a scalar (as a difference between means), it is not defined at the individual level. Could you please clarify? – Roland Dec 25 '18 at 16:10
  • @Roland I was looking at https://eng.uber.com/xp/ where they refer to this particular step (DiD) as data-preprocessing. I would like to understand the logic of their reasoning, so I can play around with testing something using similar methodology. – abu Jan 07 '19 at 09:17

0 Answers