I am looking at how different treatments (three kinds of area protection, plus no protection as the control group) affect my outcome of interest (deforestation). In particular, I would like to know which of the three treatments gives the best results. I decided to use propensity score matching to improve the balance of my covariates. I saw this post (Matching with Multiple Treatments), but as I have millions of observations in the control group and a few hundred thousand in each treatment group, matching on the control did not seem very feasible. As such, I split my dataset into three: treatment A and control; treatment B and control; treatment C and control. The three treatments are mutually exclusive, but the control observations are the same across the three datasets. Having done the propensity score matching, I now have three matched datasets with improved balance.

Following the advice of Ho et al. (2007) and Stuart and Rubin (2011) on combining matching with regression on the matched data, I would now like to run the regression analyses. My question is whether I can combine these three matched datasets into one regression analysis, or whether I have to run three separate regressions, one per treatment type. In the latter case, would I be unable to infer whether one treatment performs better than another?

  • Hope I did not get you wrong. So you now have 4 groups of samples, matched on propensity score: A, B, C and control. You can fit a regression model on all four groups with the control group as the reference, right? The coefficients will give you the effect of A, B or C relative to the control. – StupidWolf Jun 19 '20 at 11:45
  • Yes, I am able to fit a regression model with the 3 treatment types and control, but I was wondering whether I should. I have not come across examples in the literature where the authors combined matched samples. I'm not sure whether that's because each matching procedure improves balance only for that treatment and the control, so that combining the 3 matched samples negates having done the matching? – Jocelyne Sze Jun 19 '20 at 15:46

1 Answer


You can append the matched units from the three treatment groups to form a single dataset and then run a single regression of the outcome on the treatment (and optionally covariates). This might look like the following:

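# Keep matchedA.data whole (treated A units plus their matched controls);
# from the other two matched datasets take only the treated rows.
full.data <- rbind(matchedA.data,
                   matchedB.data[matchedB.data$treat == "B", ],
                   matchedC.data[matchedC.data$treat == "C", ])

# Single weighted regression of the outcome on treatment and covariates
fit <- lm(outcome ~ treat + X1 + X2, data = full.data, weights = full.data$weights)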

Each matched dataset presumably contains the control units, so you need to make sure they are included only once. The fact that you matched the datasets separately is statistically irrelevant.
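As for whether one treatment performs better than another, one possible way to read that off the combined regression is sketched below. Everything in it is an assumption for illustration: the data are simulated rather than matched, the effect sizes are made up, and `treat_refA` and `fit_refA` are hypothetical names. The idea is only that changing the reference level of the treatment factor turns a treatment-versus-control coefficient into a treatment-versus-treatment one.

```r
set.seed(1)
n <- 400

# Simulated stand-in for the combined matched data (not real matched output)
full.data <- data.frame(
  treat = factor(rep(c("control", "A", "B", "C"), each = n / 4),
                 levels = c("control", "A", "B", "C")),
  X1 = rnorm(n),
  X2 = rnorm(n),
  weights = 1  # assumed matching weights; all 1 in this toy example
)

# Made-up treatment effects, used only to generate an outcome
effects <- c(control = 0, A = -1, B = -2, C = -0.5)
full.data$outcome <- unname(effects[as.character(full.data$treat)]) +
  0.5 * full.data$X1 - 0.3 * full.data$X2 + rnorm(n)

# Control is the reference level, so each treat coefficient is
# that treatment compared with control
fit <- lm(outcome ~ treat + X1 + X2, data = full.data, weights = weights)
print(coef(summary(fit)))

# Make A the reference level to compare B and C with A directly
full.data$treat_refA <- relevel(full.data$treat, ref = "A")
fit_refA <- lm(outcome ~ treat_refA + X1 + X2, data = full.data,
               weights = weights)

# Estimate, standard error, t value and p-value for B versus A
print(coef(summary(fit_refA))["treat_refAB", ])
```

The B-versus-A estimate from the releveled model equals the difference between the B and A coefficients in the original model; refitting is just a convenient way to get its standard error as well.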

Noah
  • That's great, thanks! Would I have to add weights even if I did matching without replacement? – Jocelyne Sze Jun 22 '20 at 08:38
  • If you did 1:1 matching without replacement, then no. Including them won't harm anything and makes the procedure more generalizable to other forms of matching. – Noah Jun 22 '20 at 18:18
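
A small check of the point about weights, as a sketch on simulated data. It assumes every matched unit carries a weight of 1, which is what one would expect after 1:1 matching without replacement; the data frame `d` and its columns are made up for illustration.

```r
set.seed(2)

# Toy data standing in for a 1:1 matched sample (simulated, not matched)
d <- data.frame(treat = rep(c(0, 1), each = 50), X1 = rnorm(100))
d$outcome <- -1 * d$treat + 0.5 * d$X1 + rnorm(100)
d$weights <- 1  # assumed: every matched unit has weight 1

fit_w <- lm(outcome ~ treat + X1, data = d, weights = weights)  # with weights
fit_u <- lm(outcome ~ treat + X1, data = d)                     # without

# Compare the two sets of coefficient estimates
print(all.equal(coef(fit_w), coef(fit_u)))
```

With other forms of matching the weights generally differ across units, and then the two fits would no longer agree.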