2

I am trying to recover the true (simulated) effect of a treatment Z on an outcome Y, which is set to ATE = 5 (the csv file for the data is located here: https://www.dropbox.com/s/92obn9hsu3tqy92/synthetic_data_2.csv?dl=0). I am able to recover the true effect using a linear model, however, for some reason, I am unable to get the same effect using MatchIt (or Opmatch). As the main confounder (variable name “C_p”) is a binary variable, I have tried converting it to numeric, integer, and factor, but the same problem persists. I have also tried “cem” method and “nearest” but no progress.

After suspecting that something is convoluted in the original simulated file, I simulated some new data (see below). Using these data, I am recovering the true effect using lm. With matching, the effect is closer to the truth compared to the original problem, but still biased. Using a t.test, we see that the ATE is -4.15 – (-2.55)= -1.6, yet it should be equal to 5.

Any ideas of why matching is not recovering the true effect of synthetic_data_2.csv, using matching?

setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
setwd("../../")

library(MatchIt) library(dplyr)

#setwd("path/to/dir") imf.meta <- read.csv("synthetic_data_2.csv", sep= ",")

imf.meta$Z <- imf.meta$'T' #imf.meta$C_p <- as.numeric(imf.meta$C_p) AD: tried numeric, integer, and factor

Executing a matching algorithm

imf.meta_nomiss <- imf.meta %>% select(C1, C2, C3, Cp, Y, Z) %>% na.omit()

tried different approaches

mod_match <- matchit(Z ~ C1 + C2 + C3 + Cp, method = "nearest", data = imf.meta_nomiss)

mod_match <- matchit(Z ~ C1 + C2 + C3 + C_p,

method = "cem", data = imf.meta_nomiss)

mod_match <- matchit(Z ~ C_p,

method = "nearest", data = imf.meta_nomiss)

mod_match <- matchit(Z ~ C_p,

method = "cem", data = imf.meta_nomiss)

dta_m <- match.data(mod_match)

Estimating treatment effects

with(dta_m, t.test(Y ~ Z))

recover treatment effect withouth additional adjusting

lm_treat1 <- lm(Y ~ Z, data = dta_m) summary(lm_treat1)

recover treatment effect with adjusting

lm_treat2 <- lm(Y ~ Z + C1 + C2 + C3 + C_p, data = dta_m) summary(lm_treat2)

Simulate new data

all covariates are continous

n <- 2000 p <- 10 X <- matrix(rnorm(n * p), n, p) W <- rbinom(n, 1, 0.4 + 0.2 * (X[, 1] > 0)) Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)

create binary covariate

n <- 2000 p <- 10 X <- matrix(rnorm(n * p), n, p) X[,1] <- rbinom(n, 1, 0.6) W <- rbinom(n, 1, 0.1+0.7 * (X[, 1] > 0.5)) #Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n) Y <- pmax(X[, 1], 0)(-10) + 5W + rnorm(n)

na.omit()

imf.meta_nomiss <- as.data.frame(X) imf.meta_nomiss$Y <- Y imf.meta_nomiss$W <- W

compare with grf

library(grf) tau.forest <- causal_forest(X, Y, W) average_treatment_effect(tau.forest, target.sample = "all")

compare with lm

lm_treat1 <- lm(Y ~ W + V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, data = imf.meta_nomiss) summary(lm_treat1)

use matching

mod_match <- matchit(W ~ V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, method = "nearest", data = imf.meta_nomiss) dta_m <- match.data(mod_match)

Estimating treatment effects

with(dta_m, t.test(Y ~ W))

AD with and without mweights is not making a difference

without weights

lm_treat3 <- lm(Y ~ W, data = dta_m) summary(lm_treat3)

with weigths

lm_treat4 <- lm(Y ~ W, data = dta_m, weights=weights ) summary(lm_treat4)

Adel
  • 134

1 Answers1

3

1:1 propensity score matching without replacement is not generally unbiased; it works only when there are many more control units than treated units and when there is good overlap between the treatment groups. There are a few things that could possibly be going on here that I'll attempt to address.

Let's consider your analysis of the Dropbox CSV file. Starting with nearest neighbor matching on the propensity score:

> mod_match <- matchit(Z ~ C1 + C2 + C3 + Cp,
+                      method = "nearest", data = imf.meta_nomiss)
> summary(mod_match, improvement = FALSE)

Call: matchit(formula = Z ~ C1 + C2 + C3 + Cp, data = imf.meta_nomiss, method = "nearest")

Summary of Balance for All Data: Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max distance 0.7011 0.2248 1.8304 1.2078 0.3973 0.6309 C1 105.1623 94.4411 0.2145 1.0402 0.0620 0.0961 C2 612.5812 409.4929 1.1874 0.9812 0.2965 0.4356 C3 -200.4626 -200.5272 0.0013 0.9718 0.0063 0.0176 Cp 0.7255 0.3189 0.9112 . 0.4066 0.4066

Summary of Balance for Matched Data: Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist. distance 0.7011 0.2934 1.5668 1.2203 0.3127 0.5714 1.5668 C1 105.1623 98.1299 0.1407 1.0524 0.0411 0.0710 1.1167 C2 612.5812 464.6477 0.8649 1.2906 0.2179 0.3556 1.1090 C3 -200.4626 -200.9724 0.0103 0.9481 0.0093 0.0248 1.1554 Cp 0.7255 0.4006 0.7281 . 0.3249 0.3249 1.0761

Sample Sizes: Control Treated All 3424 2576 Matched 2576 2576 Unmatched 848 0 Discarded 0 0

We can notice a few things here. First, the data are very imbalanced. There are large standardized mean differences and KS statistics (eCDF max) for C2 and Cp. Let's take a look at the distribution of C2 before and after matching:

enter image description here

There are areas of non-overlap in the treated distribution, which means nearest neighbor matching without a caliper will fundamentally be unable to achieve balance, yielding bias in the effect estimate. Looking at this plot, there is no hope in accurately estimating the ATT using matching without a caliper. Supplying a caliper (e.g., caliper = .05) restricts the analysis to the area of overlap and allows us to achieve balance and accurately recover the treatment effect (the treatment effect seems to be constant so the average treatment effect in the caliper-matched sample is equal to the ATE and ATT, though this is rarely true in real data).

> #Caliper matching
> mod_match <- matchit(Z ~ C1 + C2 + C3 + Cp,
+                      method = "nearest", data = imf.meta_nomiss, caliper = .05)
> summary(mod_match, un = FALSE)

Call: matchit(formula = Z ~ C1 + C2 + C3 + Cp, data = imf.meta_nomiss, method = "nearest", caliper = 0.05)

Summary of Balance for Matched Data: Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist. distance 0.4874 0.4785 0.0344 1.0574 0.0065 0.0317 0.0345 C1 100.2388 101.6385 -0.0280 0.9981 0.0076 0.0246 1.1019 C2 521.3383 520.3469 0.0058 1.0987 0.0045 0.0167 0.8306 C3 -200.0333 -201.4824 0.0293 0.9590 0.0112 0.0281 1.1515 Cp 0.5805 0.5558 0.0552 . 0.0246 0.0246 0.8712

Sample Sizes: Control Treated All 3424 2576 Matched 1137 1137 Unmatched 2287 1439 Discarded 0 0

> lm_treat1 <- lm(Y ~ Z, data = dta_m, weights = weights) > lmtest::coeftest(lm_treat1, vcov. = sandwich::vcovCL, cluster = ~subclass)

t test of coefficients:

        Estimate Std. Error  t value Pr(&gt;|t|)    

(Intercept) -55.8991 1.7003 -32.8756 <2e-16 *** Z 3.2060 2.3028 1.3922 0.164


Why didn't CEM provide the right answer? CEM, like caliper matching, restricts the analysis to an area of common support, so it should recover the effect correctly. You analyzed the matched data incorrectly. The MatchIt vignette on estimating effects explains how to estimate effects after CEM; you must include the weights in the effect estimation. When we do this, we find an estimate with a confidence interval that contains 5, indicating that CEM is successful.

> lm_treat1 <- lm(Y ~ Z, data = dta_m, weights = weights)
> lmtest::coeftest(lm_treat1, vcov. = sandwich::vcovCL, cluster = ~subclass)

t test of coefficients:

        Estimate Std. Error  t value Pr(&gt;|t|)    

(Intercept) -62.3837 3.0945 -20.1594 <2e-16 *** Z 1.8940 2.9900 0.6335 0.5265


Let's move on to the analysis of the data you simulated with the code in your question. Examining the balance of the covariates before matching, we see that the covariates are mostly balanced except for V1, which has an extreme imbalance. But more important is that the numbers of treated and control units are very close together, with a few more treated units than control units. This is not a situation in which nearest neighbor matching without replacement will succeed. Barely any matching is taking place; the matched sample is almost identical to the unmatched sample. If you instead use matching with replacement (i.e., replace = TRUE), balance is achieved and the effect estimate is correct. (To keep this answer from getting too long I'll let this be an exercise for the reader.)

I think your attempts to analyze these data suffered from a few problems: 1) You applied a blunt instrument instead of using the method most appropriate for the data, 2) You did not assess balance to see if your method was working, and 3) You analyzed the data incorrectly. Matching is a method that requires care and nuance; its strength is that you can tailor the method to incorporate substantive information about the subject matter and build trust in the reader that you have eliminated all bias due to confounding by the measured covariates. In order to realize these benefits, though, you have to do the matching in line with best practices as described in the MatchIt vignettes. It is critical to use the matching method most appropriate for the data at hand (described in the "Matching Methods" MatchIt vignette), ensure you have achieved balance after matching (described in the "Assessing Balance" MatchIt vignette), and estimated the effect correctly (described in the "Estimating Effects" MatchIt vignette). Hopefully taking a closer look at these documents will provide some clarity.

Noah
  • 33,180
  • 3
  • 47
  • 105
  • many thanks for your thoughtful reply. While I assumed I knew matching well, this problem and your reply made me realize how much of an art matching is rather than an exact science. For example, although, I tried many different calipers (e.g., 1 standard deviation and below), I never went below 0.1 and concluded that something is strange with my data. I also combined “replacement” but that obviously did not work propensity score matching if the caliper is rough. Yet replacement is key for CEM in this problem. – Adel Aug 27 '21 at 18:35
  • Two small follow ups, (1) in the summary output "summary(mod_match, improvement = FALSE)", what is the column "distance" stand for? Is it the average distance but should th at not be the same for control? – Adel Aug 27 '21 at 18:39
  • And (2) I am trying to replicate the problem in optmatch, like the following, but optimal matching has different mechanics.: ps.mod = glm(Z ~ C1 + C2 + C3 + C_p, data=imf.meta, family="binomial") ps.dist = optmatch::match_on(ps.mod, data = imf.meta)

    ps.match = fullmatch(caliper(ps.dist, width=0.001*sd(ps.dist@.Data)), data= imf.meta, remove.unmatchables=T, max.controls=1, tol = 0.005)

    – Adel Aug 27 '21 at 18:40
  • 1
    distance is the propensity score. You can do the same thing using MatchIt with method = "full" as you can using fullmatch if you prefer MatchIt's syntax. The tough part about optmatch is that extracting meaningful matching weights is not straightforward, which is why MatchIt does it for you. Remember you must incorporate those weights to get valid estimates. – Noah Aug 27 '21 at 22:29