I'm having trouble interpreting the results after a propensity score matching (PSM) procedure. I used full matching with a logit link and no caliper, and I'm testing the effect of my binary treatment "buyout" on the binary outcome "dest_flood". What I want to report is a z-test for the adjusted sample using the matching (inverse-probability-style) weights. Following the MatchIt vignette, I prepared my code this way.
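For reference, the matching step that produced the `weights` and `subclass` columns in `full_data` looked roughly like this. My actual covariates aren't shown, so this sketch uses MatchIt's built-in `lalonde` data and its `treat` variable as a stand-in for `buyout_flag`:

```r
library(MatchIt)

# Full matching on a logit propensity score, no caliper
# (lalonde's covariates stand in for my own, which aren't shown here)
m.out <- matchit(treat ~ age + educ + race + married,
                 data = lalonde, method = "full",
                 distance = "glm", link = "logit")

# match.data() returns the data with the matching weights
# and subclass IDs attached, which the models below use
matched <- match.data(m.out)
head(matched[, c("treat", "weights", "subclass")])
```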
library(lmtest)    # coeftest()
library(sandwich)  # vcovCL()
modelA <- glm(dest_flood ~ buyout_flag, data = full_data, weights = weights,
              family = quasibinomial(link = "logit"))
round(coeftest(modelA, vcov. = vcovCL, cluster = ~subclass)[1:2, ], digits = 3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.214 0.285 -7.75 0.000
buyout_flag -0.926 0.437 -2.12 0.034
sandA <- coeftest(modelA, vcov. = vcovCL, cluster = ~subclass)
sandA
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.214 0.285 -7.75 8.9e-15 ***
buyout_flag -0.926 0.437 -2.12 0.034 *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef(modelA)) #OR
(Intercept) buyout_flag
0.109 0.396
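In case it helps, here is how I'd turn the cluster-robust output into an odds ratio with a 95% Wald confidence interval. This sketch plugs in the numbers printed above; in practice I'd pull the coefficient and SE out of `sandA` directly:

```r
# Cluster-robust estimate and SE for buyout_flag, from the coeftest output above
b  <- -0.926   # log-odds coefficient
se <-  0.437   # cluster-robust standard error

or <- exp(b)                          # odds ratio, matches exp(coef(modelA))
ci <- exp(b + c(-1, 1) * 1.96 * se)   # 95% Wald CI on the odds-ratio scale

round(c(OR = or, lower = ci[1], upper = ci[2]), 3)
```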
The step where I print `sandA` (the output headed "z test of coefficients") seems like it should be the result to report. But when I compare it to a prop.test on the same outcome, the results differ enough that I'm not sure how to interpret them:
table(full_data$dest_flood, full_data$buyout_flag)
0 1
0 852 254
1 80 11
prop.test(x = c(11, 80), n = c(265, 932), p = NULL,
          alternative = "two.sided", correct = TRUE)
2-sample test for equality of proportions with continuity correction
data: c(11, 80) out of c(265, 932)
X-squared = 5.1579, df = 1, p-value = 0.02314
alternative hypothesis: two.sided
95 percent confidence interval:
-0.07675366 -0.01190129
sample estimates:
prop 1 prop 2
0.04150943 0.08583691
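My current guess at the discrepancy is that prop.test uses the raw, unweighted counts, while coeftest is run on the matching-weighted sample, so the two aren't estimating the same quantity. A comparison on the weighted scale would look roughly like this (a sketch, with `full_data` and its `weights` column as above):

```r
# Weighted outcome proportions by treatment group, using the matching weights.
# Unlike the raw prop.test counts, these should line up with the weighted glm.
p1 <- with(subset(full_data, buyout_flag == 1),
           weighted.mean(dest_flood, w = weights))
p0 <- with(subset(full_data, buyout_flag == 0),
           weighted.mean(dest_flood, w = weights))
c(treated = p1, control = p0)
```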
By comparison, when I use the same process (more or less) for a continuous outcome, the weighted post-PSM coeftest matches up fairly well with a basic t-test:
model1 <- lm(percent_poverty_dest ~ buyout_flag, data = full_data, weights = weights)
round(coeftest(model1, vcov. = vcovCL, cluster = ~subclass)[1:2,], digits = 3)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.65 0.876 14.444 0.000
buyout_flag -0.79 1.126 -0.702 0.483
sand1 <- coeftest(model1, vcov. = vcovCL, cluster = ~subclass)
sand1
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.649 0.876 14.4 < 2e-16 ***
buyout_flag -0.790 1.126 -0.7 0.48
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
buyout <- subset(full_data, buyout_flag == 1)
prox <- subset(full_data, buyout_flag == 0)
t.test(buyout$percent_poverty_dest, prox$percent_poverty_dest)
Welch Two Sample t-test
data: buyout$percent_poverty_dest and prox$percent_poverty_dest
t = -3, df = 456, p-value = 0.01
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.016 -0.392
sample estimates:
mean of x mean of y
11.9 13.6
How should I interpret this? And can I get z-test results for binary outcomes after a PSM match?
link = "log" in the glm() call. For model1, I don't know what the outcome is so I can't recommend a proper interpretation, but -.79 does represent a difference in means. – Noah Jun 23 '22 at 14:19