Propensity Score Matching at the district level and selecting variables

Question

I am trying to use PSM for program evaluation. My data is at the individual level and I would like to do the PSM at the district level (match districts with each other rather than individuals).

Based on the literature I could find online, I have done the following:

Calculated the propensity score using pscore at the individual level.

pscore treatment districts selected_variables, pscore(mypscore) logit

Used psmatch2 to do the matching

psmatch2 treatment, outcome(outcome_variable) pscore(mypscore) kernel

My question: I would like to understand if the above approach is correct or if I should collapse the data before using the psmatch2 command. In addition, would it be sufficient to simply run a linear regression to select the relevant variables to use in the pscore command?

Thank you!

Harvard's Gary King has some great thoughts about this, particularly appropriate for poli sci program evaluation: "Why Propensity Scores Should Not Be Used for Matching", https://gking.harvard.edu/publications/why-propensity-scores-should-not-be-used-formatching — user78229, Jun 20 '23 at 11:58
@MikeHunter It is not helpful to reference that article in opposition to the use of PSM. See my answer here for why. — Noah, Jun 20 '23 at 19:10
@noah nice job....insofar as my comment stimulated a link to your post, it was helpful, no? — user78229, Jun 20 '23 at 20:00
@MikeHunter Thank you for sharing the paper. It is informative. — mridhula, Jun 21 '23 at 11:13

score 1 · Accepted Answer · answered Jun 20 '23 at 19:23

Instead of thinking about what commands to run, think about what you are trying to do. If your goal is to find districts that are similar to each other, why are you estimating propensity scores for each unit and why are you doing matching on the unit level? It seems like you want to perform the analysis at the district level, in which case you need to find districts that are similar to each other and discard the rest. It may be that there is no quick and simple Stata command to do this.
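To make "find districts that are similar to each other and discard the rest" concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the toy records, the covariates `income` and `age`, the helper names `collapse` and `match_districts`, and the caliper of 1.0. It swaps in greedy 1:1 nearest-neighbour matching on standardized district means, which is neither a propensity score model nor the optimal multilevel matching discussed below.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical individual-level records: (district, treated, income, age).
rows = [
    ("A", 1, 30.0, 40), ("A", 1, 34.0, 44),
    ("B", 1, 52.0, 35), ("B", 1, 48.0, 37),
    ("C", 0, 31.0, 43), ("C", 0, 33.0, 41),
    ("D", 0, 50.0, 36), ("D", 0, 51.0, 34),
    ("E", 0, 90.0, 60), ("E", 0, 94.0, 62),
]

def collapse(rows):
    """Aggregate individuals to one record per district (covariate means)."""
    groups = defaultdict(list)
    for district, treated, income, age in rows:
        groups[district].append((treated, income, age))
    return {
        d: {
            # Assumes treatment is constant within a district.
            "treated": members[0][0],
            "x": (mean([m[1] for m in members]), mean([m[2] for m in members])),
        }
        for d, members in groups.items()
    }

def match_districts(districts, caliper=1.0):
    """Greedy 1:1 matching without replacement; unmatched districts are dropped."""
    n_cov = 2
    # Standardize each covariate by its SD across districts (1.0 if constant).
    sds = [pstdev([d["x"][k] for d in districts.values()]) or 1.0
           for k in range(n_cov)]

    def dist(a, b):
        return sum(((districts[a]["x"][k] - districts[b]["x"][k]) / sds[k]) ** 2
                   for k in range(n_cov)) ** 0.5

    treated = sorted(n for n, d in districts.items() if d["treated"] == 1)
    controls = sorted(n for n, d in districts.items() if d["treated"] == 0)
    pairs = []
    for t in treated:
        if not controls:
            break
        best = min(controls, key=lambda c: dist(t, c))
        if dist(t, best) <= caliper:  # discard treated districts with no close control
            pairs.append((t, best))
            controls.remove(best)
    return pairs

pairs = match_districts(collapse(rows))
print(pairs)  # [('A', 'C'), ('B', 'D')]
```

In this toy data, district E has no comparable treated district and is left out of the matched sample. The outcome comparison would then be made between the matched districts only.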

That said, this isn't the best way to estimate effects in a clustered observational study. The pitfalls and an alternative are described in Zubizarreta & Keele (2017), with an updated, simpler method described in Pimentel et al. (2018). The method used by Pimentel et al. is implemented in the R package matchMulti, which has good documentation and appears easy to use.

Pimentel, S. D., Page, L. C., Lenard, M., & Keele, L. (2018). Optimal multilevel matching using network flows: An application to a summer reading intervention. The Annals of Applied Statistics, 12(3), 1479–1505. https://doi.org/10.1214/17-AOAS1118

Zubizarreta, J. R., & Keele, L. (2017). Optimal Multilevel Matching in Clustered Observational Studies: A Case Study of the Effectiveness of Private Schools Under a Large-Scale Voucher System. Journal of the American Statistical Association, 112(518), 547–560. https://doi.org/10.1080/01621459.2016.1240683

Thank you. I agree that PSM may not be suitable for my research. I'm trying to use the matchMulti package, but I'm encountering implementation issues. I have a large dataset (2,000,000 obs and 100 vars) and R is running into memory allocation errors. There also seems to be an issue with how I have specified my treatment variable. If I can't resolve these, I'm considering the 'within approach' mentioned here by Bruno Arpino (link). Do you think this approach would be sensible? Thank you! — mridhula, Jun 21 '23 at 11:06
That approach is not sensible because it assumes there is variation in treatment within each cluster. If you are matching at the district level, that suggests your treatment is at the district level. You can't match within districts if everyone within a district has the same treatment status. — Noah, Jun 21 '23 at 18:46
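
Whether treatment varies within any district can be checked directly before attempting a within-district approach. A minimal sketch in Python, where the records and the helper name `districts_with_treatment_variation` are hypothetical:

```python
from collections import defaultdict

# Hypothetical (district, treated) pairs, one per individual.
records = [("A", 1), ("A", 1), ("B", 0), ("B", 0), ("C", 1), ("C", 0)]

def districts_with_treatment_variation(records):
    """Return districts that contain both treated and untreated individuals."""
    statuses = defaultdict(set)
    for district, treated in records:
        statuses[district].add(treated)
    return sorted(d for d, s in statuses.items() if len(s) > 1)

mixed = districts_with_treatment_variation(records)
print(mixed)  # ['C']
```

If this list is empty, treatment is assigned at the district level and there is nothing to match within districts.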
