I am trying to use propensity score matching (PSM) for program evaluation. My data are at the individual level, but I would like to match at the district level (that is, match districts with each other rather than individuals).
Based on the literature I could find online, I have done the following:
- Calculated the propensity score at the individual level using pscore:
pscore treatment districts selected_variables, pscore(mypscore) logit
- Used psmatch2 to do the matching on that score:
psmatch2 treatment, outcome(outcome_variable) pscore(mypscore) kernel
My questions: Is the above approach correct, or should I collapse the data to the district level before running psmatch2? In addition, would it be sufficient to run a linear regression to select the relevant variables to include in the pscore command?
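For reference, the collapsed alternative I have in mind would look roughly like the sketch below. It is only a sketch: it assumes that treatment is constant within a district (so taking the maximum recovers the district's status) and that district means of the covariates and the outcome are the right summaries. The variable names are the same placeholders as above.

```stata
* Sketch only: variable names are placeholders, not my actual variables.
* Assumes treatment does not vary within a district.
preserve

* Reduce the data to one row per district: means of the outcome and
* covariates, and the district's treatment status.
collapse (mean) outcome_variable selected_variables (max) treatment, by(districts)

* Estimate the propensity score and match at the district level in one step.
psmatch2 treatment selected_variables, outcome(outcome_variable) logit kernel

* Return to the individual-level data.
restore
```

Here psmatch2 estimates the score itself from the listed covariates, so a separate pscore step would not be needed in this version.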
Thank you!