We teach supplementary lessons in nearly two dozen local schools, and have two data sets of approximately four hundred records each from pre-post tests given at these schools. Each record contains pre and post values (correct, incorrect) for questions on 12 topics as well as whether or not there was an intervention (lesson taught) relating to that topic. Our goal is of course to assess the impact of the lessons taught.
The pre- and post-test responses are pair-matched by student and clustered by school, since the lessons taught vary by school and the other factors generally accepted as grounds for clustering at this level also apply (e.g., similar socioeconomic background, school culture, common instructors).
Without clustering, McNemar's test is the accepted method for analyzing this kind of data, and several authors have explored modifications to it that allow for clustering, including:
- Methods for the Analysis of Pair-Matched Binary Data from School-Based Intervention Studies. Vaughan & Begg. 1999. doi: 10.3102/10769986024004367
- Analysis of clustered matched-pair data. Durkalski et al. 2003. doi: 10.1002/sim.1438
- Methods for the Statistical Analysis of Binary Data in Split-Cluster Designs. Donner, Klar, Zou. 2004. doi: 10.1111/j.0006-341X.2004.00247.x
- Adjustment to the McNemar’s Test for the Analysis of Clustered Matched-Pair Data. McCarthy. 2007. http://biostats.bepress.com/cobra/ps/art29/ (Free to download)
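For reference, the unclustered McNemar statistic depends only on the discordant pairs. The following is a minimal sketch, not taken from any of the papers above: the function name and the labels `b` (incorrect on pre, correct on post) and `c` (correct on pre, incorrect on post) are illustrative choices, and no continuity correction is applied.

```python
import math

def mcnemar_test(b, c):
    """Unclustered McNemar test on discordant pair counts.

    b: pairs that changed incorrect -> correct (assumed labeling)
    c: pairs that changed correct -> incorrect (assumed labeling)
    Returns (statistic, p_value) using the chi-square(1) approximation.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Concordant pairs (same answer on both tests) do not enter the statistic at all, which is why only `b` and `c` are needed.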
I have subsequently experimented with the Durkalski method as documented in McCarthy, since it appears to be regarded as fairly robust and is the simplest for me to understand and code. However, none of the documented methods fits our case exactly: each uses matched pairs for either pre-post or control-treatment comparisons only, and treats the clusters as a single class. We actually have matched pre-post pairs in multiple control and treatment clusters, but these methods make no use of that distinction, and discarding the roughly 50% of our data points that come from the control groups seems sub-optimal. Is anyone aware of a technique designed to analyze this data configuration, or of someone who might be interested in exploring this area?
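For concreteness, here is a minimal sketch of the Durkalski statistic as I understand it from McCarthy's summary: each cluster contributes its discordant difference scaled by cluster size, and the squared sum is divided by the sum of squares. The function name and the `(b_k, c_k, n_k)` input layout are my own illustrative choices, and the formula should be checked against the papers before being relied on.

```python
import math

def durkalski_test(clusters):
    """Clustered McNemar-type test (sketch of the Durkalski statistic).

    clusters: iterable of (b_k, c_k, n_k) per school, where b_k and c_k are
    the two discordant pair counts and n_k is the number of matched pairs
    in cluster k (assumed layout, not from the papers).
    Returns (statistic, p_value) using the chi-square(1) approximation.
    """
    # Per-cluster discordant difference, weighted by 1 / n_k
    diffs = [(b - c) / n for b, c, n in clusters if n > 0]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("no discordant differences; statistic is undefined")
    stat = sum(diffs) ** 2 / denom
    # Survival function of chi-square with 1 df
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Note that this takes a single list of clusters, so it can be run on the treatment schools or on the control schools, but it has no way of comparing the two groups, which is exactly the limitation described above.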
Thanks in advance!