I have thousands of evaluation scores from several sites, and the number of evaluations varies from site to site. I am tasked with comparing each site's mean evaluation score to the all-site mean. The previous approach has been to run a one-sample Student's t-test on each site, with the all-site mean as the test value. The all-site mean is a simple unweighted mean.
Is this a reasonable method, given that each site is included in the all-sites mean, and that each site contributes differing numbers of evaluations to that mean?
If this is not a useful approach, what would a better one look like? Would weighting the all-site mean be an improvement?
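
To make the setup concrete, here is a minimal sketch of the procedure as I understand it. The scores and site names are made up for illustration, and the helper names (`pooled_mean`, `mean_of_site_means`, `one_sample_t`) are hypothetical, not from any existing code. It computes the two candidate all-site means and the one-sample t statistic for each site against the pooled mean.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores per site (made-up numbers, purely illustrative).
scores = {
    "site_a": [4.0, 5.0, 3.0, 4.0],
    "site_b": [2.0, 3.0],
    "site_c": [5.0, 4.0, 5.0, 4.0, 5.0, 4.0],
}


def pooled_mean(scores):
    """Mean over every individual evaluation, ignoring site membership."""
    all_scores = [x for site in scores.values() for x in site]
    return mean(all_scores)


def mean_of_site_means(scores):
    """Mean of the per-site means, giving each site equal weight."""
    return mean(mean(site) for site in scores.values())


def one_sample_t(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / standard error."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


mu_pooled = pooled_mean(scores)
mu_sites = mean_of_site_means(scores)

# The test value mu_pooled already contains each site's own scores,
# which is the overlap I am asking about.
t_stats = {name: one_sample_t(vals, mu_pooled) for name, vals in scores.items()}

for name, t in t_stats.items():
    print(f"{name}: n={len(scores[name])}, t={t:.3f}")
print(f"pooled mean={mu_pooled:.3f}, mean of site means={mu_sites:.3f}")
```

Note that weighting each site mean by its number of evaluations reproduces the pooled mean, so the two functions differ only when the site sizes are unequal, as they are here.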