We've tested a cohort of wild-type and mutant mice on a battery of behavioural tests. Five different tests, performed on different days ON THE SAME SET OF MICE, assess anxiety.
While Bonferroni or some other multiple-test correction would "penalise" the repeated testing, shifting our alpha to roughly alpha/5, I think that combining the results somehow should instead increase our confidence.
Fisher's method for combining p-values (as far as I understand) cannot be applied here because the tests are not independent: the same mice are tested repeatedly. What statistical test or method would be recommended to combine the results of the 5 (non-independent) tests into an overall assessment of significance? Because the tests differ, e.g. in length and character, we can't pool the raw data, but we could convert to z-scores if that would help.
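To make the z-score idea concrete, here is a minimal sketch of the naive Stouffer-style combination (the z-score analogue of Fisher's method). The p-values are made up, and the method still assumes independence, so it would overstate confidence for these data; it only illustrates what "combining via z-scores" would compute.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_combined_p(pvals):
    """Combine one-sided p-values via Stouffer's z-score method.

    CAVEAT: this assumes the individual tests are independent, which
    does NOT hold when the same mice are tested repeatedly -- a
    dependence-aware adjustment would be needed in that case.
    """
    nd = NormalDist()
    # Convert each p-value to a z-score.
    zs = [nd.inv_cdf(1 - p) for p in pvals]
    # Sum and renormalise so the combined statistic is ~N(0, 1) under H0.
    z_combined = sum(zs) / sqrt(len(zs))
    # Convert back to a single one-sided p-value.
    return 1 - nd.cdf(z_combined)

# Hypothetical p-values from five anxiety tests on the same cohort.
p = stouffer_combined_p([0.04, 0.03, 0.2, 0.07, 0.5])
print(p)
```

Note that the combined p-value comes out smaller than most of the individual ones, which is exactly the "increased confidence" effect described above; the open question is how to do this validly when the tests are correlated.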
In these types of experiments, we typically see 2, 3, or 4 of the tests come out significant at p < 0.05 (without multiple-test correction), but due to the low n, none of the p-values is striking on its own.