We've tested a cohort of wild-type and mutant mice on a battery of behavioural tests. Five different tests, performed on different days ON THE SAME SET OF MICE, assess anxiety.
While Bonferroni or some other multiple-test correction would "penalise" the repeated testing, shifting our alpha to roughly alpha/5, I think that combining the results somehow should instead increase our confidence.
Fisher's method for combining p-values (as far as I understand) cannot be applied here because the tests are not independent: the same mice are tested repeatedly. What statistical test or method would be recommended to combine the results of the 5 (non-independent) tests into an overall assessment of significance? Because the tests differ, e.g. in length and character, we can't pool the raw data, but we could convert to z-scores if that would help.
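To make the z-score idea concrete, here is a minimal sketch of the naive Stouffer-style combination (the z-score analogue of Fisher's method). The p-values are made up, and the method still assumes independence, so it would overstate confidence for these data; it only illustrates what "combining via z-scores" would compute.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_combined_p(pvals):
    """Combine one-sided p-values via Stouffer's z-score method.

    CAVEAT: this assumes the individual tests are independent, which
    does NOT hold when the same mice are tested repeatedly -- a
    dependence-aware adjustment would be needed in that case.
    """
    nd = NormalDist()
    # Convert each p-value to a z-score.
    zs = [nd.inv_cdf(1 - p) for p in pvals]
    # Sum and renormalise so the combined statistic is ~N(0, 1) under H0.
    z_combined = sum(zs) / sqrt(len(zs))
    # Convert back to a single one-sided p-value.
    return 1 - nd.cdf(z_combined)

# Hypothetical p-values from five anxiety tests on the same cohort.
p = stouffer_combined_p([0.04, 0.03, 0.2, 0.07, 0.5])
print(p)
```

Note that the combined p-value comes out smaller than most of the individual ones, which is exactly the "increased confidence" effect described above; the open question is how to do this validly when the tests are correlated.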
In these types of experiments, we typically see 2, 3, or 4 of the tests come out significant at p < 0.05 (without multiple-test correction), but due to the low n, none of the p-values is striking on its own.