Bonferroni correction: What exactly is meant by "multiple tests"?

Question

The Bonferroni correction seems to be quite controversial. But I read again and again that it should be used for multiple tests. But what exactly are multiple tests? If I have three different data sets in the same study and run only one t-test on each data set, is that a multiple test and do I have to apply a Bonferroni correction?

Or does it only count as multiple testing if I have one data set and run three tests on that same data set?

I find the statements about the Bonferroni correction very unclear and would be grateful for your expertise.

Related https://stats.stackexchange.com/questions/206592/why-arent-multiple-hypothesis-corrections-applied-to-all-experiments-since-the — rep_ho, Apr 07 '23 at 18:05
Probably helpful too, although on a different level: https://xkcd.com/882/ — Camille Gontier, Apr 07 '23 at 19:53

score 1 · Answer 1 · answered Apr 07 '23 at 21:37

The answer to almost all of the questions that I see here regarding multiple comparison 'corrections' such as Bonferroni is that the desirability of their application depends on things that are usually not mentioned in the question! That means that any really accurate and balanced answer has to be very long. I will not make this one long enough, but will point you to my best attempt at a long-form answer: A Reckless Guide to P-values: Local Evidence, Global Errors

What is the nature of your study and what are your inferential objectives? Is the study a preliminary one that might be thought of as 'hypothesis generating', or is it intended to be a standalone 'definitive' account? You might be more interested in the evidential meaning of the data than in the long-run error rate consequences of your statistical procedures.

The controversy that you mention might well be a consequence of people being unwilling to imagine that not every user of statistical approaches shares their particular purposes and circumstances.

Are the null hypotheses of the several tests the same, or related, or independent? Are any of the data shared across tests?

'Corrections' for multiplicity always come at the cost of reduced power. In other words, they accept more type II errors (false negatives) in exchange for extra protection against a category of type I errors (false positives). Given your inferential objectives, is that trade-off going to render your designed balance of false positive and false negative errors undesirable? Did you design that balance with the 'correction' in mind? Did you design that balance at all, or are you relying on the arbitrary p<0.05?
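To make that trade-off concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not something taken from the question: it assumes three independent tests (as with three separate data sets), a nominal alpha of 0.05, and a two-sided one-sample z-test as a stand-in for the t-test. The function names, the standardized effect size of 0.5, and the sample size of 30 are all hypothetical choices.

```python
from statistics import NormalDist


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m INDEPENDENT tests
    when every null hypothesis is true (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / m


def z_test_power(alpha: float, effect: float, n: int) -> float:
    """Approximate power of a two-sided one-sample z-test.

    effect is a standardized effect size (mean shift / sd); the z-test
    is a simplification used here in place of a t-test.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * n ** 0.5
    # Probability of landing in either rejection region under the alternative.
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


alpha, m = 0.05, 3           # hypothetical: three tests at the usual 0.05
effect, n = 0.5, 30          # hypothetical effect size and sample size

per_test = bonferroni_alpha(alpha, m)
print("uncorrected family-wise error rate:", familywise_error_rate(alpha, m))
print("Bonferroni per-test threshold:     ", per_test)
print("corrected family-wise error rate:  ", familywise_error_rate(per_test, m))
print("power per test, uncorrected:       ", z_test_power(alpha, effect, n))
print("power per test, Bonferroni:        ", z_test_power(per_test, effect, n))
```

Under these assumptions the uncorrected chance of at least one false positive across the three tests is about 14%, the correction brings it back under 5%, and the power of each individual test drops. Whether that exchange is worthwhile is exactly the question about inferential objectives raised above.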
