
Is it possible that, in a multiple-testing scenario, the Benjamini & Yekutieli (2001) correction ends up being more stringent than Bonferroni for some individual tests? That is, that for some tests $$ p_{\text{Bonferroni}} < p_{\text{BY}} $$

I seem to have encountered this with real data. I have no conceptual problem with it (the procedures are, after all, quite different), but I would intuitively have said that no correction can make a p-value larger than Bonferroni does.

  • related: https://stats.stackexchange.com/questions/59681/why-is-controlling-fdr-less-stringent-than-controlling-fwer/143852#143852 – Christoph Hanck Mar 09 '22 at 14:59
  • @ChristophHanck The statement that any procedure controlling the FWER trivially also controls the FDR would also support the idea that no FDR-adjusted p-value can be larger than its FWER-adjusted cousin. But again, it doesn't offer a direct proof that the reverse can't occur for individual tests in a multiple-testing situation – Marc Vaisband Mar 10 '22 at 10:54
  • OK, this might be more relevant then: https://stats.stackexchange.com/questions/238458/whats-the-formula-for-the-benjamini-hochberg-adjusted-p-value – Christoph Hanck Mar 12 '22 at 15:38



It turns out that this can indeed happen, and trivially so. I tested it with two different Python implementations (statsmodels and pingouin), using only two tests with p-values of 0.01 and 0.1.

```python
from pingouin import multicomp as pingouin_multicomp
from statsmodels.stats.multitest import multipletests as statsmodels_multicomp

pvals = [0.01, 0.1]
```

With this,

```python
# Element [1] of each returned tuple is the array of adjusted p-values
pingouin_multicomp(pvals, method="bonf")[1]       # -> array([0.02, 0.2 ])
pingouin_multicomp(pvals, method="fdr_by")[1]     # -> array([0.03, 0.15])
statsmodels_multicomp(pvals, method="fdr_by")[1]  # -> array([0.03, 0.15])
```
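To see where these numbers come from without either library, here is a minimal NumPy sketch of the three adjustments. It assumes the standard step-up definition of adjusted p-values; the function names (`bonferroni_adjust`, `step_up_adjust`, `bh_adjust`, `by_adjust`) are hypothetical and are not part of the pingouin or statsmodels API. For `pvals = [0.01, 0.1]` it should reproduce Bonferroni 0.02/0.2 and BY 0.03/0.15, and it additionally gives BH 0.02/0.1.

```python
import numpy as np


def bonferroni_adjust(pvals):
    """Bonferroni: multiply each p-value by the number of tests m, cap at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def step_up_adjust(pvals, dependence_factor=1.0):
    """BH-style step-up adjustment (hypothetical helper).

    dependence_factor=1 gives Benjamini-Hochberg (1995);
    dependence_factor=c(m)=sum_{j=1}^m 1/j gives Benjamini-Yekutieli (2001).
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                  # indices that sort p ascending
    ranks = np.arange(1, m + 1)            # ranks k = 1..m of the sorted values
    raw = dependence_factor * m * p[order] / ranks
    # Monotonicity: adjusted p_(i) = min over k >= i of raw_(k)
    adjusted_sorted = np.minimum.accumulate(raw[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)  # undo sorting, cap at 1
    return adjusted


def bh_adjust(pvals):
    return step_up_adjust(pvals)


def by_adjust(pvals):
    m = len(pvals)
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # harmonic number c(m), 1.5 for m = 2
    return step_up_adjust(pvals, dependence_factor=c_m)


pvals = [0.01, 0.1]
print("Bonferroni:", bonferroni_adjust(pvals))
print("BH:        ", bh_adjust(pvals))
print("BY:        ", by_adjust(pvals))
```

The only difference between BH and BY in this sketch is the constant factor `c_m`, which is what pushes the smallest BY-adjusted p-value above its Bonferroni counterpart here.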

Unless I have made some deep conceptual mistake, the smaller p-value is adjusted more strongly by BY (0.01 → 0.03) than by Bonferroni (0.01 → 0.02).

Interestingly, this does not seem to happen with the original Benjamini & Hochberg (1995) procedure (as suggested by the method outlined in the post linked by Christoph Hanck). If anyone wants to outline the difference, I'll move the accept to their answer; for now, this at least seems to confirm that I'm not crazy.
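A sketch of why, using the standard definitions of adjusted p-values for sorted p-values $p_{(1)} \le \dots \le p_{(m)}$ (my own summary of the usual formulas, not taken from either library's source):

```latex
\begin{aligned}
\tilde p^{\text{Bonf}}_{(i)} &= \min\bigl(1,\; m\,p_{(i)}\bigr),\\
\tilde p^{\text{BH}}_{(i)} &= \min_{k \ge i} \min\Bigl(1,\; \frac{m\,p_{(k)}}{k}\Bigr)
  \;\le\; \min\Bigl(1,\; \frac{m\,p_{(i)}}{i}\Bigr) \;\le\; \tilde p^{\text{Bonf}}_{(i)},\\
\tilde p^{\text{BY}}_{(i)} &= \min_{k \ge i} \min\Bigl(1,\; c(m)\,\frac{m\,p_{(k)}}{k}\Bigr),
  \qquad c(m) = \sum_{j=1}^{m} \frac{1}{j}.
\end{aligned}
```

So BH can never exceed Bonferroni, whereas BY multiplies every BH term by $c(m) > 1$ (for $m \ge 2$). If the minimum for the smallest p-value is attained at $k = 1$, then $\tilde p^{\text{BY}}_{(1)} = \min(1,\, c(m)\,m\,p_{(1)})$, which is strictly larger than Bonferroni whenever $0 < m\,p_{(1)} < 1$. In the example, $m = 2$ and $c(2) = 1.5$, so $0.01 \mapsto 1.5 \cdot 2 \cdot 0.01 = 0.03 > 0.02$.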