Suppose a researcher wants to test a huge number of hypotheses and present the discoveries. The data come from two groups, e.g., 100 people vs. 100 people, and the researcher wants to:
1) given expression levels of 20,000 genes measured in each person, figure out how many genes are differentially expressed;
2) given 1,000 different metabolites, figure out how many of them show different levels;
3) given 100 physiological characteristics, such as height, weight, body temperature, etc., figure out which differ between the groups.
So in the end the researcher performs 21,100 tests. My question is: what is the correct way to deal with the resulting p-values? Should I apply an FDR correction to each set of tests 1)-3) separately, or to all of them together? If I correct separately, I will have much more power in case 3) than in case 1). If I correct jointly, I will have really low power in every case. And if separate FDR control is legitimate, why can't we take it to the absurd and split our 21,100 tests into groups of size 10, or even 1? What if I have a different cohort of 100 vs. 100 people for each of the situations 1), 2), 3), but I want to present all these results in one paper? The answer seems clear when the p-values are not positively dependent across groups, since pooling them may violate the usual assumptions of FDR control, but what if the p-values are, e.g., independent?
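To make the separate-vs-pooled comparison concrete, here is a minimal Python sketch. Only the family sizes (20,000 / 1,000 / 100) come from the question. Everything else is an assumption made for illustration: independent one-sided z-tests, made-up numbers of true effects, a made-up effect size, the level q = 0.05, and the hand-written helpers `benjamini_hochberg` and `simulate_family`. It applies the Benjamini-Hochberg step-up procedure per family, on the pooled p-values, and on "families" of size 1.

```python
import random
from statistics import NormalDist

Q = 0.05  # target FDR level (assumed for illustration)


def benjamini_hochberg(pvalues, q=Q):
    """Benjamini-Hochberg step-up: return True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def simulate_family(m, n_effects, effect, rng):
    """Hypothetical family of m independent one-sided z-tests.

    The first n_effects tests are non-null with mean shift `effect`.
    """
    norm = NormalDist()
    pvals = []
    for i in range(m):
        shift = effect if i < n_effects else 0.0
        z = rng.gauss(shift, 1.0)
        pvals.append(1.0 - norm.cdf(z))
    return pvals


rng = random.Random(0)
# Family sizes follow the question; effect counts and sizes are invented.
families = {
    "genes": simulate_family(20000, 200, 3.0, rng),
    "metabolites": simulate_family(1000, 50, 3.0, rng),
    "physiology": simulate_family(100, 20, 3.0, rng),
}

# Option A: BH within each family separately.
separate_counts = {name: sum(benjamini_hochberg(p)) for name, p in families.items()}

# Option B: BH on all 21,100 p-values pooled together.
pooled_p = [p for pvals in families.values() for p in pvals]
pooled_reject = benjamini_hochberg(pooled_p)
pooled_counts = {}
start = 0
for name, pvals in families.items():
    pooled_counts[name] = sum(pooled_reject[start:start + len(pvals)])
    start += len(pvals)

# Absurd limit: every test is its own "family" of size 1.
singleton_total = sum(benjamini_hochberg([p])[0] for p in pooled_p)

print("rejections, separate BH:", separate_counts)
print("rejections, pooled BH:  ", pooled_counts)
print("rejections, size-1 groups:", singleton_total)
```

With groups of size 1 the BH threshold is simply p <= q, so the "separate" correction becomes no correction at all. That is the absurd limit described above, and it shows that the choice of families has to be justified by something other than the power it yields.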
I feel that this has something to do with Simpson's paradox, but I cannot figure out how...
UPD: I found that this question has already been asked, even several times, on stats.stackexchange (so I am not the only one who cares about it, and there is a lot of offline confusion about this analysis among my colleagues, who are not statisticians at all), but there is no answer.
UPD1: The obvious answer may be formulated as "it depends on which proportion of false discoveries you want to control: within each group or overall". I guess that when researchers publish a paper, they are mainly interested in the overall percentage of false claims throughout the paper.
UPD2: Could you, as statisticians, provide some kind of guideline for reporting results in this setting: what should be emphasized when reporting a bunch of statistically significant results about different phenomena within one paper? My knowledge is only enough to give the answer from UPD1...
UPD3: If you downvote the question, please explain in the comments why you think it deserves to be closed and why there is nothing to discuss. I would be glad to learn that the answer is obvious; please point me (and the many people who are also confused) to that answer.