
Suppose there is a data set $D$ and 20 groups of researchers working on it independently. Each group will publish a few results on $D$. Suppose each group finds about 3 tests significant (at the $5\%$ significance level) and concludes that its results are significant. Assume the tests are non-overlapping, that causal relationships between results are possible, and that each group has similar research quality. (This last assumption surely breaks down in reality.)

If we aggregate the results, 60 hypothesis tests have been performed, whereas each group individually performed only 3. At the $5\%$ significance level, there would be 3 expected type I errors after aggregation. The number of type II errors would be inflated as well. Each group sees only a few perspectives on the data, but the aggregate sees many.
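The arithmetic behind the aggregation can be made explicit; a minimal sketch, assuming (as in the question) that all 60 null hypotheses are true and the tests are independent:

```python
# Expected number of type I errors when aggregating independent tests,
# under the hypothetical global null (every tested hypothesis is true).
alpha = 0.05
tests_per_group = 3
n_groups = 20
n_tests = tests_per_group * n_groups  # 60 tests in aggregate

expected_type_i = n_tests * alpha  # 60 * 0.05 = 3.0

# Family-wise error rate: probability of at least one false positive.
fwer = 1 - (1 - alpha) ** n_tests

print(expected_type_i)  # 3.0
print(round(fwer, 3))   # 0.954
```

So with no correction, at least one false positive across the 20 groups is nearly certain, even though each group's individual error rate looks acceptable.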

How much trust should one place in multiple groups studying the same data set? Of course, we can control the FDR or adjust the p-values accordingly.
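One standard adjustment mentioned here is the Benjamini-Hochberg procedure for FDR control. A self-contained sketch, applied to a hypothetical set of p-values (not from any real study), might look like:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of rejections under the Benjamini-Hochberg procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # BH: find the largest k with p_(k) <= (k/m) * q, reject hypotheses 1..k.
    thresholds = (np.arange(1, m + 1) / m) * q
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values pooled across groups:
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.5]
print(benjamini_hochberg(pvals, q=0.05))  # only the first two survive
```

Note that several p-values below $0.05$ fail to survive the correction once the tests are pooled, which is exactly the aggregation effect the question is about.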

Should one keep the first 6 published papers and toss the other 14, or should one keep 6 papers at random and toss the other 14?

user45765
  • Can you explain why one should care about keeping the family-wise error rate for tests performed on the same dataset low? If each research group uses a totally different set of variables, on a totally different set of topics and research questions, then the family-wise error rate for the full dataset seems very uninteresting to me. Otherwise, if we aggregate all datasets in the world into one single dataset, we should then toss out the majority of published research results in order to keep the family-wise Type-I error rate of all tests ever performed on all data in the world low? – Marjolein Fokkema Dec 04 '21 at 17:32
  • @MarjoleinFokkema Different groups will use derived data sets coming out of $D$ for different purposes. However, the data are generated under the same sampling scheme. The overall procedure basically integrates out the variables that are not present in a particular study. So everything is under the same distribution for the data generation, and that is the distribution you integrate over to count expected type I errors. If different groups used different data sets, I would have no objection. – user45765 Dec 04 '21 at 17:37
  • @MarjoleinFokkema My concern is the following. Instead of 20 groups of researchers, suppose we have 100 groups. The expected number of type I errors will be 15. There is a possibility that one group will get everything wrong. – user45765 Dec 04 '21 at 17:39
  • @MarjoleinFokkema If $X$ denotes a particular r.v. of interest in testing, you would compute $E[I_{P(X>x)>\alpha}]$ for a one-sided test at level $\alpha$, and summing these gives the expected number of type I errors, where $I_S$ is the indicator function of the set $S$. – user45765 Dec 04 '21 at 17:45
  • Sorry, I am not sure what you are asking. I was wondering why you care to distinguish between the same dataset being used or not. Because it will not affect the expected number of Type-I errors, nor the probability of one group getting everything wrong, I believe. – Marjolein Fokkema Dec 04 '21 at 20:51
  • @MarjoleinFokkema "Because it will not affect the expected number of Type-I errors, nor the probability of one group getting everything wrong, I believe" is wrong. Simply consider 20 distinct tests reported in a single paper without FDR control versus reporting each test in a separate paper (20 papers here). Note that you may transform or subset the data in that single paper. Each paper has a $5\%$ chance of a type I error. The chance of committing at least one type I error is $1-(0.95)^{20}=0.64 \gg 5\%$. – user45765 Dec 04 '21 at 21:47
  • The family-wise error rate over the 20 tests will be the same, and the expected number of Type-I errors will also be the same. Whether you publish it in separate papers, or in the same paper. Whether you performed the tests on the same dataset, or on different datasets. – Marjolein Fokkema Dec 04 '21 at 23:41
  • @MarjoleinFokkema According to the post I found, which I used to vote to close this question, there is no obvious consensus on the matter, but it is indeed problematic. – user45765 Dec 05 '21 at 00:27
  • That post seems an excellent discussion. This is an interesting topic. Thanks! – Marjolein Fokkema Dec 05 '21 at 13:06
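The $1-(0.95)^{20}$ figure from the comment thread can also be checked by simulation; a quick sketch (sample sizes and seed are arbitrary choices for illustration):

```python
import random

random.seed(0)
alpha, n_tests, n_sims = 0.05, 20, 100_000

# Under the null, each p-value is uniform on [0, 1]; a "family-wise error"
# occurs if any of the 20 independent tests falls below alpha.
fwer_hits = sum(
    any(random.random() < alpha for _ in range(n_tests))
    for _ in range(n_sims)
) / n_sims

print(round(1 - 0.95 ** 20, 3))  # analytic FWER: 0.642
print(round(fwer_hits, 2))       # simulated estimate, close to 0.64
```

The simulated rate agrees with the analytic value quoted in the comments, whether the 20 tests appear in one paper or are spread over 20 papers.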

0 Answers