
In the last 4 hours I have been trying to wrap my head around Type I error inflation, which, citing Wikipedia, happens when we perform multiple independent tests on the same dataset. If we do not adjust the significance level, the probability that at least one of $t$ tests produces a Type I error is $1-(0.95)^{t}$. What I do not understand is why this doesn't apply to all tests that are ever done, and why the "same dataset" matters: if I run a test today and another one on a completely different question tomorrow, the probability that at least one of them is a false positive also rises. I know this has been asked before, but none of the answers really helped me. Thank you
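To make the inflation concrete, the formula above can be evaluated for a few values of $t$ (a minimal sketch; the 0.95 is just $1-\alpha$ with $\alpha = 0.05$):

```python
# Probability of at least one Type I error among t independent
# tests, each run at significance level alpha = 0.05.
alpha = 0.05
for t in (1, 5, 10, 20):
    fwe = 1 - (1 - alpha) ** t
    print(f"t = {t:2d}: P(at least one Type I error) = {fwe:.3f}")
```

With 20 unadjusted tests the chance of at least one false positive is already around 64%.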

Is there a rule of thumb to decide which tests fall into the same "bin"? Does the dataset really have to be the same?

I have just read this article: http://daniellakens.blogspot.com/2020/03/whats-family-in-family-wise-error.html and I think I wasn't aware of a huge part of this whole debate. In a situation where the overall claim is supported if at least one of the hypotheses is true, it is absolutely clear to me that the family-wise error rate is an issue. But if we treat 3 claims as individual claims in their own right, is it okay not to adjust?

QED
    The family is whatever collection of tests you decide to control the overall rate of at least one type I error across. – Glen_b Jan 19 '23 at 23:41
  • 1
    So you are saying that when talking about FWE, it is severely important to keep in mind what actually matters? When I plan a study and am going to use 5 tests, I do not care about the 1 test I may be doing in 5 years; I care about how likely it is that one of my 5 tests produces a wrong result. What is important are the results from the upcoming study, so for these the alphas need to be adjusted? – QED Jan 19 '23 at 23:45
  • 1
    That's an example of the sort of reasoning a person might engage in to decide what they want to limit type I error over, sure. Consider another researcher who reasons as follows: "I chose my per-test type I error rate, alpha, after careful consideration of the tradeoff between the importance of the two error types and believe that at the sample size I have planned that this should give a good (not too far from some sort of optimum) compromise between the two rates of errors; so any attempt to control the overall type I error rate would be moving away from that rough optimality I already have." – Glen_b Jan 20 '23 at 00:00
  • But why should I not take tomorrow's test into consideration, even though it has a completely different meaning? This seems "plausible", but I can't figure out why it is reasonable – QED Jan 20 '23 at 00:19
  • 2
    Who said you should not? That's up to you. – Glen_b Jan 20 '23 at 06:12

1 Answer


What I do not understand is why this doesn't apply to all tests that are ever done

You understand it correctly. It also applies when we do multiple tests with different datasets.
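This is easy to check by simulation: under a true null hypothesis a p-value is uniformly distributed on (0, 1), and that holds whether the tests share a dataset or not. A minimal sketch for two independent tests (the seed and repetition count are arbitrary choices):

```python
import random

random.seed(42)
alpha, t, reps = 0.05, 2, 100_000

# Under each true null hypothesis the p-value is Uniform(0, 1),
# regardless of whether the two tests use the same dataset.
hits = sum(
    any(random.random() < alpha for _ in range(t))
    for _ in range(reps)
)
print(f"Estimated  P(at least one rejection): {hits / reps:.4f}")
print(f"Theoretical 1 - 0.95**{t}:            {1 - 0.95**t:.4f}")
```

The estimate lands near the theoretical value of 0.0975, with no "same dataset" assumption anywhere in sight.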

This is why significance levels are sometimes set very low. See for instance the history behind the 5-sigma threshold (Origin of "5$\sigma$" threshold for accepting evidence in particle physics?).
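As a rough illustration of why such a stringent threshold helps, the one-sided normal tail probability at 5 sigma can be computed with the standard library, and then plugged into the same inflation formula (a sketch; the one-million figure is an arbitrary stand-in for "very many looks at the data"):

```python
from math import erfc, sqrt

# One-sided tail probability of a standard normal at 5 sigma,
# the conventional discovery threshold in particle physics.
p_five_sigma = 0.5 * erfc(5 / sqrt(2))
print(f"p at 5 sigma: {p_five_sigma:.3e}")  # about 2.9e-7

# Even after a million independent tests at that level, the
# chance of at least one false discovery stays well below 1.
fwe = 1 - (1 - p_five_sigma) ** 10**6
print(f"FWE over 1e6 tests: {fwe:.3f}")
```

Compare that with $\alpha = 0.05$, where the family-wise error rate is essentially 1 long before a million tests.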