
There has been a growing chorus against conventional NHST (Null Hypothesis Significance Testing). One recurring complaint is the blind use of a monolithic significance level of $5\%$.

In a recent thread at CV, user Peter Flom claimed that Fisher had somewhere advocated varying the threshold.

While Peter couldn't remember the exact quote or the source, it was supposed to run something like:

... no sane researcher uses the same significance level in all cases.

My question is: did Ronald Fisher ever express anything on this issue? I would appreciate it if someone could pin down the source where he made such a statement.

User1865345

1 Answer


Jerry Dallal collected some of Fisher's quotes from various works. Fisher said rather a lot on the matter: he first set the P=0.05 cutoff in 1925, then did not follow it himself, and finally, in 1956, openly advocated flexibility in response to Neyman and Pearson taking it too far. Even then, however, what he advocated was not changing the cutoff but setting no fixed cutoff at all; this is what the passage alluded to in the OP is about. On a more charitable reading, Fisher always meant 0.05 only as a vague threshold, not a cutoff. For more on Fisher's vs Neyman-Pearson's approaches see Hypothesis testing: Fisher vs. Popper vs. Bayes.

The P=0.05 level comes from Fisher's Statistical Methods for Research Workers (1925), albeit with some wiggle room:

"The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. [...] If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. Belief in the hypothesis as an accurate representation of the population sampled is confronted by the logical disjunction: Either the hypothesis is untrue, or the value of $\chi^2$ has attained by chance an exceptionally high value. The actual value of P obtainable from the table by interpolation indicates the strength of the evidence against the hypothesis. A value of $\chi^2$ exceeding the 5 per cent. point is seldom to be disregarded."
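For readers who want to check the arithmetic behind "1.96 or nearly 2", here is a minimal sketch using only Python's standard library. It assumes a two-sided test on a standard normal deviate; the names `std_normal`, `z_crit` and `two_sided_p` are mine, not Fisher's or Dallal's.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# A two-sided P of 0.05 leaves 0.025 in each tail,
# so the critical deviate is the 97.5th percentile.
z_crit = std_normal.inv_cdf(1 - 0.05 / 2)


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate z."""
    return 2 * (1 - std_normal.cdf(abs(z)))


print(round(z_crit, 2))            # 1.96
print(round(two_sided_p(2.0), 4))  # 0.0455, i.e. "nearly 2" is slightly stricter
```

So rounding the limit up to 2 standard deviations corresponds to a P of roughly 1 in 22 rather than 1 in 20.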

Despite these declarations, he was more flexible in practice in the same book, or, less charitably, inconsistent:

"The results of t shows that P is between .02 and .05. The result must be judged significant, though barely so [...] we find... t=1.844 [with 13 df, P = 0.088]. The difference between the regression coefficients, though relatively large, cannot be regarded as significant."
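The bracketed P value for t=1.844 with 13 df can be reproduced with a rough sketch that integrates the Student t density numerically (Simpson's rule), again using only the standard library. The function names and the step count are my own choices for illustration; with SciPy available, `2 * scipy.stats.t.sf(1.844, 13)` should give the same figure.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 minus the area under the density on [-|t|, |t|].

    The area on [0, |t|] is computed with Simpson's rule (steps must be even)
    and doubled by symmetry.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return 1 - 2 * half_area


p = t_two_sided_p(1.844, 13)
print(round(p, 3))      # compare with the bracketed P = 0.088
print(0.05 < p < 0.10)  # falls between the 5% and 10% points
```

On this reckoning Fisher called a P between .02 and .05 "significant, though barely so" and a P of about .09 not significant, which is consistent with 0.05 acting as a rough dividing line in these two examples.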

Says Dallal: "Part of the reason for the apparent inconsistency is the way Fisher viewed P values. When Neyman and Pearson proposed using P values as absolute cutoffs in their style of fixed-level testing, Fisher disagreed strenuously. Fisher viewed P values more as measures of the evidence against a hypotheses..." It is only then, in Statistical Methods and Scientific Inference (1956), that Fisher elevated his practiced flexibility and attention to context to a principle:

"The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance."

Conifold
    As usual, terrific comprehensive post by you. I have accepted it. – User1865345 Feb 21 '24 at 09:45
    What a great answer! And thanks to @User1865345 for putting my question here. I will have to browse this site as HSM is interesting to me. – Peter Flom Feb 21 '24 at 11:51
    Cowles, M., & Davis, C. (1982). On the origins of the .05 level of statistical significance. American Psychologist, 37(5), 553. This paper contains the pre-history of the 0.05 significance level. – Alecos Papadopoulos Feb 21 '24 at 17:27