
I have been doing statistics as part of my job for a few years now, but I now have to explain the logic of Null Hypothesis Significance Testing (NHST) to a statistically naive audience.

Say I have the result of an unpaired t-test: t=0.01, p=0.999

I can conclude that there is no difference between groups, but why can't I use this as evidence that the groups are the same, or nearly the same?

Put another way, why does rejecting the null hypothesis allow us to make a conclusion but failing to reject it does not?

Why are a finding of equivalence and a failure to find a difference not the same thing? (This is more of a semantic question.)

llewmills
  • What did you estimate the effect to be, zero or something other than zero? – Dave Mar 23 '22 at 02:45
  • Let's say something close to 0 – llewmills Mar 23 '22 at 02:52
  • In other words, you estimated the effect to be something other than zero, right? – Dave Mar 23 '22 at 02:53
  • Yes I did @Dave – llewmills Mar 23 '22 at 02:53
  • You estimated the effect to be something other than zero. Why, then, do you want to ignore your estimate and claim the effect to be zero? – Dave Mar 23 '22 at 02:55
  • Ok but can you answer any of the questions in my post? – llewmills Mar 23 '22 at 02:56
  • "Not significantly different" is not the same as "not different". The former means that a particular probability, the p-value for this test, is above the chosen significance threshold; the latter means the groups are identical. – Galen Mar 23 '22 at 02:59
  • OK leave absolutes aside. So could I conclude with a p=0.999 that the groups are near-identical? – llewmills Mar 23 '22 at 03:03
  • Because a high p-value may simply reflect a very small sample size, "accepting the null" could just be an artifact of poor data. If you want to report confidence in an effect close to zero, focus more on the standard errors than on the p-value. – ecnmetrician Mar 23 '22 at 03:07
  • No. You might have a gigantic confidence interval, making it plausible that the groups are quite different, even if you had the rotten luck of not catching a big difference. // If you want to show that two groups are super close, look into equivalence testing, but be prepared to define what “near-identical” means in terms of effect size. – Dave Mar 23 '22 at 03:08
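To make the equivalence-testing suggestion in the comment above concrete, here is a minimal sketch of TOST (two one-sided tests) in Python. The samples x1 and x2, the equivalence margin delta, and the seed are hypothetical choices for illustration only; in practice the margin has to be justified on substantive grounds, not picked after seeing the data.

```python
# TOST (two one-sided tests) sketch for "the means are within ±delta of each other".
# x1, x2, and delta are hypothetical; requires SciPy >= 1.6 for `alternative`.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(loc=10.0, scale=2.0, size=50)   # hypothetical group 1
x2 = rng.normal(loc=10.1, scale=2.0, size=50)   # hypothetical group 2
delta = 0.5                                      # equivalence margin, in raw units

# Test 1: H0: mu1 - mu2 <= -delta  vs  H1: mu1 - mu2 > -delta
_, p_lower = stats.ttest_ind(x1 + delta, x2, alternative="greater")
# Test 2: H0: mu1 - mu2 >= +delta  vs  H1: mu1 - mu2 < +delta
_, p_upper = stats.ttest_ind(x1 - delta, x2, alternative="less")

# Equivalence is claimed only if BOTH one-sided tests reject.
p_tost = max(p_lower, p_upper)
print(f"TOST p-value: {p_tost:.4f} (small => difference lies within ±{delta})")
```

Note that this reverses the usual NHST setup: "not equivalent" plays the role of the null, so a small TOST p-value is positive evidence that the difference lies inside the margin, whereas a large ordinary p-value is not.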

1 Answer


I can conclude that there is no difference between groups

Not quite. A more precise conclusion is that the data are consistent with there being no difference. We cannot know for certain that there is no difference; the difference could, for example, simply be too small to detect reliably with the given sample size.
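To make the "too small to detect" point concrete, here is a small simulation of my own (the true difference, spread, sample size, and number of replications are arbitrary illustrative choices): the group means really do differ, yet most unpaired t-tests at this sample size fail to reach significance, so a non-significant result cannot certify that the difference is zero.

```python
# Simulation sketch: a real but small difference, tested with a small sample.
# All numbers below are illustrative assumptions, not taken from the question.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_diff, sigma, n, n_sims = 0.2, 1.0, 20, 10_000

p_values = np.empty(n_sims)
for i in range(n_sims):
    g1 = rng.normal(0.0, sigma, n)
    g2 = rng.normal(true_diff, sigma, n)          # the null is genuinely false
    p_values[i] = stats.ttest_ind(g1, g2).pvalue  # unpaired t-test

# Most replications fail to reject at alpha = 0.05 despite the real difference.
print(f"share of replications with p > 0.05: {np.mean(p_values > 0.05):.2f}")
```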

why does rejecting the null hypothesis allow us to make a conclusion but failing to reject it does not?

You have to understand that the two kinds of "conclusion" are very different. When we reject the null, we are saying "data this extreme would be very unlikely if there truly were no difference", and on that basis we conclude, or decide to believe, that there is some difference. But we are not making a precise statement about the size of the difference; that requires estimation, and the estimate carries uncertainty.

"Accepting the null" or making a precise statement that the difference is exactly 0 is just too precise and could never be verified. To my earlier point as well, the difference could just be too small to detect so "accepting the null" and making a precise statement about the difference being 0 is likely to be false anyway.

It may help to think of NHST as putting yourself in a dilemma of sorts, as I explain here.

Galen
  • For this OP I recommend emphasizing that "consistent with" does not carry the formal-logic meaning of "does not induce a contradiction with". Any probability distribution with suitable support could have generated the observed data, even if that is rather unlikely for many choices of model, so there is an unhelpful notion of logical consistency that could be read into the phrase. Instead, as you know, what is meant is that the p-value for the test statistic computed from the data is above the chosen threshold. – Galen Mar 23 '22 at 03:35