
I've read a couple of theoretical explanations of what it is (in particular this thread: When to use Fisher versus Neyman-Pearson framework?), but so far I haven't found real-life examples of people actually using it.

My guess is that some people might be using the N-P framework, but simply don't say it clearly, hence my difficulty finding examples. Or maybe they're using it in projects that don't result in papers available publicly.

My impression is that this framework is more useful in industrial applications than in research, and I'd like to find examples of it to get a better grasp of it. The theoretical explanations I've found are good, but I'd like to test my understanding against reality.

Could someone provide a real-life example of it? Or an example inspired by a real-life situation? If possible, I'd prefer examples where this framework is correctly used :)

Thanks!

Coris
    I don't have any examples of where the Neyman–Pearsonian hypothesis test framework has been appropriately applied, and I suspect that there are very few in scientific research. The reason might be related to the fact that the N–P framework sets out to have characterised global error rates that attach to the statistical method, whereas a scientific approach should usually be based on the evidence regarding the thing in question. I have fleshed out that distinction here: A Reckless Guide to P-values: Local Evidence, Global Errors https://link.springer.com/chapter/10.1007/164_2019_286 – Michael Lew Feb 22 '24 at 21:30
  • 1
    The modern approach to hypothesis testing is often described as a hybrid between the Neyman-Pearson and the Fisherian approaches, in which p-values have a central role. – Graham Bornholt Feb 23 '24 at 00:49
  • @GrahamBornholt That hybrid approach has been discussed on this site several times in the past. In my opinion it is less applicable as a support to scientific inference than the original Fisherian approach. See here: https://stats.stackexchange.com/questions/112769/is-the-hybrid-between-fisher-and-neyman-pearson-approaches-to-statistical-test – Michael Lew Feb 23 '24 at 01:53
  • @MichaelLew It seems that there are many different hybrids, I was not meaning what you call the hybrid NHST approach. (The latter seems to have 'straw man' elements: who after all treats 'decisions' as final in science?) – Graham Bornholt Feb 23 '24 at 06:19
  • @Coris Sorry for making the hybrid comment, I was merely trying to explain why pure Neyman-Pearson applications might be hard to find. Let's hope others can provide examples for you. – Graham Bornholt Feb 23 '24 at 06:23
  • @GrahamBornholt No problem, I find the discussion interesting. – Coris Feb 23 '24 at 18:16

1 Answer


Pivotal clinical trials are perhaps the clearest example of Neyman-Pearson hypothesis testing, although even there P-values are usually also reported rather than just 'critical value exceeded'. This is partly a result of the convenient connection between the P-value and $\alpha$, though there are definitely cases where the magnitude of the P-value was considered in weighing evidence, so even here the hybrid between both approaches prevails.

Still, such studies actually prespecify an alternative hypothesis -- the intervention will produce an effect $\Delta$ which is at least some threshold of clinical relevance, or no worse than the current standard, or...

They include sample size justification in the form of power/type II error rate estimation, and implement strong family-wise type I error control over all formal statistical claims that will be tested. Both of these are mandated by several regulatory agencies that will authorize drugs for their market based on these results.
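To make the prespecification concrete, here is a minimal sketch (my own illustration, with made-up design parameters, not taken from any actual trial) of the standard two-sample normal approximation: fix $\alpha$ and $\beta$ up front, and solve for the per-arm sample size needed to detect the prespecified effect $\Delta$.

```python
import math
from statistics import NormalDist  # Python stdlib, no SciPy needed

def per_arm_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-sample z-test of H0: Delta = 0
    against a prespecified alternative of size `delta`. N-P style:
    both error rates (alpha and beta = 1 - power) are fixed before
    any data are collected."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

# Hypothetical design: detect Delta = 0.5 SD with 80% power at alpha = 0.05
print(per_arm_sample_size(delta=0.5, sigma=1.0))  # -> 63 per arm
```

The point of the exercise is that $\Delta$, $\alpha$ and $\beta$ are all design inputs fixed in the protocol, which is exactly the N-P posture.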

The fact that the FWER is maintained over several tests, including interim looks, means that analyses often work primarily on the critical value / $\alpha$ scale, because you no longer have the common 'nice' thresholds of e.g. $\alpha=0.05$. Instead you might use, for example, $z=2.478$ / $\alpha=0.003$ as a stopping boundary in favour of $H_A$ at the first look, followed by $z=2.257$ / $\alpha=0.012$, and so on.
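For the flavour of where such unusual thresholds come from, here is a sketch of a Lan-DeMets O'Brien-Fleming-type $\alpha$-spending function (an assumption on my part; the boundaries quoted above depend on the trial's own spending plan, and real designs use dedicated software such as R's gsDesign or rpact, which also accounts for the correlation between interim test statistics when converting spent $\alpha$ into boundary z-values).

```python
from statistics import NormalDist

def obf_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t (0 < t <= 1)
    under the Lan-DeMets O'Brien-Fleming-type spending function:
    alpha*(t) = 2 * (1 - Phi(z_{1-alpha/2} / sqrt(t)))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

# Hypothetical looks at 50%, 75%, and 100% of the planned information:
# very little alpha is spent early, and the full 0.05 is spent at t = 1.
for t in (0.50, 0.75, 1.00):
    print(f"t={t:.2f}: cumulative alpha spent = {obf_alpha_spent(t):.4f}")
```

The design choice this illustrates: O'Brien-Fleming-type spending is deliberately conservative at early looks, so that the final analysis can still use a boundary close to the nominal $\alpha$.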

Taken together, a pivotal clinical study is about the purest example of 'hypothesis testing done right' that I know of. This isn't to advocate for hypothesis testing: there are examples of $H_0: \Delta=0$ being rejected for drugs that ended up having no meaningful (but still possibly non-zero) clinical effect, and likewise the focus on FWER control has killed studies that prespecified $\alpha$ spending but didn't actually end up testing (you said you would test, so the $\alpha$ is 'used up' anyway). The regulations surrounding these studies are driven in no small part by the desire to maintain FWER above all else.

PBulls