
The Wikipedia article on Barnard's test says:

While first published in 1945 by G.A. Barnard, the test did not gain popularity due to the computational difficulty of calculating the p value and Fisher’s specious disapproval.

Are there other examples like this one, where non-technical reasons led (at least in part) to a statistical method being overlooked?

Daniela
  • A number of Wikipedia statistics articles remain substandard and a quick read of the Barnard's test entry is not reassuring. For example, the 'specious' comment. Where is Barnard's explanation for why he thought Fisher's test was better? – Graham Bornholt Nov 18 '23 at 11:24
  • @SextusEmpiricus Certainly, but this is not the only reason, assuming the Wikipedia article is correct about Fisher's "specious disapproval" (which I now doubt, given your and Graham Bornholt's comments). – Daniela Nov 28 '23 at 11:49
  • 1
    @Daniela I have deleted my comment. I misrepresented Barnard's test in a simplied way as selecting a nuisance parameter based on a worst case scenario, but it is more complex than that. – Sextus Empiricus Nov 28 '23 at 12:29
  • 2
    Anything involving cycling through many subsamples or permutations was hard work for a long time, including FIsher's own exact test. Bootstrapping may seem an exception to the rule -- an idea which conveniently was released just when good computing support was becoming routine -- but the recommended small numbers of bootstrap samples in early publications are an indicator that technology was still a constraint. Bayesian approaches are perhaps an even more important example. – Nick Cox Nov 28 '23 at 12:29
  • Thanks @SextusEmpiricus and Nick Cox for the explanations. In case you're interested, I asked a follow-up question about the statement made in the Wikipedia article: https://stats.stackexchange.com/questions/632516/was-fishers-disapproval-of-barnards-test-really-specious-as-mentioned-on-wi – Daniela Nov 28 '23 at 12:43
  • Are "non-technical reasons" reasons not having to do with limitations on computing power, or reasons not having to do with Statistics? – Scortchi - Reinstate Monica Nov 28 '23 at 15:18
  • @Scortchi-ReinstateMonica I had the latter in mind, but I'm not married to this definition of "non-technical", as my question is just out of curiosity. If you ask because you think it may be off topic, my reasoning for posting it here was that even if the reasons are initially unrelated to statistics, they eventually become relevant because they can have a negative impact on the field (that's why I used the tag "history"). However, I have no issue with the question being removed if moderators find it too problematic. – Daniela Nov 28 '23 at 15:46
  • 1
    I just thought a clarification might be useful - I can see why it might be thought somewhere near the boundary of our site's scope, but don't find it problematic myself. I have made it 'Community Wiki' however, as there are clearly an unlimited number of possible answers & no criteria with which to select the best one. – Scortchi - Reinstate Monica Nov 28 '23 at 16:00

2 Answers


Methods get overlooked for lots of reasons, but technical difficulties are actually very often a big part of it. For example:

  • Bayesian methods: on the one hand, people only realized in the 1990s that MCMC methods let you fit really flexible models (and computing power was only then getting to the point where MCMC sampling became feasible for more complex models); on the other hand, there was a culture that was quite negative towards Bayesian approaches (typically attributed to Fisher and others being quite dismissive of them).
  • Neural networks: on the one hand, GPUs made it much easier to train deeper models in a reasonable time in the 2010s; on the other hand, there were all sorts of perceptions that neural networks were not so promising (e.g. that a single-layer perceptron cannot implement every function, and that multiple layers with linear activations are mathematically equivalent to a single layer, as the sketch after this list illustrates).
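
To make the linear-activation point concrete, here is a minimal numpy sketch (the matrix sizes and weights are arbitrary, purely for illustration): two stacked layers with identity activations compute exactly the same function as a single layer whose weight matrix is the product of the two, so depth buys nothing without a nonlinearity.

    # Without a nonlinear activation, stacking linear layers adds nothing:
    # the composition of linear maps is itself a linear map.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))    # 5 inputs with 4 features each
    W1 = rng.normal(size=(4, 8))   # weights of the first "layer"
    W2 = rng.normal(size=(8, 3))   # weights of the second "layer"

    two_layers = (x @ W1) @ W2     # two stacked linear layers
    one_layer = x @ (W1 @ W2)      # a single layer with the combined weights

    print(np.allclose(two_layers, one_layer))  # True: identical up to rounding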

In both cases there was both a perception issue and a technical issue (with some people continuing to pursue these approaches even though they were not mainstream). Interestingly, once the technical issue was largely solved, the second case took off massively and relatively quickly, while the uptake of Bayesian methods has been slower. That may of course simply reflect massive performance improvements over previous approaches on tangible use cases (e.g. image classification, translation...).

In other cases, it may really just be that university teaching and textbooks are out of date, that peer reviewers rely on outdated knowledge, or that people keep doing what others have done: e.g. why people use stepwise regression instead of the many more sensible methods, run pre-tests for normality, or do pairwise comparisons of doses against control instead of dose-response modeling (e.g. MCP-Mod), etc.

I also know one example where the problem was outdated regulatory guidance: for drug approvals for diabetes treatments, the US Food and Drug Administration (FDA) required last-observation-carried-forward (LOCF) imputation for missing data from patients that stopped treatment. This guidance did not change for several years, even after it was basically clear to everyone that LOCF has all sorts of problems that could be avoided by, e.g., appropriate multiple imputation for most estimands (or implicit imputation through likelihood methods for some).
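
For readers who have not come across LOCF, here is a minimal sketch of what it does on a toy longitudinal dataset (the column names and values are invented for illustration): each patient's last observed value is simply carried forward into all later missing visits, as if nothing had changed after they dropped out.

    # Toy illustration of last-observation-carried-forward (LOCF) imputation.
    import numpy as np
    import pandas as pd

    visits = pd.DataFrame({
        "patient": ["A", "A", "A", "B", "B", "B"],
        "visit":   [1, 2, 3, 1, 2, 3],
        "hba1c":   [8.1, 7.5, np.nan, 8.4, np.nan, np.nan],  # NaN = missing after drop-out
    })

    # LOCF: within each patient, fill missing visits with the last observed value.
    visits["hba1c_locf"] = visits.groupby("patient")["hba1c"].ffill()
    print(visits)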

Björn

The rank difference test of Kornbrot. This test is a replacement for the Wilcoxon signed-rank test for paired data, which has always had the disadvantage of being dependent on how you transform the response variable. The rank difference test is transformation-invariant.
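
A quick way to see the transformation issue numerically (a rough sketch; the joint-ranking step below is one plausible reading of the rank-difference idea, not necessarily Kornbrot's exact procedure): apply a strictly monotone transformation such as the log to paired data and compare how the two approaches react.

    # The Wilcoxon signed-rank test ranks the absolute within-pair differences
    # of the raw responses, so a monotone transformation of the response can
    # change its result; a statistic built from ranks of the pooled observations
    # does not change, because monotone transformations leave the joint
    # ranking intact.
    import numpy as np
    from scipy.stats import rankdata, wilcoxon

    rng = np.random.default_rng(1)
    x = rng.lognormal(mean=1.0, size=30)       # paired responses, condition 1
    y = x * rng.lognormal(mean=0.2, size=30)   # paired responses, condition 2

    def rank_differences(a, b):
        """Within-pair differences of the joint (pooled) ranks."""
        r = rankdata(np.concatenate([a, b]))
        return r[len(a):] - r[:len(a)]

    for name, f in [("raw", lambda v: v), ("log", np.log)]:
        tx, ty = f(x), f(y)
        p_signed_rank = wilcoxon(tx, ty).pvalue
        p_rank_diff = wilcoxon(rank_differences(tx, ty)).pvalue
        print(f"{name}: signed-rank p = {p_signed_rank:.4f}, "
              f"rank-difference p = {p_rank_diff:.4f}")
    # The signed-rank p-value generally differs between the raw and log scales;
    # the rank-difference p-value is identical on both.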

Frank Harrell