16

The Literary Digest poll of 1936 is often mentioned to show the dangers of using convenience sampling.

Are there other famous examples where a wrong sampling method led to erroneous conclusions?

I ask not to question the existence of the problem, but because I'd like to vary the examples I give when explaining the issue to someone else.

In addition, as the Literary Digest example is now almost 90 years old, I suspect that giving more recent examples could have greater educational value (feeling "close" to a problem can be motivating to learn more about it).

I'm looking for examples from any domain; they don't have to be about political polls, or even about studying humans.

I ask for "famous" examples because familiarity can give a sense of connection to the issue at hand, and it also makes it easier for those interested to find detailed information.

Stephan Kolassa
Daniela
  • If biased conclusions can be expanded to include biased results, then the Selective Service's handling of the 1970 draft lottery is a classic example. Gross inequities in the Selective Service system were replaced by a draw of capsules from a drum containing the 366 birth dates, in theory to ensure an objective, random draw of the order of military draft eligibility. It was later discovered that there weren't a sufficient number of rotations of the drum, and birth dates later in the year tended to be drawn earlier in the sequence. https://www.nytimes.com/1970/01/04/archives/statisticians-charge-draft-lottery-was-not-random.html – user78229 Oct 14 '23 at 20:38
  • Mike Hunter: thanks, this is the kind of example I was looking for. – Daniela Oct 14 '23 at 21:18
  • I'm not sure if it fits but Bill and Melinda Gates put a lot of money into making schools smaller based on the observation that the best student performance (statistically) was seen in smaller schools. It was also true that the worst performance was at smaller schools. I'm not sure exactly how to classify that error. – JimmyJames Oct 16 '23 at 19:41
  • Related/possible duplicate: https://stats.stackexchange.com/questions/434128 – user103496 Oct 17 '23 at 04:20
  • user103496: thanks, this is interesting. However I'm looking specifically for errors coming from using a wrong sampling method, as the only example I came across again and again (until reading the answers here) for this specific problem was the Literary Digest poll. – Daniela Oct 17 '23 at 05:22

5 Answers

22

Henrich, Heine & Norenzayan (2010, Behavioral and Brain Sciences) and Henrich, Heine & Norenzayan (2010, Nature), later expanded into a book (Henrich, 2021: The WEIRDest People in the World), argue that most psychological and sociological research is conducted on the participants who are closest at hand for the typical academic, and who may even be strongly incentivized to take part in such research (e.g., by making college graduation contingent on having spent X hours as a study participant): college students.

These are not even representative of the society they live in, and far less so of "all humans". Henrich et al. coined the acronym "WEIRD" to describe them: Western, Educated, Industrialized, Rich, and Democratic. Their publications describe how findings from such convenience samples, which are often presented as applying to all humans in general without noting the bias in the sample, can lead to conclusions that quite probably do not apply to people from radically different backgrounds. And, to cite the title of the Nature paper: "Most people are not WEIRD."

To take an example from the Nature paper, on social behaviour related to fairness and equality:

Here, researchers often use one-shot economic experiments such as the ultimatum game, in which a player decides how much of a fixed amount to offer a second player, who can then accept or reject this proposal. If the second player rejects it, neither player gets anything. Participants from industrialized societies tend to divide the money equally, and reject low offers. People from non-industrialized societies behave differently, especially in the smallest-scale non-market societies such as foragers in Africa and horticulturalists in South America, where people are neither inclined to make equal offers nor to punish those who make low offers (Henrich et al., 2010).

Thus, extrapolating from a WEIRD sample to "all humans" would (and did) give erroneous conclusions on how "humans" feel about fairness and equality in such situations.
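
As a toy illustration of the sampling logic only (not of the actual cross-cultural findings), here is a minimal sketch with entirely invented numbers: low offers are rejected at different rates in two subpopulations, and a sample drawn only from the easy-to-reach "college student" group badly misestimates the population-wide rate, while a random sample of the same size does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pop = 100_000

# Hypothetical population: 20% easy-to-recruit college students ("WEIRD"),
# 80% everyone else; rejection rates of a low offer are invented.
is_student = rng.random(n_pop) < 0.20
p_reject = np.where(is_student, 0.6, 0.2)
rejects = rng.random(n_pop) < p_reject

print("True population rejection rate:", rejects.mean())               # ~0.28

# Convenience sample: only students show up at the campus lab.
students = np.flatnonzero(is_student)
convenience = rng.choice(students, size=500, replace=False)
print("Convenience-sample estimate:   ", rejects[convenience].mean())  # ~0.6

# Probability sample of the same size from the whole population.
random_sample = rng.choice(n_pop, size=500, replace=False)
print("Random-sample estimate:        ", rejects[random_sample].mean())  # ~0.28
```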

Stephan Kolassa
  • Quite a useful example to illustrate the problem, in particular when discussing it with people from academia! – Daniela Oct 14 '23 at 16:25
  • Heh, gives an ironically-truthful slant to Austin's slogan "Keep Austin Weird". – Dewi Morgan Oct 14 '23 at 23:15
  • But what are some of the "famous examples of biased samples that led to erroneous conclusions"? – user103496 Oct 17 '23 at 04:01
  • @user103496: the biased samples are samples from college students, which are nonrepresentative for "all humans". – Stephan Kolassa Oct 17 '23 at 05:54
  • Yes but what are the erroneous conclusions? Why are these examples famous? – user103496 Oct 17 '23 at 05:58
  • @user103496: strictly speaking, the question only asked for the biased samples, not the erroneous conclusions... but I added an example from the Nature paper. – Stephan Kolassa Oct 17 '23 at 06:29
  • Your link is incorrect. I think you instead wanted this: https://www.nature.com/articles/466029a // I don't think that example you added gives any "erroneous conclusions". Did earlier researchers on the ultimatum game consistently leap to the conclusion that human beings around the world "tend to divide the money equally and reject low offers"? Did Henrich et al. claim that earlier researchers made "erroneous conclusions"? – user103496 Oct 18 '23 at 01:20
  • @user103496: do you mean the very last link, the one in the quote? That link is correct. The quote is from the Nature paper, and it includes a reference to a different paper in Science, also by Henrich, which I linked to. If you do not think the quote supports my claims, then we will likely need to leave it at this. – Stephan Kolassa Oct 18 '23 at 06:15
  • You linked to "https://doi.org/10.1126/science.1182238", which does not contain the quote "Here, researchers often use one-shot economic experiments such as the ultimatum game, in which a player ..." – user103496 Oct 18 '23 at 06:16
  • @user103496: the quote is from the Nature paper, see the sentence immediately preceding the quote: "To take an example from the Nature paper". The link is to a reference cited in the quoted paragraph. – Stephan Kolassa Oct 18 '23 at 06:33
13

One of the more recent prominent examples of non-random sampling affecting results is the study of the influence of hormone replacement therapy (HRT, for menopausal women) on coronary heart disease.

Up to 2002, HRT was routinely prescribed to women at risk of coronary heart disease. This was based on observational studies that found that women on HRT suffered fewer heart problems.

"In 2002 and 2004, however, published Randomised Controlled Trials from the Women's Health Initiative claimed that women taking hormone replacement therapy with estrogen plus progestin had a higher rate of myocardial infarctions than women on a placebo, and that estrogen-only hormone replacement therapy caused no reduction in the incidence of coronary heart disease."wikipedia Randomised Controlled Trials - advantage section

One of the principal explanations for this discrepancy is that healthier women tended to use HRT, in particular women of higher socioeconomic status. Although the observational studies did control for a number of factors, socioeconomic status was typically not one of them (see the linked material on socioeconomic factors).

See also this paper on expectations influencing observations, which summarises other papers.
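
A minimal sketch of the confounding mechanism described above, with entirely made-up numbers: socioeconomic status is assumed to raise the chance of taking HRT and, independently, to lower heart-disease risk, while HRT itself is given no effect at all. The naive observational comparison still makes HRT look protective; randomizing the treatment removes the artefact.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical confounder: socioeconomic status (high vs low).
high_ses = rng.random(n) < 0.5

def heart_disease(on_hrt):
    # `on_hrt` is deliberately ignored: we assume HRT has no true effect,
    # and risk depends only on socioeconomic status (invented numbers).
    return rng.random(n) < np.where(high_ses, 0.05, 0.15)

# Observational world: high-SES women are much more likely to take HRT.
hrt_obs = rng.random(n) < np.where(high_ses, 0.6, 0.2)
d_obs = heart_disease(hrt_obs)
print("Observational risk on HRT: ", d_obs[hrt_obs].mean())   # looks lower
print("Observational risk off HRT:", d_obs[~hrt_obs].mean())

# Randomized trial: HRT assigned independently of SES.
hrt_rct = rng.random(n) < 0.5
d_rct = heart_disease(hrt_rct)
print("RCT risk on HRT:           ", d_rct[hrt_rct].mean())   # ~ equal
print("RCT risk off HRT:          ", d_rct[~hrt_rct].mean())
```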

seanv507
13

Another famous example of a biased sample:

A diagram of a WWII-era aircraft with red dots showing locations of damage. The engines, cockpit, tail, and mid-wing locations have few or no dots.

This one is famous enough that the image alone is often used as a full response to bad reasoning based on survivorship bias. It relates to work done by Abraham Wald in the WWII-era Statistical Research Group, trying to improve the survivability of bombers by identifying the most critical locations.

Quoting from Bill Casselman's article for the American Mathematical Society, which initially sets out to debunk the story but then largely reverses course in a postscript, after Casselman learned about a source he had missed:

The military was inclined to provide protection for those parts that on returning planes showed the most hits. Wald assumed, on good evidence, that hits in combat were uniformly distributed over the planes. It follows that hits on the more vulnerable parts were less likely to be found on returning planes than hits on the less vulnerable parts, since planes receiving hits on the more vulnerable parts were less likely to return to provide data.

(NB: the image itself is a much later mock-up to illustrate the issue, rather than an actual diagram produced by Wald/SRG from real data. I'm using it here because it's become the de facto standard for referencing this particular issue.)
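
A minimal simulation of the logic in the quote, with invented numbers: hits land uniformly over four sections, but a hit to the engine is assumed to be far more likely to bring the plane down. Counting hits only on the planes that made it back then makes the engine look like the section that is hit least, exactly the inverted picture Wald warned about.

```python
import numpy as np

rng = np.random.default_rng(2)

sections = ["engine", "cockpit", "fuselage", "wings"]
# Hypothetical probability that a single hit to this section downs the plane.
p_down = {"engine": 0.5, "cockpit": 0.3, "fuselage": 0.05, "wings": 0.05}

n_planes, hits_per_plane = 10_000, 3
hits_on_returners = {s: 0 for s in sections}

for _ in range(n_planes):
    hits = rng.choice(sections, size=hits_per_plane)  # hits uniform over sections
    shot_down = any(rng.random() < p_down[s] for s in hits)
    if not shot_down:                                 # we only observe returners
        for s in hits:
            hits_on_returners[s] += 1

print(hits_on_returners)
# The engine shows the fewest hits among returning planes, precisely because
# engine hits are the ones most likely to prevent a plane from returning.
```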

  • Your answer does not convey the story very well. It could add that the idea was to increase protection at the areas that were hit most, i.e. the ones with the most holes. – ghellquist Oct 15 '23 at 06:22
  • @ghellquist This is already stated in the first sentence of the quote from Casselman: "The military was inclined to provide protection for those parts that on returning planes showed the most hits". – GB supports the mod strike Oct 15 '23 at 11:50
  • NB that this image is a prop with manually placed red dots, not any kind of graphic produced at the time or later by analysis. (See this Q on skeptics for a discussion) – 2e0byo Oct 16 '23 at 17:14
  • @2e0byo Yep, hence my "relates to" rather than "comes from", but I'll make that more explicit in the post – GB supports the mod strike Oct 16 '23 at 23:09
4

1948 US Presidential Elections ("Dewey Defeats Truman"):

[Image: the famous erroneous "Dewey Defeats Truman" newspaper headline]

Friedenson (2009):

Truman's 4.4-percentage point victory contrasted with preelection polls predicting Dewey by 5–15 percentage points. Several prominent pollsters including Gallup used quota sampling (asking until one gets a certain number of respondents from certain groups) instead of probability sampling (asking people who are randomly chosen from some list). The polling was done by telephone. In 1948, Dewey Republicans were more likely to have a telephone than generally less affluent Truman Democrats. Although quota samples favored Dewey, a probability sample predicted Truman's win.

Polling by Gallup (from Ladd, 1992):

[Chart: Gallup polling figures ahead of the 1948 election, from Ladd (1992)]

(The election was on 1948-11-02, so 2-3 weeks after the last Gallup poll.)
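
To illustrate the mechanism in the quote, here is a toy simulation with invented numbers (it only captures the biased telephone frame, not the quota mechanics themselves): Truman leads in the full electorate, but telephone ownership is assumed to be more common among Dewey supporters, so a poll drawn only from telephone households overstates Dewey while a probability sample of the whole electorate does not.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Hypothetical electorate: 50% Truman, 45% Dewey, 5% other.
vote = rng.choice(["Truman", "Dewey", "Other"], size=n, p=[0.50, 0.45, 0.05])

# Hypothetical telephone ownership, correlated with vote choice.
has_phone = rng.random(n) < np.where(vote == "Dewey", 0.45, 0.30)

def dewey_margin(idx):
    """Dewey share minus Truman share among the sampled respondents."""
    v = vote[idx]
    return (v == "Dewey").mean() - (v == "Truman").mean()

# Telephone poll: respondents can only come from phone-owning households.
phone_sample = rng.choice(np.flatnonzero(has_phone), size=3_000, replace=False)
# Probability sample: respondents drawn at random from the whole electorate.
prob_sample = rng.choice(n, size=3_000, replace=False)

print("True Dewey-minus-Truman margin:", dewey_margin(np.arange(n)))
print("Telephone-poll estimate:       ", dewey_margin(phone_sample))
print("Probability-sample estimate:   ", dewey_margin(prob_sample))
```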

3

The Bradley effect refers to polls being biased estimates of the final vote because some voters do not want to disclose their actual voting preferences, particularly when a candidate is African American.

I have previously suggested that a similar discrepancy between votes and polls may have been why the 2016 polls predicted a win for Hillary Clinton.

Cliff AB
  • +1 I often say that the 2016 election should go down as a textbook example of statistical failure on par with the 1936 Literary Digest poll, but whenever I bring it up with statisticians, they tend to shrug their shoulders and say that everything is fine. – Flounderer Oct 16 '23 at 02:38
  • HRC did win the popular vote by about the expected margin, and few states deviated much from predictions either. The US electoral law allowing such ridiculously slim winner-takes-all tipping points (which eventually materialized in 2016) is not a reason for statistical ridicule. And you can't just call dibs on a theory without any real rationale (like, even if we pretend the Bradley effect exists, why is that more of an explanation than just the 1-2% margin of error?) – mirh Oct 16 '23 at 14:52
  • Gelman's explanation for the 2016 polling failure was that the prevailing models neglected to include education as a predictive demographic factor. – user78229 Oct 16 '23 at 23:36
  • @mirh If you combine all those independent polls, you'll have a very, very small margin of error that does not explain the difference. As a simple test, if there's no bias in those polls, then each poll should have exactly a 50% chance of being greater than the observed difference (2.1%), but a simple binomial test will show that this is not the case (see the code sketch at the end of this thread). – Cliff AB Oct 17 '23 at 05:23
  • Nobody said that there wasn't any bias... that more or less always happens eventually. 538 even claims specifically that the error skewed 3% for Democrats (likely due to the aforementioned education factor and late deciders), which became 5% at the state level (since the states where HRC instead overperformed are fewer but more populous). But that polling bias is well in line with the historical average (which party gets underestimated changes every time). Above all: there's no evidence for the Bradley effect. – mirh Oct 17 '23 at 23:44
  • @mirh that's an interesting 538 article but I think it shows evidence for my point: Trump told followers not to trust the media, and the stronger the support for Trump, the higher the bias in the polls. If you want to say this isn't the Bradley effect (technically the Bradley effect is for African American candidates, so it's definitely not exactly the Bradley effect) that's fair, but it does seem like the bias is strongly correlated with loyalty to Trump, and I think distrust of mainstream media is a reasonable hypothesis to explain that. – Cliff AB Oct 18 '23 at 03:24
  • The effect that makes sense to me is Trump voters being more likely to opt out of MSN/CNN/etc polls, rather than being "shy" about admitting it in public. – Cliff AB Oct 18 '23 at 03:26
  • I'm not sure how you are redefining the Bradley effect and then calling it a day, as if any kind of discrepancy would be that (just because any would be voter-driven eventually). The psychological mechanism behind it isn't just important, it's the whole crux. Non-response bias was taken into account then (you would also expect a noticeable difference between famous conservative pollsters and the others), but as covered in the follow-up articles it seems like "white voters without a college degree" was the biggest misaccounted-for quirk. Why this god-of-the-gaps theory rather than looking at the data? – mirh Oct 19 '23 at 14:13
  • @mirh Okay, it's not the Bradley effect, it's the voters of one candidate being more likely to opt out of participation in the polls, if that's the bone you'd like to pick (if we're going to be that picky, note that I said 'similar discrepancy', not Bradley effect, in my answer). And it follows perfectly from the data: the higher the support for Trump in a state, the higher the miscalibration of the polls; it's an eyeballed $R^2 = 0.8$ in the plot, from the 538 article you cited, of Trump outperforming the polls the most in red states. – Cliff AB Oct 20 '23 at 02:09
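
To make the binomial-test argument from the comment above concrete, here is a sketch using a simple sign test: under the hypothesis of no systematic bias, each poll's Clinton-minus-Trump margin should exceed the observed national margin (2.1 points) with probability 1/2. The poll margins below are invented placeholders, not the actual 2016 polls.

```python
from scipy.stats import binomtest

# Invented final national poll margins (Clinton minus Trump, in points).
poll_margins = [4.0, 3.5, 5.0, 2.8, 4.2, 3.0, 6.0, 3.8, 4.5, 2.5, 5.5, 3.2]
observed_margin = 2.1  # the actual national popular-vote margin

# Under "no systematic bias", each poll overshoots the observed margin with
# probability 1/2, so the number of overshoots is Binomial(n, 0.5).
overshoots = sum(m > observed_margin for m in poll_margins)
result = binomtest(overshoots, n=len(poll_margins), p=0.5)
print(f"{overshoots}/{len(poll_margins)} polls above the observed margin, "
      f"two-sided p = {result.pvalue:.4f}")
```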