
Background

Edit: I realize my use of the word "hypothesis" is confusing; I do not mean specifically a null hypothesis, but rather a proposition that something is true.

From my limited understanding, Bayesian probabilities represent beliefs. A scientist may therefore assign a belief/probability to the statement that a hypothesis is true before conducting an experiment or study, and then through formal mathematical reasoning calculate an updated belief as a numerical value (probability) when the results of the study are available.

From a frequentist point of view, a probability is not a belief. Nevertheless, it is common to find phrases along the lines of "Our study strengthens the evidence that H is true". Given that a study has produced results that give "supporting evidence" to a hypothesis, it seems reasonable that a frequentist would have a "stronger belief" in this hypothesis. Regardless of whether the prior and posterior beliefs are represented by numbers or not (as long as those numbers are not probabilities), it undoubtedly seems that there ought to be an order between them, such as "belief after study" > "belief before study". But exactly how this updating of beliefs would happen, or how to convey how much more one believes a hypothesis is true after a study compared to before it, is unclear to me. Granted, I am quite ignorant of statistics.


Question: Within the frequentist school of thought, is there a formal / mathematical procedure for updating beliefs?


If there is no such procedure, it seems difficult to make sense of a scientist saying that a study strengthens the evidence that something is true, beyond a "more than" and "less than" perspective. The mapping from prior and new data to beliefs seems a lot more opaque to me from the frequentist perspective compared to the Bayesian one. Sure, Bayesians have subjective priors, but given those priors, the data and the chosen analysis, it seems very clear exactly how the beliefs are updated through Bayes' rule (although I know frequentists can use Bayes' rule too, just not for beliefs). On the other hand, I hardly think someone employing a Bayesian methodology would necessarily let an obtained posterior probability represent their exact belief about something, since there can be a lot to doubt, disagree with or improve in a given analysis. I'm not trying to instill any debate between "Bayesian vs. frequentist"; I'm far too ignorant to have an opinion. Hopefully this question is not nonsensical; if it is, I apologize.


3 Answers


If you're representing beliefs coherently with numbers, you're Bayesian by definition. There are at least 46656 different kinds of Bayesian (counted here: http://fitelson.org/probability/good_bayes.pdf), but "quantitatively updating beliefs" is the one thing that unites them; if you do that, you're in the Bayesian club. Also, if you want to update beliefs coherently, you have to update using Bayes' rule; otherwise you'll be incoherent and can be Dutch-booked. Kinda funny how the one true path to normative rationality still admits so many varieties, though.

Even though Bayesians have a monopoly on 'belief' (by definition), they don't have a monopoly on "strength of evidence". There are other ways to quantify that, which motivates the kind of language in your example. Deborah Mayo goes into this in detail in "Statistical Inference as Severe Testing". Her preferred option is "severity". In the severity framework you never quantify your beliefs, but you do get to say "this claim has been severely tested" or "this claim has not been severely tested", and you can add to severity incrementally by applying multiple tests over time. That sure feels a lot like strengthening belief; you just don't get to use that exact word to describe it (because the Bayesians own the word 'belief' now). And it really is a different thing, so it's good to avoid the terminology collision: what you get from high severity is good error-control rates, not 'true(er) beliefs'. It behaves a lot like belief in being open to continual updating, though! Being picky about not calling it 'belief' rests purely on the (important) technicality of not dealing in states of knowledge, which distinguishes it from the thing Bayesians do.

Mayo writes and links to plenty more on this at https://errorstatistics.com/. It sounds like you might enjoy "Bernoulli's Fallacy" by Aubrey Clayton: it's pretty accessible popsci but really cuts to the roots of this question. It is discussed in podcast form here: https://www.learnbayesstats.com/episode/51-bernoullis-fallacy-crisis-modern-science-aubrey-clayton

Nick Cox

Disclaimer: I have a Bayesian bias.

The purpose of frequentist hypothesis testing is to reject the null hypothesis; that is not the same as proving the alternative hypothesis. Such experiments don't give you "evidence that $H$ is true", even if you can hear people making claims like this. The $p$-value is the probability of observing data at least as extreme as the data $d$ actually observed, given that the hypothesis $H$ is true: $P(D \ge d \mid H)$.
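
To make the direction of the conditioning concrete, here is a minimal, stdlib-only sketch of a one-sided $z$-test $p$-value; all numbers are invented for illustration, and a real analysis would rarely assume a known $\sigma$:

```python
import math

def z_test_pvalue(xbar, mu0, sigma, n):
    """One-sided p-value P(Z >= z) for H0: mu = mu0 vs H1: mu > mu0,
    assuming a known sigma and a normal sampling distribution."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Standard normal survival function via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

# Invented numbers: sample mean 0.8, H0 mean 0, sigma 2, n = 25  ->  z = 2
p = z_test_pvalue(0.8, 0.0, 2.0, 25)  # about 0.023
```

Note that the probability is computed under the assumption that $H$ is true; nothing here assigns a probability to $H$ itself.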

The Bayesian posterior probability is the other way around: the probability that $H$ is true given the data, $P(H|D) = P(D|H) P(H) / P(D)$. So in the Bayesian setting you do update your prior belief $P(H)$ about $H$ given the observed data, while in the frequentist setting you don't. Frequentist experiments don't tell you how likely $H$ is, and because of that they don't give you a direct framework for updating your beliefs about it. If you rejected the hypothesis that $H=5$, it can still be anything (just not $5$), and you still don't know what it is.
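
For contrast, here is a toy Bayesian update over a two-hypothesis partition; the coin example and all numbers are my own illustration, not taken from any real study:

```python
def update(prior_h, lik_h, lik_not_h):
    """Posterior P(H|D) = P(D|H)P(H) / P(D), where P(D) marginalizes
    over the two-hypothesis partition {H, not-H}."""
    evidence = lik_h * prior_h + lik_not_h * (1.0 - prior_h)
    return lik_h * prior_h / evidence

# H: coin lands heads with probability 0.7; not-H: fair coin.
# Data D: a particular sequence of 8 heads and 2 tails.
prior = 0.5
lik_h = 0.7 ** 8 * 0.3 ** 2      # P(D | H)
lik_not_h = 0.5 ** 10            # P(D | not-H)
posterior = update(prior, lik_h, lik_not_h)  # about 0.84
```

The posterior exceeds the prior, which is exactly the quantitative "belief after study" > "belief before study" ordering the question asks about.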

There is also maximum likelihood ("find $H$ such that the likelihood of observing $D$ is highest under it"), but again, it doesn't tell you what $P(H|D)$ is.
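
A sketch of that, again with invented data: for i.i.d. coin flips the maximum-likelihood estimate of $p(\text{head})$ is just the sample proportion, a single best-fitting value with no probability attached to it:

```python
def mle_p_head(flips):
    """MLE of p(head) for i.i.d. coin flips (1 = head): the sample
    proportion maximizes the Bernoulli likelihood."""
    return sum(flips) / len(flips)

estimate = mle_p_head([1, 1, 0, 1, 0, 1, 1, 1])  # 0.75, a point estimate only
```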

Finally, if you consider probabilities to be a measure of beliefs, you are already taking the Bayesian side. Thanks to adopting such a viewpoint, you can measure how probable it is that a claim is true given the data. In a frequentist setting, you can only make "assuming that it is true" claims.

Tim
  • Tim, sorry. I realize my use of the word "hypothesis" was misleading. I mean it in a more general sense, like a proposition that something is true. Even if frequentist experiments don't tell you how probable $H$ is when $H$ is a null hypothesis, scientific articles do tend to contain wordings such as "the evidence suggests that it is true". For example, "The evidence suggests that regular consumption of fruits and vegetables decreases the risk of heart disease". There is a claim that there is stronger evidence compared to before, and thus beliefs should be updated, but how? – DancingIceCream Apr 02 '22 at 21:39
  • But the other part of the answer, where you say that "it doesn't give you a direct framework for updating your beliefs about it", does answer my question. I'll wait a bit for more opinions, else accept it later. – DancingIceCream Apr 02 '22 at 21:41
  • @DancingIceCream but what you are describing is not a formal hypothesis in the statistical sense; in both the Bayesian and frequentist settings you would need to translate your scientific hypothesis into a statistical one. – Tim Apr 02 '22 at 21:41
  • Right, that's a good point. Say the null hypothesis is $\mu = 0$, where $\mu \ge 0$. Say that after the study the null hypothesis $\mu=0$ was rejected. Then there should be updated beliefs about whether or not $\mu > 0$ is true. If I understand you correctly, there is no framework for frequentists to make this belief update about $\mu > 0$: no formal framework for numerically comparing the belief that $\mu > 0$ before the study vs. after the study. But Bayesians can do that. Have I understood you correctly? – DancingIceCream Apr 02 '22 at 22:03
  • Even though I realize now that Bayesians may not use the lingo of "rejecting". – DancingIceCream Apr 02 '22 at 22:08
  • @DancingIceCream in a frequentist setting after the experiment you know that the data is unlikely if $\mu=0$, but you still don't know what $\mu$ is. So the method didn't directly give you information about $\mu$. – Tim Apr 02 '22 at 22:10
  • There are Bayesian hypothesis tests, you can reject $H$ if there is low probability of it being true $P(H|D)$; one could argue that it even makes more sense to do it like this. – Tim Apr 02 '22 at 22:11
  • "I mean it in a more general sense, like a proposition that something is true." A frequentist fundamentally cannot make a probabilistic argument about such things because they don't have a long-run frequency, so they don't have a non-trivial frequentist probability. Such questions are simply unanswerable within a frequentist framework. A proposition is either true (p = 1) or it is false (p = 0); we just don't know which. – Dikran Marsupial Apr 02 '22 at 22:14
  • @Tim Even if the study results give indirect information about $\mu > 0$, as far as I can see there still must be an updating of beliefs about $\mu > 0$, since $\mu = 0$ and $\mu > 0$ cover all possibilities. One cannot reduce one's belief in $\mu = 0$ without also increasing one's belief in $\mu > 0$. But you seem to say that there is no mathematical formalism that describes how "much" the belief in $\mu > 0$ "increases" in the frequentist setting. – DancingIceCream Apr 02 '22 at 22:18
  • Tongue somewhat in cheek, I'd say that frequentists become subjectivist Bayesians as soon as they compute the p-value. They interpret the outcome of the test as evidence that one of the hypotheses is probably true, but to do so they have silently switched frameworks. A long run frequency is a perfectly reasonable basis for a subjectivist Bayesian belief ;o) – Dikran Marsupial Apr 02 '22 at 22:18
  • @DikranMarsupial Sure, I understand that. But that does not mean that frequentists could not represent beliefs as numbers. – DancingIceCream Apr 02 '22 at 22:19
  • @DancingIceCream when they do so, they are no longer within the frequentist framework, but a Bayesian one. – Dikran Marsupial Apr 02 '22 at 22:19
  • @DikranMarsupial Hmm, that was new information for me. I thought what signifies the Bayesian framework is representing beliefs as probabilities not just numbers. – DancingIceCream Apr 02 '22 at 22:21
  • @DikranMarsupial "They interpret the outcome of the test as evidence that one of the hypotheses is probably true, but to do so they have silently switched frameworks." Interesting take, I have thought this seems to be the case! – DancingIceCream Apr 02 '22 at 22:23
  • IMHO it is what causes so many misinterpretations of frequentist statistics - they are often not giving direct answers to the question we really want to ask, e.g. is this hypothesis probably true? Because we get an indirect answer, we naturally want to try and shoehorn it into being the answer to the question we wanted to ask. Frequentist procedures are fine; it is just mixing frameworks that is problematic. – Dikran Marsupial Apr 02 '22 at 22:28
  • @DancingIceCream I am glad you have received an answer you are happy with. For the record, the question deserves a fuller answer but perhaps some frequentists think providing such an answer would not be well-received in this milieu. P.S. there is no "silent switching of frameworks" – Graham Bornholt Apr 03 '22 at 05:15
  • The p-value is the probability of observing the data D given that the hypothesis H is true P(D|H). I do not believe this is accurate; consider updating. Also, in a frequentist setting after the experiment you know that the data is unlikely if μ=0, but you still don't know what μ is: well, you usually have a point estimate of $\mu$, so this is what it is. Frequentists just do not have a measure of strength of belief in that $\mu$. – Richard Hardy Apr 03 '22 at 06:18
  • @GrahamBornholt It was tongue-in-cheek; there is only a silent switching of frameworks if you assign a number to the degree of belief in a particular proposition, which frequentists cannot do but Bayesians can. The point I was making is that frequentists do have a concept of the plausibility of a proposition, as it is a natural thing to do, but have no way of expressing that within their framework. This isn't a problem with frequentism; it is just a question that falls outside its scope. – Dikran Marsupial Apr 03 '22 at 07:14
  • @RichardHardy I added a link for the definition of p-value. Sure, it’s not the most rigorous one, but it is not wrong. – Tim Apr 03 '22 at 12:07
  • @Tim, I think it is fairly wrong, to the extent that the statement should be corrected. Wrong definitions and interpretations of p-values abound; we really do not need another one. – Richard Hardy Apr 03 '22 at 12:48
  • @Tim your definition of p-value is subtly incorrect. The p-value is the probability of observing data more extreme than what was observed, not the probability of observing the data that were observed. One more reason to dislike p-values. Note that the probability of observing the observed data is zero in the continuous data case. – Frank Harrell Apr 03 '22 at 14:22
  • @FrankHarrell ok, edited the wording. It was oversimplification on my side to skip unnecessary details, but your edit suggestion makes much sense. – Tim Apr 03 '22 at 14:48
  • @DikranMarsupial I understand what you are saying. Of course, even if frequentists have views about the plausibility of a hypothesis, they may not find it important enough or interesting enough to want to express it. – Graham Bornholt Apr 05 '22 at 01:23
  • @GrahamBornholt why would a frequentist perform an NHST if they weren't interested in the plausibility of the research hypothesis? – Dikran Marsupial Apr 05 '22 at 05:23
  • @DikranMarsupial Of course, a frequentist performing a test may be highly interested in the validity or otherwise of a research hypothesis, and so hopes the test outcome will provide useful information in that regard. The test outcomes are what is important. Not sure the personal view of the experimenter is that interesting or important. – Graham Bornholt Apr 05 '22 at 06:08
  • @GrahamBornholt Why is the test outcome important if not to shed light on the validity/plausibility of the research hypothesis? Note that Bayesian probabilities are not necessarily subjective (c.f. Jaynes) but also that the experimenter's personal views do enter into frequentist NHSTs, for example setting the significance level, which is effectively a representation of their prior beliefs of the hypothesis (if we think the research hypothesis is unlikely a-priori, we set a more stringent significance level, c.f. https://xkcd.com/1132/) – Dikran Marsupial Apr 05 '22 at 07:30

No, there is no formal method that frequentists follow to update their beliefs. My very abridged explanation for why this is the case is as follows, and focuses just on frequentist testing methods. We can regard hypothesis testing as addressing the question: given the assumed probability model, is the data (statistically) consistent with the null hypothesis?

To take a concrete example, suppose a claim has been made that aspirin reduces acne. A group of scientists decide to test that claim and a large, well-designed experiment is undertaken. The null hypothesis is that aspirin does not reduce acne. A p-value of 0.3 is observed, and the null hypothesis is not rejected. The scientists may or may not have had views (let alone 'beliefs') about the null hypothesis before or after the experiment, but who cares if they did? What matters is the evidence produced by the experiment. Science progresses by testing the consensus until sufficient evidence arises to change that consensus.
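
A rough sketch of the kind of calculation behind such an experiment; the pooled two-proportion $z$-test and all counts below are my own invention, not the actual design of any study:

```python
import math

def two_prop_pvalue(x1, n1, x2, n2):
    """One-sided p-value for H0: p1 = p2 vs H1: p1 < p2, using the
    pooled two-proportion z-test (x = count of patients still with acne)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF P(Z <= z) via the complementary error function
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Invented counts: 40/100 aspirin patients vs 44/100 controls still had acne
p = two_prop_pvalue(40, 100, 44, 100)  # roughly 0.28: do not reject H0
```

As in the answer, a large p-value leaves the null standing; nothing in the calculation assigns a probability to the claim itself.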

  • "The scientists may or may not have had views (let alone ‘beliefs’) about the null hypothesis before or after the experiment, but who cares if they did? What matters is the evidence produced by the experiment." - But evidence does not speak for itself. What ultimately will decide whether aspirin is approved as a medication for treating acne are people's beliefs about its effect, not a p-value or any other statistical evidence. In my view, there needs to be a mapping from evidence to beliefs, otherwise statistical experiments are pointless. – DancingIceCream Apr 05 '22 at 06:04
  • Maybe that falls more into the realm of the philosophy of science rather than statistics. But to me it seemed, at least superficially, that this mapping from evidence to beliefs is clearer with a Bayesian methodology. Besides, I certainly care about my own beliefs after conducting a statistical test. The fact that I personally struggle more with interpreting frequentist statistical evidence, how they should affect my beliefs, compared to Bayesian statistical evidence, was part of what prompted my question. – DancingIceCream Apr 05 '22 at 06:08
  • Let's continue the example then. The drug regulator would not approve the use of aspirin for the treatment of acne based on that experiment alone since there was not sufficient evidence of its efficacy. (Further trials can change that decision) If a patient asks a doctor about taking aspirin for acne, the doctor can indicate that the initial claim has not been supported by research. So the test is not pointless. – Graham Bornholt Apr 05 '22 at 06:28
  • "The drug regulator would not approve the use of aspirin for the treatment of acne based on that experiment alone since there was not sufficient evidence of its efficacy" - That's a belief, as is "has not been supported by research". I do agree with you that if I'm a scientist evaluating a study, the beliefs of the scientists conducting the study are not of much relevance. But I as a scientist will still need to form a belief based on the evidence. This formation of belief seems to be completely outside the realm of frequentist statistics, and at least partly within Bayesian statistics. – DancingIceCream Apr 05 '22 at 07:52
  • "So the test is not pointless" - no, since people actually formed beliefs based on the test, there was a mapping from test to beliefs, but this mapping was maybe done outside the area of statistics. It still has to be done if there's going to be any point doing the test. – DancingIceCream Apr 05 '22 at 07:55
  • "The scientists may or may not have had views (let alone ‘beliefs’) about the null hypothesis before or after the experiment, but who cares if they did? " how did they choose an appropriate significance level? Note also that we often use a null hypothesis that we know a-priori to be false, e.g. is this coin unbiased, i.e. p(head) = 0.5. – Dikran Marsupial Apr 05 '22 at 07:55
  • ""The drug regulator would not approve the use of aspirin for the treatment of acne based on that experiment alone since there was not sufficient evidence of its efficacy" the drug regulator here seems to be trying to use the outcome of the test to update their beliefs regarding the plausibility of the drug's efficacy (which was the research hypothesis). – Dikran Marsupial Apr 05 '22 at 07:57
  • @DancingIceCream It has been long understood that the statistical result is not the whole story. For example, Cox (1958) draws the distinction between statistical inference and scientific inference, with the statistical outcome being step one in the process. Although my answer may not be one you would follow, it does succinctly answer your question. However, I understand why it is not an answer that is likely to get any votes here. – Graham Bornholt Apr 05 '22 at 12:16
  • @GrahamBornholt Yes it does answer the question by stating that there is no formal belief updating method, although it had already been stated in previous answers. I just didn't find the explanation why very convincing, but that might be due to my own lack of understanding of frequentism and the scientific method in general. So have an upvote, in particular for that reference to Cox. I hope to read it. – DancingIceCream Apr 06 '22 at 08:24