11

In the frequentist worldview, probabilities are long-run relative frequencies. Hence, a fair coin can be defined as a coin for which the long-run relative frequency of each side approaches 0.5 as the number of flips approaches infinity. For frequentists, fairness is a quality of the coin.

In the Bayesian worldview, probabilities are assessments of event credibilities, made by the subject. One can speak of a specific subject, who believes that heads is as likely as tails for a specific coin.

But does it make sense to speak of fairness as being a property of a specific coin, in the Bayesian worldview?

Sam
  • 537
  • 3
    The outcome of heads or tails is not determined by the coin alone, and so ascribing the probability of heads and tails outcomes to the coin is only sensible in theory. The Bayesian might be interested in the person who is flipping the coin as well as the coin. Both their skill and their honesty might be relevant. – Michael Lew Oct 07 '23 at 20:27
  • 2
    A fair coin not only has equal probabilities for the two outcomes on each flip, but also independence between all the flips. Translating that independence into long-run frequencies may be possible, but is not easy. – Henry Oct 08 '23 at 12:32
  • @Henry The 'fairness' of the outcome of a coin toss is not just dependent on the coin. Saying that fairness requires independence is not enough to bring the fairness property back to the coin because the coin has no say in how it is flipped. – Michael Lew Oct 08 '23 at 20:25
  • @MichaelLew You are correct. I think most people usually take "tossing a fair coin" as shorthand to mean a "fair coin tossing process" meaning probability of a head the same as for a tail (0.5), so everything still works. – Graham Bornholt Oct 08 '23 at 21:09
  • @MichaelLew I would guess that the fair/unfair coin argument is particularly popular for its simplicity. external factors are generally not considered. – carlo Oct 09 '23 at 19:45
  • @carlo True. Nonetheless the distinction between the 'frictionless' account of a fair or unfair coin and the real world experience of tossing a coin is interesting and is an excellent case study. Students can play around with coins themselves to see how far conventional statistical models sometimes are from the processes that they attempt to analyse. It is also a situation where an all or none decision is obviously inadequate as a characterisation. – Michael Lew Oct 09 '23 at 20:15

5 Answers

10

A Bayesian can still use long-run frequencies where they are sensibly defined. Let the probability of a coin flip returning a head be $p$. If $p$ really is $0.5$, a Bayesian will tend to support this hypothesis as the number of coin flips grows.

The frequentist will obtain a point estimate $\hat{p}$ of $p$, having observed a sequence of coin flips. This might be done by maximum likelihood, using a Binomial model for the results of the coin flips. If their $100(1-\alpha)\%$ confidence interval does not contain $0.5$, the frequentist will reject the hypothesis that the coin is fair at significance level $\alpha$.
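
As a rough sketch of the frequentist check described above, the snippet below computes the maximum likelihood estimate and a normal-approximation (Wald) interval. The Wald interval is my choice for illustration, not something the answer specifies, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(heads: int, flips: int, alpha: float = 0.05) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) for a binomial proportion.

    Assumption: a Wald (normal-approximation) interval is used here purely
    for illustration; other interval constructions exist.
    """
    if flips <= 0 or not 0 <= heads <= flips:
        raise ValueError("need flips > 0 and 0 <= heads <= flips")
    p_hat = heads / flips                      # maximum likelihood estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided normal quantile
    half_width = z * sqrt(p_hat * (1 - p_hat) / flips)
    return p_hat, p_hat - half_width, p_hat + half_width


def rejects_fairness(heads: int, flips: int, alpha: float = 0.05) -> bool:
    """True when the interval excludes p = 0.5."""
    _, lower, upper = wald_interval(heads, flips, alpha)
    return not (lower <= 0.5 <= upper)
```

For instance, `rejects_fairness(80, 100)` asks whether 80 heads in 100 flips is compatible with $p = 0.5$ at the 5% level under this approximation.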

How the Bayesian decides whether a coin is fair differs from how a frequentist does. The Bayesian will start with a prior distribution for the probability of heads, observe coin flips, and then obtain a posterior distribution for the probability of heads (likely using a Beta prior and Binomial likelihood). If the posterior has high density at $p = 0.5$, then the Bayesian will believe that the coin is likely to be fair. If $p=0.5$ has low posterior density, then the Bayesian will believe that the coin is likely to be unfair.

A mathematical theorem that formalises (and generalises) this result is the Bernstein-von Mises theorem. Loosely, it says that under a suitable prior and for large sample sizes, Bayesian and frequentist results agree numerically (but not in interpretation!).
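
To get a feel for that numerical agreement, here is a small sketch comparing the maximum likelihood estimate and its standard error with the posterior mean and standard deviation. It assumes a uniform Beta(1, 1) prior and a Binomial likelihood; the function names are made up for this example, and it illustrates the theorem rather than proving it.

```python
from math import sqrt


def frequentist_summary(heads, flips):
    """MLE of p and its estimated standard error."""
    p_hat = heads / flips
    se = sqrt(p_hat * (1 - p_hat) / flips)
    return p_hat, se


def posterior_summary(heads, flips, a=1.0, b=1.0):
    """Mean and standard deviation of the Beta posterior.

    Assumption: a Beta(a, b) prior, uniform by default.
    """
    a_post = a + heads
    b_post = b + flips - heads
    total = a_post + b_post
    mean = a_post / total
    sd = sqrt(a_post * b_post / (total ** 2 * (total + 1)))
    return mean, sd


if __name__ == "__main__":
    for flips in (10, 100, 10_000):
        heads = 6 * flips // 10  # 60% heads at every sample size
        print(flips, frequentist_summary(heads, flips), posterior_summary(heads, flips))
```

With only a handful of flips the prior visibly pulls the posterior mean towards 0.5; with many flips the two summaries become nearly indistinguishable.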

jcken
  • 2,907
  • I think that a Bayesian would not focus on the all or none fair/not fair dichotomous definition, and would instead look at a prior or posterior distribution to see which probabilities are well favoured. – Michael Lew Oct 07 '23 at 20:24
  • 1
    I think my answer alludes to what you are saying ("Bayesian will believe that that the coin is likely to be fair" is not dichotomous). However, Bayesian decision analysis would allow us to create a dichotomy in a principled, Bayesian way; a dichotomous decision is allowed within a Bayesian framework – jcken Oct 08 '23 at 06:55
  • 1
    I do not disagree. However, the all or none setting is required for ordinary frequentist accounting, but is not required by a likelihood or Bayesian analysis. 'Fairness' is not (entirely) a property of the coin and it is not an all or none thing. After all, a coin and tossing system might be nearly fair or far from fair. All or none characterisations rob us of interesting details. – Michael Lew Oct 08 '23 at 20:28
8

As I've noted a number of times on this site, the notion that "frequentists" have some special idea of the meaning of probability is illusory. (Indeed, I put this term in quotes because I am not convinced that there is really any frequentist school of thought at all.) All statistical schools that use standard probability measures as the underlying basis for measuring uncertainty accept the validity of the laws of large numbers, and therefore all of them accept that there is a correspondence between marginal probability and asymptotic long-run frequency under appropriate circumstances. If anything, Bayesians have a more detailed view of when and why this correspondence holds, since they do not start with this correspondence as a primary, but derive it from probabilistic models based on exchangeability. This means that all Bayesians are also "frequentists", albeit with a more satisfying explanation for why they are "frequentists". The corresponding notion of "fairness" (essentially just probabilistic uniformity) then arises as a simple observation about long-run frequency, just as in the "frequentist" case.

The standard Bayesian account of the correspondence between marginal probability and asymptotic long-run frequency goes like this. Consider a sequence of random variables $X_1,X_2,X_3,\ldots$ (e.g., representing the outcomes of coin tosses) generated by some mechanism that is designed to produce uniform outcomes in the long run. If we are satisfied that the mechanism is stable and that the conditions for generation of the outcomes do not change, then we might reasonably believe that the sequence of variables is exchangeable, i.e., for any finite subset of outcomes, the probability of that subset of outcomes does not depend on its order. If we are satisfied that this is the case, then it follows from the representation theorem (see e.g., O'Neill 2009) that the variables in this sequence are conditionally IID, conditional on the underlying empirical distribution. In the case of a binary sequence (such as outcomes of coin tosses) the empirical distribution is equivalent to the long-run proportions of the two outcomes, so the empirical distribution can be reduced to a single parameter giving the long-run proportion of heads (here we take tails and heads as $0$ and $1$ respectively):

$$\theta \equiv \lim_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^n \mathbb{I}(X_i = 1).$$

With this simplification, the representation theorem says that the variables in the sequence are conditionally IID, conditional on $\theta$. Under this account, the law of large numbers ensures that, conditional on $\theta$, the probability of a head in any given coin toss is equal to $\theta$. Thus, the coin is "fair" if $\theta = 1/2$. Now, Bayesians will hold an a priori view on this parameter and then collect data on the outcomes of the coin tosses to update their beliefs. This is a different inference method to the classical "frequentist" methods, but the notion of a fair coin is the same as under the "frequentist" approach. In both cases, we recognise the application of the law of large numbers, so we say that the coin is considered "fair" if the long-run proportion of heads is one-half.
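
The account above can be mimicked in a small simulation: draw $\theta$ once from a prior, generate tosses that are IID given $\theta$, and watch the running proportion of heads settle near $\theta$. This is only a sketch under assumptions of my own: the Beta(2, 2) prior, the seed, and the function names are arbitrary choices for illustration.

```python
import random


def simulate_exchangeable_flips(n_flips, rng, prior_a=2.0, prior_b=2.0):
    """Draw theta from a Beta prior, then n_flips tosses that are IID given theta.

    Assumption: the Beta(prior_a, prior_b) prior is an arbitrary illustrative choice.
    Marginally (with theta unknown) the tosses are exchangeable but not independent.
    """
    theta = rng.betavariate(prior_a, prior_b)
    flips = [1 if rng.random() < theta else 0 for _ in range(n_flips)]
    return theta, flips


def running_proportion(flips):
    """Proportion of heads after each toss."""
    heads = 0
    proportions = []
    for i, flip in enumerate(flips, start=1):
        heads += flip
        proportions.append(heads / i)
    return proportions


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    theta, flips = simulate_exchangeable_flips(20_000, rng)
    print(theta, running_proportion(flips)[-1])
```

In this setup, a "fair" coin is simply a run in which the drawn $\theta$ happens to equal $1/2$; the data only ever inform us about that long-run proportion.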

Ben
  • 124,856
  • (-1) for misunderstanding (or misinterpreting) the frequentist approach. – Graham Bornholt Oct 08 '23 at 02:39
  • 1
    @GrahamBornholt I see that you've downvoted but haven't responded to the comment thread from here. Did you manage to firm up the idea yet? Just looking to understand the critique and develop my own understanding. – statsplease Oct 13 '23 at 06:51
5

For a Bayesian, the prior describes his knowledge of the "fairness property" of the coin. That knowledge can be vague if the coin is new and unfamiliar, or very precise if he has thrown the specific coin or an identical one in the past. As the coin gets flipped more and more, his knowledge of that property converges to a point.

This can probably best be understood with the Beta-binomial model, where your prior of $p$ is $$\pi(p) = \frac{1}{B(\alpha,\beta)} p^{\alpha-1}(1-p)^{\beta-1}.$$ After tossing the coin $n$ times and seeing $x$ heads, your posterior is $$\pi(p|n,x) = \frac{1}{B(\alpha+x,\beta+n-x)} p^{\alpha+x-1}(1-p)^{\beta+(n-x)-1}.$$ The larger $n$ gets, the tighter the posterior $\pi(\cdot|n,x)$ will wrap around a specific value of $p$, which is to say your uncertainty regarding the fairness of the coin shrinks.
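
A minimal sketch of the Beta-binomial update above, using only the standard library. The helper names are hypothetical; the density is evaluated on the log scale via `math.lgamma` so that large parameters do not overflow.

```python
from math import exp, lgamma, log, sqrt


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at 0 < p < 1, computed on the log scale."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)  # log of 1 / B(a, b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))


def update(alpha, beta, n, x):
    """Posterior Beta parameters after seeing x heads in n tosses."""
    return alpha + x, beta + n - x


def beta_sd(a, b):
    """Standard deviation of Beta(a, b); it shrinks as a + b grows."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


if __name__ == "__main__":
    a, b = 1.0, 1.0  # uniform prior, an illustrative assumption
    for n, x in ((10, 5), (100, 50), (1000, 500)):
        a_post, b_post = update(a, b, n, x)
        print(n, beta_pdf(0.5, a_post, b_post), beta_sd(a_post, b_post))
```

Evaluating `beta_pdf(0.5, ...)` on the posterior parameters gives the posterior density at the "fair" value, and `beta_sd` shows how quickly the uncertainty about $p$ shrinks as tosses accumulate.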

What this also shows, though, is that $\alpha$ and $\beta$ are essentially a reflection of past tosses. A Bayesian being sure that the coin is fair is equivalent to him being "familiar" with the coin, which means $\alpha$ and $\beta$ are very large and roughly equal. Then $\pi(p)$ will be approximately Gaussian, and in the limit it converges to a (Dirac) point mass at $p=\frac{1}{2}$.
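
To make the limiting claim concrete, take the symmetric case $\alpha = \beta = m$ (my simplifying assumption, not a requirement of the model). The standard Beta mean and variance formulas then give $$\mathbb{E}[p] = \frac{\alpha}{\alpha+\beta} = \frac{1}{2}, \qquad \operatorname{Var}(p) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{m^2}{4m^2(2m+1)} = \frac{1}{4(2m+1)},$$ so the variance goes to $0$ as $m \to \infty$ while the mean stays at $\frac{1}{2}$, which is the sense in which the prior collapses onto $p=\frac{1}{2}$.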

Durden
  • 1,171
  • 2
    The meaning of $\alpha$ and $\beta$ can be deepened for a better understanding of the prior. When $\alpha =1$ and $\beta=1$, the distribution is flat and thus we have a uniform prior. But suppose one has inspected the coin before tossing and one found nothing peculiar, the conclusion might be that it looks "fair." This conclusion can be parametrized numerically by setting, say, $\alpha =101$ and $\beta =101$. This is equivalent to a "mental prior" of having it tossed 200 times with 100 Heads and 100 Tails, before the actual tossing. This is how one can adopt a peaked prior. – Romke Bontekoe Oct 09 '23 at 05:09
3

Bayesian and frequentist theorists disagree on the definition of probability.

Regardless of that, for everyone a fair coin is a coin that has 50% probability of coming up heads, and fairness is a property of the coin.

N. Virgo
  • 416
carlo
  • 4,545
  • So how would you use the Bayesian definition of probability to define a fair coin? Spell it out please. – Sam Oct 13 '23 at 14:09
  • I'm not the most qualified person to give a Bayesian definition of probability, but in Bayesian theory, probability is a measure of the belief you put into some event happening. To a Bayesian thinker, the fairness of a coin is the binary property of the coin to come up heads 50% of the time. This can be a variable itself with its own probability (the amount of belief you have in it). You can apply all of this to a Bayesian hypothesis testing setting, for instance. – carlo Oct 14 '23 at 00:34
  • When you say 'to a bayesian thinker, the fairness of a coin is the binary property of the coin to come up heads 50% of the time', that uses a (loose) frequentist definition of probability. – Sam Oct 15 '23 at 06:37
  • @Sam How so? – carlo Oct 15 '23 at 15:39
  • Wikipedia says in the Frequentist Probability entry that "Frequentist probability or frequentism is an interpretation of probability; it defines an event's probability as the limit of its relative frequency in many trials (the long-run probability)". In contrast, the Bayesian Probability is said to be (in the corresponding entry) "an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief." – Sam Oct 24 '23 at 10:50
0

A Bayesian would use all available information. Physics is available information. The Bayesian would check to see if the coin has a reflection symmetry through its cross-section. Fairness is a material symmetry property.