
I would like to understand the standard math done to check whether a coin is fair.

There is a Wikipedia article on it, which outlines two different ways of answering the question: a Bayesian approach and a frequentist approach.

I would like to understand the Frequentist approach, which calculates an "estimator of true probability."

To quote Wikipedia:

The best estimator for the actual value $r$ is the estimator $p = \frac{h}{h+t}$. This estimator has a margin of error ($E$) where $|p - r| < E$ at a particular confidence level.

What does this mean? How do we know that this is the case?

One answerer in this question does the math with the following formula:

The formula to calculate the approximate confidence limits for a binomial test is: $z_{\alpha/2}\sqrt{pq/n}$ (where $q = 1 - p$).

How do I find this formula and how is it related to the "best estimator for r"?

In general, I have a rough understanding of the central limit theorem, so I get that some kind of normal distribution might pop up when I look at a combination of many independent trials, but I would like to understand the mathematical derivation of this "estimator" for determining whether a coin is fair.

1 Answer


Let's start with a simple example. Suppose you want to know whether a coin is fair; that is, whether $p = P(\mathrm{Head}) = 1/2.$ You toss the coin $n = 50$ times and get $x = 28$ Heads.

Then the point estimate for $p$ is $\hat p = x/n = 28/50 = 0.56.$ Various styles of confidence intervals can be used to give an idea how far from $0.56$ the actual value of $p$ might be.

Confidence intervals. One of these is a 95% confidence interval (CI) of the form $\hat p \pm 1.96\sqrt{\frac {\hat p(1-\hat p)}{n} },$ which computes to $(0.422, 0.698).$ [Computation below, using R as a calculator.] Over many such experiments, this style of CI will include the true Heads probability $p$ of the coin about 95% of the time. The expression qnorm(c(.025,.975)) amounts to $\pm 1.96$ and is based on a normal approximation to the binomial distribution of $X.$ [Wald CI; see Addendum.]

x = 28;  n = 50;  p.hat = x/n
CI = p.hat + qnorm(c(.025,.975))*sqrt(p.hat*(1-p.hat)/n)
CI
[1] 0.4224111 0.6975889

Experience has shown that a slightly different style of CI (due to Agresti and Coull) comes closer to the 95% coverage goal than the one above. It gives the 95% CI $(0.423, 0.688).$ [Sometimes shorter than the Wald CI, sometimes longer, but with actual coverage closer to the nominal 95%.]

x = 28;  n = 50;  p.est = (x+2)/(n+4)
CI = p.est + qnorm(c(.025,.975))*sqrt(p.est*(1-p.est)/(n+4))
CI
[1] 0.4230227 0.6880885

Notice that both styles give intervals that contain $p = 0.5.$ So data $x = 28, n = 50$ with $\hat p = 0.56$ seems compatible with a fair coin. (However, with $x = 2792$ heads out of $n = 5000$ tosses, the CI's would not contain $p = 0.5.$ With such experiments, it is not just the proportion of heads that matters, but also the total number of tosses.)
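The effect of sample size can be cross-checked outside R. Here is the same Wald computation as a Python sketch (standard library only; the helper name `wald_ci` is mine, not from the answer):

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(x, n, conf=0.95):
    """Normal-approximation (Wald) CI for a binomial proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # ~1.96 for a 95% CI
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_ci(28, 50)        # contains 0.5: compatible with a fair coin
lo2, hi2 = wald_ci(2792, 5000)  # excludes 0.5 despite a similar proportion
```

With nearly the same proportion of heads, only the larger experiment rules out $p = 0.5,$ because the half-width shrinks like $1/\sqrt{n}.$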

Test of hypothesis. Another possibility is to test the null hypothesis $H_0: p=1/2$ against the alternative $H_a: p\ne 1/2,$ at the 5% level of significance. There are several versions of this test, some using exact binomial distributions and some using normal approximations. In R, an exact test is implemented as binom.test, illustrated for the data $x = 28, n = 50$ from above. The P-value $0.4799$ tells us that a fair coin would give a result as far as $28 - 25 = 3$ from the expected number of heads almost half the time. So 28 heads in 50 tosses would not be an unusual result for a fair coin.

binom.test(x = 28, n = 50, p = 0.5)
    Exact binomial test

data:  28 and 50
number of successes = 28, number of trials = 50, p-value = 0.4799
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4125441 0.7000928
sample estimates:
probability of success
                  0.56

Here is how the P-value $0.4799$ is computed: If $H_0$ is true then the observed number of Heads is $X \sim \mathsf{Binom}(n=50, p=0.5).$ The P-value is the probability that $X$ is farther from $E(X) = np = 25$ than what we observed. That is, $P(X \le 22)+P(X\ge 28) =$ $2(0.2399438) = 0.4799.$

pbinom(22, 50, .5)
[1] 0.2399438
1 - pbinom(27, 50, .5)
[1] 0.2399438
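The same two-sided P-value can be reproduced in pure Python (a sketch using only the standard library; the helper name is mine, not from the answer):

```python
# Exact two-sided P-value for H0: p = 1/2, using the symmetry of Binom(n, 1/2):
# add up the probabilities of all outcomes at least as far from n/2 as observed.
from math import comb

def binom_pvalue_fair(x, n):
    d = abs(x - n / 2)                 # observed distance from E(X) = n/2
    return sum(comb(n, k) for k in range(n + 1)
               if abs(k - n / 2) >= d) / 2**n

pval = binom_pvalue_fair(28, 50)       # ~0.4799, matching binom.test
```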

One more style of CI: Notice that the 95% CI from this procedure is $(0.413, 0.700),$ based on yet another style of confidence interval (Clopper-Pearson CI). [About half a dozen styles are in common use. For more about this see this Wikipedia page.]
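The Clopper-Pearson bounds can also be obtained without R. A sketch (not the answer's code): by definition the lower limit solves $P(X \ge x \mid p) = \alpha/2$ and the upper limit solves $P(X \le x \mid p) = \alpha/2,$ so bisection on the exact binomial tail suffices, using only the standard library:

```python
from math import comb

def binom_sf(x, n, p):
    """P(X >= x) for X ~ Binom(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def solve(pred, lo=0.0, hi=1.0, iters=60):
    """Bisection: find the p where pred(p) flips from True to False."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact CI: lower solves P(X >= x | p) = alpha/2 (tail increases in p),
       upper solves P(X <= x | p) = alpha/2."""
    p_lo = 0.0 if x == 0 else solve(lambda p: binom_sf(x, n, p) < alpha / 2)
    p_hi = 1.0 if x == n else solve(lambda p: binom_sf(x + 1, n, p) < 1 - alpha / 2)
    return p_lo, p_hi

p_lo, p_hi = clopper_pearson(28, 50)   # ~ (0.4125, 0.7001), matching binom.test
```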

Finally, if I had as many as $x=2792$ heads out of $n=5000$ tosses, then the P-value would be much smaller than $0.05 = 5\%$ and we would reject $H_0$ at the 5% level of significance. And the 95% CI would be $(0.545, 0.572),$ which does not contain $p = 0.5.$

Just from your question, I am not sure what aspects of the coin-testing problem interest you most. If there are parts of the above that don't make sense to you, or other topics of interest that I omitted, then you can ask questions in comments or post a more specific Question. Also, the margin of this question has links to 'Related' Q&A's on this site that may be of interest.

Addendum per comment. Argument for the Wald CI (which was originally intended only for very large $n$).

Begin with $E(\hat p) = p$ and $SE = SD(\hat p) = \sqrt{\frac{p(1-p)}{n}}.$ Assuming $\hat p$ is approximately normal, standardize to get $Z = \frac{\hat p - p}{SE} \stackrel{aprx}{\sim}\mathsf{Norm}(0,1),$ so that $P(-1.96 < Z < 1.96)\approx 0.95.$ Use algebra to transform the event, obtaining $P(\hat p - 1.96\,SE < p < \hat p + 1.96\,SE)\approx 0.95.$ Then, provided we knew $SE,$ an approximate 95% CI for $p$ would be of the form $(\hat p - 1.96\,SE,\, \hat p + 1.96\,SE).$ However, $SE = \sqrt{\frac{p(1-p)}{n}}$ contains the unknown $p.$ For sufficiently large $n,$ one has $\hat p \approx p,$ so approximate $SE$ by $\widehat{SE} = \sqrt{\frac{\hat p(1-\hat p)}{n}}.$ Then pretend $\left(\hat p - 1.96\,\widehat{SE},\, \hat p + 1.96\,\widehat{SE}\right)$ is a 95% CI for $p.$
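To see how well the "pretend" step works, here is a small simulation sketch (my own, not part of the answer) estimating how often the Wald interval covers the true $p$ when $n = 50$ and $p = 0.5$; actual coverage comes out a bit below 95%, which is one reason alternatives such as Agresti-Coull are preferred:

```python
import random
from math import sqrt

random.seed(1)
p_true, n, reps = 0.5, 50, 10_000
hits = 0
for _ in range(reps):
    x = sum(random.random() < p_true for _ in range(n))   # one 50-toss experiment
    p_hat = x / n
    half = 1.96 * sqrt(p_hat * (1 - p_hat) / n)           # Wald half-width
    hits += (p_hat - half < p_true < p_hat + half)        # did the CI cover p?
coverage = hits / reps   # a bit under the nominal 0.95 at this small n
```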

BruceET
  • thanks for the write-up. I am still struggling to understand the derivation of the Binomial proportion confidence interval. Is there a straightforward way to arrive at $\hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$? – Steven Sagona Jun 13 '21 at 11:48
  • Tried to show that in a comment beneath your Q. Will try again in an Addendum. A difficulty is that there are many approximations. – BruceET Jun 13 '21 at 15:08
  • even a reference resource is good, ideally one with the derivation that doesn't require reading an entire textbook. – Steven Sagona Jun 13 '21 at 15:25
  • Addendum is finished. Deleting comment after your Question. – BruceET Jun 13 '21 at 15:42
  • I learnt from this answer. The Wikipedia link says “[t]he Normal approximation interval and its presentation in textbooks has been heavily criticised, with many statisticians advocating that it be not used. The principal problems are overshoot (bounds exceed [0, 1]), zero-width intervals at ${\hat {p}} = 0$ and $1$ (falsely implying certainty), and overall inconsistency with significance testing.” So the other methods you mention may be better for actual data analysis. – Single Malt Jun 13 '21 at 16:46
  • Yes, others are better. Agresti-Coull, using $\hat p = (x+2)/(n+4),$ is noticeably better. The exact method implemented in R's binom.test gives wider intervals (sometimes unnecessarily wide), but true 95% coverage. See my Wikipedia link for discussion/formulas of several types. // Simple to use in R and surprisingly good is to take quantiles $.025$ and $.975$ of $\mathsf{Beta}(x+.5, n-x+.5),$ which gives $(0.422, 0.690)$ for my example with $x=28, n=50.$ [Jeffreys CI.] – BruceET Jun 13 '21 at 18:12
  • @BruceET, Thanks for the explanation. I'm still conceptually confused - as I get lost conceptually in your derivation. I wrote a new question about it to hopefully make that more clear. which is here – Steven Sagona Jun 15 '21 at 13:58