10

Given a coin with unknown bias $p$, how can I generate variates — as efficiently as possible — that are Bernoulli-distributed with probability 0.5? That is, using the minimum number of flips per generated variate.

Neil G
  • 15,219
  • 3
    A simple solution is to flip the coin two times: If it's $HT$ map it to heads, if it's $TH$ map it to tails. Otherwise, repeat the experiment until one of these two is achieved. – cardinal Feb 19 '13 at 04:49
  • 1
    @cardinal: Nice! Why not add an answer? – Neil G Feb 19 '13 at 05:14
  • So common you can just google for it – Glen_b Feb 19 '13 at 05:54
  • 2
    @Glen_b: Okay, but can you do it with the minimum number of flips per generated variate? – Neil G Feb 19 '13 at 06:55
  • 1
    Hmmm. That's a significantly harder question; you may want to modify your question or post a new one. – Glen_b Feb 19 '13 at 09:50
  • How to do this in the case where $p$ is known probably depends on $p$, so how to do it in the case where $p$ is unknown probably depends on what you know about $p$ before starting. – Michael Lugo Feb 19 '13 at 20:00
  • 3
    @MichaelLugo: I'd say it definitely depends on $p$. :-) If $p = 1/2$ we know we can do it in one flip. If $p = 1/4$ we know we can do it in two and, we know that this is optimal in both cases. The answer should be related to the entropy $H(p)$. If we know nothing about $p$ other than that $p \in (0,1)$, then I suspect a simple game-theory result will yield something close to the scheme in my first comment as being "optimal" in an appropriate way. – cardinal Feb 19 '13 at 20:26
  • 1
    Just to improve the cross-linking, it might be worth noting that this question appears to have originally been motivated by Neil's (now-deleted) answer to this one. – cardinal Feb 19 '13 at 20:30
  • @MichaelLugo: If p is known, I believe the optimal result is arithmetic encoding, which has expected runtime of two flips except when p is $0.5$. – Neil G Nov 03 '14 at 06:11
  • 4
    Hello, Giorgio1927, and welcome to the site! Please add the "self-study" tag to this question, as it lets people see that they should guide you to the answer rather than simply providing one. – jbowman Jan 10 '18 at 18:26

3 Answers3

7

This is a well-known problem with several nice solutions which have been discussed here and in stackoverflow (it seems like I cannot post more than one link but a quick google search gives you some interesting entries). Have a look at the wikipedia entry

http://en.wikipedia.org/wiki/Fair_coin#Fair_results_from_a_biased_coin

6

This is a classic problem, I believe attributed originally to von Neumann. One solution is to keep tossing the coin in pairs until the pairs are different, and then defer to the outcome of the first coin in the pair.

Explicitely let $(X_i,Y_i)$ be the outcome of toss $i$, with $X_i$ being the first coin, and $Y_i$ being the second coin. Each coin has probability $p$ of heads. Then $P(X_i=H|X_i\neq Y_i)=P(X_i=T|X_i\neq Y_i)$ due to symmetry, which implies $P(X_i=H|X_i\neq Y_i)=1/2$. To explitely see this symmetry note that $X_i\neq Y_i$ implies the outcomes are $(H,T)$ or $(T,H)$, both of which are equally likely due to independence.

Empirically, the waiting time until such an unequal pair is

$$1/P(X\neq Y)=\frac{1}{1-p^2-(1-p)^2}=\frac{1}{2p(1-p)},$$

which blows up as $p$ gets closer to 0 or 1 (which makes sense).

Alex R.
  • 13,897
2

I'm not sure how to sum up the terms efficiently, but we can stop whenever the total number of rolls $n$ and the total number of successes $t$ are such that $\binom{n}{t}$ is even since we can partition the different orderings that we could have achieved $n$ and $t$ into two groups of equal probability each corresponding to a different outputted label. We need to be careful that we haven't already stopped for these elements, i.e., that no element has a prefix of length $n'$ with $t'$ successes such that $\binom{n'}{t'}$ is even. I'm not sure how to turn this into an expected number of flips.

To illustrate:

We can stop at TH or HT since these have equal probability. Moving down Pascal's triangle, the next even terms are in the fourth row: 4, 6, 4. Meaning that we can stop after rolls if one heads has come up since we can create a bipartite matching: HHHT with HHTH, and technically HTHH with THHH although we would already have stopped for those. Similarly, $\binom42$ yields the matching HHTT with TTHH (the rest, we would already have stopped before reaching them).

For $\binom52$, all of the sequences have stopped prefixes. It gets a bit more interesting at $\binom83$ where we match FFFFTTFT with FFFFTTTF.

For $p=\frac12$ after 8 rolls, the chance of not having stopped is $\frac1{128}$ with an expected number of rolls if we have stopped of $\frac{53}{16}$. For the solution where we keep rolling pairs until they differ, the chance of not having stopped is $\frac{1}{16}$ with an expected number of rolls if we have stopped of 4. By recursion, an upper bound on the expected flips for the algorithm presented is $\frac{128}{127} \cdot \frac{53}{16} = \frac{424}{127} < 4$.

I wrote a Python program to print out the stopping points:

import scipy.misc
from collections import defaultdict


bins = defaultdict(list)


def go(depth, seq=[], k=0):
    n = len(seq)
    if scipy.misc.comb(n, k, True) % 2 == 0:
        bins[(n,k)].append("".join("T" if x else "F"
                                   for x in seq))
        return
    if n < depth:
        for i in range(2):
            seq.append(i)
            go(depth, seq, k+i)
            seq.pop()

go(8)

for key, value in sorted(bins.items()):
    for i, v in enumerate(value):
        print(v, "->", "F" if i < len(value) // 2 else "T")
    print()

prints:

FT -> F
TF -> T

FFFT -> F
FFTF -> T

FFTT -> F
TTFF -> T

TTFT -> F
TTTF -> T

FFFFFT -> F
FFFFTF -> T

TTTTFT -> F
TTTTTF -> T

FFFFFFFT -> F
FFFFFFTF -> T

FFFFFFTT -> F
FFFFTTFF -> T

FFFFTTFT -> F
FFFFTTTF -> T

FFFFTTTT -> F
TTTTFFFF -> T

TTTTFFFT -> F
TTTTFFTF -> T

TTTTFFTT -> F
TTTTTTFF -> T

TTTTTTFT -> F
TTTTTTTF -> T
Neil G
  • 15,219
  • When $p$ is unknown, any solution has to work for limiting values of $p\to 0$ and $p\to 1$. This should make it clear that @Cardinal's solution is optimal. The expected number of flips (given the unknown $p$, of course) is $2/((p(1-p))$. – whuber Nov 03 '14 at 16:05
  • @whuber: I don't see why it should be optimal. My solution stops in all the same cases as his. However, he will continue to roll after tthh for example, whereas it's possible to stop. – Neil G Nov 03 '14 at 22:04
  • What is your solution? I do not see one described here. I think we might have different concepts of @Cardinal's solution. I understand it to mean "stop whenever the number of heads equals the number of tails and map that to the value of the first result in the sequence." – whuber Nov 03 '14 at 22:16
  • @whuber: You mean this: "A simple solution is to flip the coin two times: If it's HT map it to heads, if it's TH map it to tails. Otherwise, repeat the experiment until one of these two is achieved." This will not stop for HHTT. It waits for a pair HT or TH on an even index. – Neil G Nov 03 '14 at 22:52
  • My solution is to find a bipartite matching of equiprobable prefixes (none of which is a prefix another) and associate each half of the matching with either heads or tails. – Neil G Nov 03 '14 at 22:53
  • That's not what I meant, but I see that's how @Cardinal's solution could be read. – whuber Nov 03 '14 at 23:05
  • @whuber: In any case, your solution won't stop for TTTH so it's not optimal either :) – Neil G Nov 03 '14 at 23:28
  • I don't follow. Perhaps you could provide a small example to help us understand what precisely your solution is and why it works correctly for all values of $p$? – whuber Nov 04 '14 at 07:39
  • @whuber Please let me what I can do to make my answer clearer. I know that I haven't shown that if $n \choose k$ is even, so is ${n \choose k} - {n' \choose k'} - \dotsb$, even though I think it's true. – Neil G Nov 04 '14 at 08:07
  • I fail to see how the evenness of the binomial coefficients is relevant or even correct, so I must be misunderstanding the algorithm. Ignoring for the moment that fact that the experiment will have stopped at HT or TH, consider the probabilities upon stopping after four tosses at an even binomial coefficient: they are $4p(1-p)^3$, $6p^2(1-p)^2$, and $4p^3(1-p)$. How do you arrange those to provide equiprobable results of heads and tails regardless of the (still unknown) value of $p$? (Given that you stopped at HT or TH, replace each binomial coefficient by $2$: but the question stands). – whuber Nov 04 '14 at 08:13
  • @whuber: The idea is to look at sets of sequences each set member having the same elements in a different order (and therefore whose realization probabilities are the same). If the set size is even, we can partition it into two groups having equal probability, but different labels. I've updated the program and copied its output to the answer. – Neil G Nov 04 '14 at 08:46
  • Your approach is now very clear; thank you for those elaborations. It is a very interesting solution (+1) and seems likely to be the best possible; the trick lies in proving that! – whuber Nov 04 '14 at 15:56