28

You are given a biased coin. With probability $p$ it produces heads, where $p$ is in the range $(0,1)$. How can you use this coin to simulate an unbiased fair coin, i.e., a coin that produces heads with probability $0.5$?

bobble
Dmitry Kamenetsky

7 Answers

34

One possibility:

Toss twice. If the outcomes are the same, discard and start over. If they differ, forget the first outcome and take the second as the "fair" one.

This works because:

The outcomes tails-heads and heads-tails have the same probability $p(1-p)$. The "deeper" reason for this is that the coin doesn't change over time and the outcome of a single toss does not depend on history.

EDIT: Inspired by @trolley813's answer, here is a way to recycle the rejected entropy:

Remember the rejected pairs and treat HH as the H of a new virtual coin and TT as its T. The new probabilities will be $p^2/(p^2+(1-p)^2)$ for H and $(1-p)^2/(p^2+(1-p)^2)$ for T. To this virtual coin apply the same procedure as above. Note that this procedure can be stacked ad lib. (You'd need an infinite sheet of paper, but returns are diminishing exponentially, meaning you can reap most of the benefits on a single sheet of A4 (or letter, if you happen to be American) paper.)
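For concreteness, here is a minimal runnable sketch of the basic two-toss procedure, without the recycling step (the bias value and all names are illustrative assumptions, not part of the answer):

```java
import java.util.Random;

public class VonNeumannExtractor {
    static final Random RNG = new Random();
    static final double P = 0.3; // assumed bias of the given coin, demo only

    // One toss of the biased coin: true = heads.
    static boolean biasedToss() {
        return RNG.nextDouble() < P;
    }

    // Toss in pairs: discard matching pairs (HH/TT) and return the second
    // toss of the first differing pair, which is heads with probability 0.5.
    static boolean fairToss() {
        while (true) {
            boolean first = biasedToss();
            boolean second = biasedToss();
            if (first != second) return second;
        }
    }

    public static void main(String[] args) {
        int heads = 0, trials = 1_000_000;
        for (int i = 0; i < trials; i++) if (fairToss()) heads++;
        System.out.printf("fraction of heads: %.4f%n", heads / (double) trials);
    }
}
```

Each pair differs with probability $2p(1-p)$, so this sketch uses $1/(p(1-p))$ tosses per fair bit on average; the recycling scheme above reduces that.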

loopy walt
  • 21,184
  • 33
  • 95
  • Correct! Could you provide some more information about why this works? – Dmitry Kamenetsky May 17 '21 at 05:07
  • For reference, this is known as Von Neumann's randomness extractor. – Dmitry Kamenetsky May 17 '21 at 05:09
  • I don't quite understand this answer. If the two cases have the same probability, what does that mean? – Sid May 17 '21 at 05:41
  • HT and TH both have the same probability, namely $p(1-p)$. This means that our final output is truly unbiased. – Dmitry Kamenetsky May 17 '21 at 05:46
  • Looks like you had the same inspiration as me... in fact, the rot13("vasvavgr fgnpx bs iveghny pbvaf") can be simplified to a pretty easy procedure, where all you need to keep track of is rot13(rknpgyl ubj znal zngpuvat gbffrf lbh tbg orsber n aba-zngpuvat bar.) – Steve May 17 '21 at 08:42
  • @Steve rot13("Lbh'q fgvyy arrq na vasvavgr furrg bs cncre orpnhfr gur ahzore lbh arrq gb erzrzore vf haobhaqrq. Nyfb, sbe n fvatyr fvzhyngrq gbff lbhe fubegphg jvyy or bx, rira snfgre jurer gur fubegphg lbh qrfpevor nccyvrf, ohg nf vg fgnaqf vfa'g vg engure jnfgrshy sbe ercrngrq gbffrf?") – loopy walt May 17 '21 at 09:12
  • Of course, the closer $p$ is to being exactly 0 or exactly 1, the longer this will take. A purely biased coin that lands on the same side 100% of the time will take an infinite number of tosses to give you a usable result. – Darrel Hoffman May 17 '21 at 15:11
  • +1 for correctly nesting parentheses in text. – Jeffrey May 17 '21 at 22:10
  • @DarrelHoffman 0 and 1 are explicitly ruled out in the question. That a more biased coin will take more tosses is a necessary consequence of the more biased coin's lower entropy. After all, to simulate a fair coin toss you need at least one bit of entropy. – Albert.Lang May 18 '21 at 08:29
7

A straightforward answer (actually, a generalisation of loopy walt's answer):

Toss the coin $n=2^k$ times, where $n$ is the total number of throws and $h$ is the number of heads. The number ${n \choose h}=\frac{n!}{h!\,(n-h)!}$ is necessarily even, except when all throws produced the same result (i.e. $h=0$ or $h=n$; the probability of that is $p^n+(1-p)^n$, which tends to zero as $n\to\infty$ when $0<p<1$). Since all ${n \choose h}$ ways of throwing $h$ heads in $n$ throws are equally probable (the order does not matter), you can split the possible outcomes into two halves (as ${n \choose h}$ is even), e.g. lexicographically, and assign the result according to the half in which the actual outcome lies.

Example:

We got two heads in four throws. The number ${4 \choose 2}=6$ is (of course) even, and the 6 lexicographically ordered possible outcomes (listing the throws which gave heads) are: $(1, 2)$, $(1, 3)$, $(1, 4)$, $(2, 3)$, $(2, 4)$ and $(3, 4)$. If the actual outcome is among the first 3, we get a "head", otherwise a "tail".
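To make the splitting concrete, here is a sketch that ranks the actual outcome lexicographically among all ${n \choose h}$ arrangements. To sidestep the stopping-criterion subtlety raised in the comments below, an all-identical block is simply discarded and a fresh block of $n$ tosses is made; the bias and block size in the demo are assumptions:

```java
import java.math.BigInteger;
import java.util.Random;

public class LexicographicExtractor {
    static final Random RNG = new Random();

    static BigInteger binomial(int n, int k) {
        BigInteger r = BigInteger.ONE;
        for (int i = 0; i < k; i++)
            r = r.multiply(BigInteger.valueOf(n - i)).divide(BigInteger.valueOf(i + 1));
        return r;
    }

    // 0-based lexicographic rank of the set of head positions among all C(n,h) sets.
    static BigInteger rank(boolean[] tosses, int h) {
        BigInteger r = BigInteger.ZERO;
        int left = h; // heads still to be placed
        for (int i = 0; i < tosses.length && left > 0; i++) {
            if (tosses[i]) {
                left--;
            } else {
                // every outcome with a head here (and the same prefix) ranks earlier
                r = r.add(binomial(tosses.length - i - 1, left - 1));
            }
        }
        return r;
    }

    static boolean fairToss(double p, int n) {
        while (true) {
            boolean[] tosses = new boolean[n];
            int h = 0;
            for (int i = 0; i < n; i++) {
                tosses[i] = RNG.nextDouble() < p; // true = heads
                if (tosses[i]) h++;
            }
            if (h == 0 || h == n) continue; // all identical: discard the block
            BigInteger half = binomial(n, h).shiftRight(1); // C(n,h)/2, exact since even
            return rank(tosses, h).compareTo(half) < 0; // first half => "head"
        }
    }

    public static void main(String[] args) {
        int heads = 0, trials = 200_000;
        for (int i = 0; i < trials; i++) if (fairToss(0.7, 4)) heads++;
        System.out.printf("fraction of heads: %.4f%n", heads / (double) trials);
    }
}
```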

trolley813
  • Straightforward? More seriously, is it ever worthwhile, i.e. can you find an example where this method is better (in terms of expected number of tosses) than the simple one? – loopy walt May 17 '21 at 06:01
  • @loopywalt If you got HHTT, this method would give you a result, while the simple one wouldn't. – Deusovi May 17 '21 at 06:33
  • @Deusovi But the simple one ($k=1$) would often give an answer after 2 tosses, while the $k=2$ method never would, so its expected number of tosses could well be lower. – Jaap Scherphuis May 17 '21 at 07:00
  • @Deusovi but if you got HTHT (or any of the 8 combinations starting TH or HT), this method would take 4 tosses, while the simple one would give a result in only 2 tosses. My intuition says that for a coin very biased towards H, using n=4 would be of benefit, as it "almost guarantees" that after a potentially long sequence of 'H', the next 'T' will be within a group of 4 that decides the result, with TTTT being vanishingly unlikely, whilst TT is at least plausible at the point of the first T. I doubt n>4 is ever worthwhile. – Steve May 17 '21 at 07:01
  • @Steve This method assumes using the least possible $k$, i.e. if 2 tosses did not work, throw 2 more times to get 4, then 4 more to get 8, etc. – trolley813 May 17 '21 at 07:18
  • It's not that simple: once you make your method dependent on a stopping criterion, you can no longer assume the outcomes have equal probability. For example, all outcomes which would have allowed you to stop earlier now have probability zero. – loopy walt May 17 '21 at 07:25
  • @trolley813 I don't think you can do "if 2 tosses didn't work throw 2 more times". Given a coin with 90% heads, you would get TT=1%, TH=9%, HT=9%, HH=81%. If you then throw 2 more times after HH, you will end up with HHxx, or HHHHxxxx, etc., all of which convert to "head" given your lexicographical ordering. So in the end, the results would be TT=1% (T), TH=9% (T), HT=9% (H), HH=81% (H), for a final of 10% T and 90% H. There may be an ordering other than lexicographical that makes "extending the tosses" work, but you'd have to prove it. – JS1 May 17 '21 at 07:28
  • @JS1 I think I finally got my head around the answer, but it needs to be far better explained in the answer. HHxx doesn't always convert to heads... HHHT and HHTH would need to convert one each to heads and tails, as we've already allocated opposite outcomes to the other possibilities of one T in 4 - HT(HH) and TH(HH) when we stopped early without having the possibility of tossing the other two H from those patterns. – Steve May 17 '21 at 07:40
  • @Steve According to the answer, lexicographical ordering is used. The lexicographical ordering of (4 3): (123),(124),(134),(234). HHHT is (123) and HHTH is (124), both of which are in the "first half" (see example) which gives a final answer of "heads". – JS1 May 17 '21 at 07:45
  • @JS1, (134),(234) have already been considered, so the answer needs to explicitly state these are no longer under consideration at the new decision point, and use a lexicographical ordering of the remaining outcomes. – Steve May 17 '21 at 07:48
  • @Steve Yes I think that would work, although it takes a bit of effort to construct the list of remaining outcomes. – JS1 May 17 '21 at 08:08
4

Combining the ideas from loopy walt's and trolley813's answers, start by:

tossing the coin twice. If the outcomes differ, forget the first and take the second as the "fair" one.

After that:

continue making pairs of tosses. As soon as any pair differs, take the second as the "fair" one.

If this does not make a decision,

All pairs of tosses are necessarily HH or TT, which can be used to simulate an even more biased coin (HH mapping to H, and TT mapping to T), with the exact same procedure run on these results: e.g. HHTT maps to HT, which is read as tails, and TTHH maps to TH, which is read as heads.

Generalising this to $n$ levels:

any sequence except a continuous run of H or T will be resolved within $2^n$ tosses. The point at which it is resolved will map to a final sequence of HT or TH, either for the original coin or for one of the layers of "virtual coins".
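A compact runnable sketch of this stacked pairing, before any of the shortening tricks discussed below (the demo bias is an assumption):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class StackedPairExtractor {
    static final Random RNG = new Random();

    // Multi-level pairing: a differing pair at any level decides immediately
    // (take the second toss); a matching pair (HH or TT) becomes one toss of
    // the next, more biased, "virtual coin".
    static boolean fairToss(double p) {
        List<Boolean> pending = new ArrayList<>(); // unpaired first toss per level
        while (true) {
            Boolean carry = RNG.nextDouble() < p; // real toss: level-0 coin
            for (int level = 0; carry != null; level++) {
                if (pending.size() <= level) pending.add(null);
                Boolean first = pending.get(level);
                if (first == null) {
                    pending.set(level, carry); // wait for this pair's second toss
                    carry = null;
                } else {
                    pending.set(level, null);
                    if (!first.equals(carry)) return carry; // HT/TH: decided
                    // HH/TT: carry the common value up as a level+1 toss
                }
            }
        }
    }

    public static void main(String[] args) {
        int heads = 0, trials = 1_000_000;
        for (int i = 0; i < trials; i++) if (fairToss(0.9)) heads++;
        System.out.printf("fraction of heads: %.4f%n", heads / (double) trials);
    }
}
```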

There is then also the possibility of further shortening the sequence, for example:

After 6 tosses resulting in HHHHTT, the next 2 tosses cannot fail to make a decision, and the next toss is completely irrelevant.
HHHHTTHH => HHTH (level 1 virtual coin) => "heads" (final toss of non-matching final pair)
HHHHTTHT => "tails" (final toss of non-matching final pair)
HHHHTTTH => "heads" (final toss of non-matching final pair)
HHHHTTTT => HHTT (level 1 virtual coin) => HT (level 2 virtual coin) => "tails"(final toss of non-matching final pair)

As such the 7th toss in this case is completely irrelevant, and could be skipped, allowing a decision with 1 fewer toss. In effect, we can immediately compress the HHHHTT sequence to HHT, upgrading level 1 virtual coins to "real" past tosses.

A simplified procedure for this seemed to suggest itself:

  1. Toss the coin twice.

  2. If the two tosses are different, ignore the first one, and the second is the final result.

  3. While all tosses are identical, keep tossing until you get the opposite outcome.

  4. If you've done an EVEN number of tosses at the point you get a difference, the final coin toss is your result.

  5. Toss once more to make the number of tosses even. If it differs from the last toss, that final coin toss is your result.

  6. All tosses are in pairs of identical tosses. Compress to a sequence half the length, which has itself just had a toss of "opposite outcome", and go to step 4.

However, it seems the "compression" is over-aggressive, and may not work in certain cases:

Consider a sequence HHHHHHHHTT, which the above procedure prematurely compresses to HHHHT.
Rather than terminating early, we roll 6 more times, creating:
HHHHHHHHTTabcdef
The final result will necessarily be one of b, d, f, and the procedure will certainly terminate by then.
HHHHHHHHTTHHxxxx => HHHHTH => "heads" (b)
HHHHHHHHTTHTxxxx => "tails" (b)
HHHHHHHHTTTHxxxx => "heads" (b)
HHHHHHHHTTTTHHHH => HHHHTTHH => HHTH => "heads" (f)
HHHHHHHHTTTTHHHT => "tails" (f)
HHHHHHHHTTTTHHTH => "heads" (f)
HHHHHHHHTTTTHHTT => HHHHTTHT => "tails" (f)
HHHHHHHHTTTTHTxx => "tails" (d)
HHHHHHHHTTTTTHxx => "heads" (d)
HHHHHHHHTTTTTTHH => HHHHTTTH => "heads" (f)
HHHHHHHHTTTTTTHT => "tails" (f)
HHHHHHHHTTTTTTTH => "heads" (f)
HHHHHHHHTTTTTTTT => HHHHTTTT => HHTT => HT => "tails" (f)
Toss 'e' is completely irrelevant, but tosses 'a' and 'c' seem to be needed to determine whether we take 'b', 'd' or 'f' as the final result.

As such, a more complicated compression function seems to be required (one which I'd thought of before, but prematurely over-simplified to the version above, missing a critical step in the logic):

  1. Keep tossing the coin until the first point you've seen BOTH outcomes. WLOG, assume that's a long sequence of 'H' followed by a single 'T'.

  2. If the total number of tosses is even, stop, with the result as the final toss (WLOG that's tails).

  3. Toss once more to get an even number. WLOG, if the extra toss is 'H' matching the sequence, the final result is "heads".

  4. If total number of tosses is divisible by 4, stop with the result as the final toss (either of the two identical tosses which differ from the rest of the sequence).

  5. If the total number of tosses is 6 (mod 8), toss ONCE more, and return that as the final result (either due to not matching the skipped coin, or not matching one of the preceding runs).

  6. The total number of tosses is now 2 (mod 8). Toss TWICE more; if the two results differ, return the second as the final result. If they are both 'H' (matching the long sequence), return THAT as the final result.

  7. Total number of tosses is now 4 (mod 8) with the final 4 all 'T' and at least 8 'H' preceding it. Toss the coin twice more, and if the two results differ, return that as the final result.

  8. Now the total number of tosses is 6 (mod 8), with 6 identical tosses preceded by at least 8 opposite tosses. If the total is 14 (mod 16), toss ONCE more and return that as the final result.

  9. Otherwise, WLOG we have a sequence of 16 H followed by 8 T. This can be treated as equivalent to a long sequence of T for the next few flips. It will remain similarly unresolved for any sequence of (e.g.) $2^{n+1}$ H followed by $2^n$ T, then perhaps $2^{n-1}$ H, etc. in an alternating sequence.

However, note that the worst case to reach before this "compression function" can be invoked is

$2^k - 2$ tosses, where the first $2^{k-1}$ are identical, and the remainder contain at least one precisely-aligned run of $2^j$ of the opposite face (with $1 \le j < k$)

This does indeed then create almost as much potential book-keeping as the uncompressed version, although it's possible that a more efficient compression function (or possibly a better mathematical justification for the original one, if it was "right by coincidence") could be devised.

Steve
  • Nice. I wonder what average number of tosses is needed to reach an outcome? Is this the best we can do? – Dmitry Kamenetsky May 17 '21 at 08:49
  • @DmitryKamenetsky there will be more optimal procedures for cases where the value of $p$ is known precisely. For example, if $p^2$ or $(1-p)^2$ is precisely 0.5, a decision can be made within 2 tosses (and may be decided in 1 toss if the less likely outcome occurs first). – Steve May 17 '21 at 08:54
  • It's a bit more complicated than the steps make it seem, since the virtual coins in the compressed sequence do not have the same bias as a single coin. So if you've compressed the sequence in step 6, and step 4 gives no immediate result, in step 5 you have to toss an extra virtual coin, i.e. toss the real coin twice. It's tricky to write down the recursion correctly, especially when there can be several compression steps. – Jaap Scherphuis May 17 '21 at 09:06
  • @JaapScherphuis as I read it the shortcut of "skipping the last but one toss" is meant to be used here. I'm unsure, though, whether that works over multiple recursion levels. – loopy walt May 17 '21 at 09:16
  • @JaapScherphuis I think you're right that I over-simplified before committing to writing. Updated with a more complicated procedure. A better answer is available if we make use of the fact that $p$ is stated to be known though, but the specific details of the answer will depend on the actual value of $p$. – Steve May 17 '21 at 10:48
  • I am interested to know what we can do when $p$ is known. – Dmitry Kamenetsky May 17 '21 at 11:52
  • @DmitryKamenetsky I added another answer covering that case. – Steve May 17 '21 at 15:02
3

Similar to the previous answer and the accepted one:

keep track of
- the number of throws (N)
- the last number of subsequent heads/tails (L)

Start throwing the coin
- if N is a multiple of 2 and L = 1: use the last result
- if N is a multiple of 4 and L = 2: use the last result
- if N is a multiple of 8 and L = 4: use the last result
- if N is a multiple of 16 and L = 8: use the last result
- ... (add similar lines for as long as you can calculate faster than you can throw)
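Since only $N$ and $L$ need tracking, the whole procedure fits in a few lines. A sketch, in which a loop over powers of two plays the role of "add similar lines" (the demo bias is an assumption):

```java
import java.util.Random;

public class RunLengthExtractor {
    static final Random RNG = new Random();

    // After each throw, with N the total number of throws and L the length of
    // the final run of identical results, stop and use the last result as
    // soon as L = 2^k and N is a multiple of 2^(k+1) for some k.
    static boolean fairToss(double p) {
        int n = 0, run = 0;
        boolean last = false;
        while (true) {
            boolean toss = RNG.nextDouble() < p;
            n++;
            run = (n > 1 && toss == last) ? run + 1 : 1;
            last = toss;
            for (int len = 1; 2 * len <= n; len *= 2) {
                if (run == len && n % (2 * len) == 0) return toss;
            }
        }
    }

    public static void main(String[] args) {
        int heads = 0, trials = 1_000_000;
        for (int i = 0; i < trials; i++) if (fairToss(0.8)) heads++;
        System.out.printf("fraction of heads: %.4f%n", heads / (double) trials);
    }
}
```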

Retudin
3

[An earlier version of this answer had the wrong formulae - these are now corrected]

Assuming that $p$ is known, we can proceed as follows:

  1. Initialise $t_1 = 0.5$ (target probability)

  2. Flip the coin for the $n$th time.

  3. If coin landed H, set $t_{n+1} = t_n/p$, otherwise, set $t_{n+1} = (t_n-p)/(1-p)$

  4. If $0 < t_{n+1} < 1$ go back to step 2.

  5. The coin shows the final result, as does the sign of the value of $t_{n+1}$ (0 counts as negative): escaping above 1 means "heads", escaping below 0 means "tails".

This procedure is optimal where $p$ is known, and in particular will always resolve in a single toss when $p = 0.5$.

Worked example for $p = 0.2$:

1:H => $t_2 = 0.5 / 0.2 = 2.5 \ge 1$ : immediate result
1:T => $t_2 = (0.5 - 0.2) / 0.8 = 0.375 \not\le 0$ : continue to 2
2:H => $t_3 = 0.375 / 0.2 = 1.875 \ge 1$ : immediate result
2:T => $t_3 = (0.375 - 0.2) / 0.8 = 0.21875 \not\le 0$ : continue to 3
3:H => $t_4 = 0.21875 / 0.2 = 1.09375 \ge 1$ : immediate result
3:T => $t_4 = (0.21875 - 0.2) / 0.8 \approx 0.02344 \not\le 0$ : continue to 4
4:H => $t_5 = 0.02344 / 0.2 \approx 0.11719 \not\ge 1$ : continue to 5
4:T => $t_5 = (0.02344 - 0.2) / 0.8 \approx -0.2207 \le 0$ : immediate result
5:H => $t_6 = 0.11719 / 0.2 \approx 0.5859 \not\ge 1$ : continue to 6
5:T => $t_6 = (0.11719-0.2)/ 0.8 \approx -0.1035 \le 0$ : immediate result
...
(In this table, a head on any of the first 3 flips is kept as the result; $p$ is so low that even the chance of a head somewhere in the first 3 flips does not reach probability 0.5. After 3 flips without a head, the next flip is kept only if it's a tail, etc.)
The procedure will terminate in finite time with probability 1, and for a countably infinite set of values of $p$ it has an upper bound on the number of flips. I think that the "average" number of flips required would be inversely proportional to $p(1-p)$, approaching 2 for an "almost precisely fair" coin (where each flip has a $(0.5 \pm \epsilon)$ chance of terminating the procedure).
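A minimal sketch of steps 1-5 (the demo bias is an assumption; note that $t$ can only escape above 1 immediately after a head and below 0 immediately after a tail, so returning the side of the escape is the same as keeping the final flip):

```java
import java.util.Random;

public class KnownBiasExtractor {
    static final Random RNG = new Random();

    // One fair toss from a coin of known bias p, following steps 1-5 above.
    static boolean fairToss(double p) {
        double t = 0.5; // target probability
        while (true) {
            boolean heads = RNG.nextDouble() < p; // flip the biased coin
            t = heads ? t / p : (t - p) / (1 - p);
            if (t >= 1) return true;  // decided: "heads"
            if (t <= 0) return false; // decided: "tails" (0 counts as negative)
        }
    }

    public static void main(String[] args) {
        int heads = 0, trials = 1_000_000;
        for (int i = 0; i < trials; i++) if (fairToss(0.2)) heads++;
        System.out.printf("fraction of heads: %.4f%n", heads / (double) trials);
    }
}
```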

For a known $p$

a table can be pre-generated up to any finite number of flips, stating for each flip whether H or T should be kept.

This can in principle be inverted,

by generating such a table (in effect a binary sequence), and then calculating what value of $p$ it corresponds to. The maths for that is left as an exercise for the reader... there would also be a 1:1 correspondence between any possible value of $p$ and the (potentially infinite) binary sequence representing this table.
As an example of other special values where the sequence of rolls is always finite, for $p = \frac1{\sqrt2}$ or $p = 1 - \frac1{\sqrt2}$, the fair coin can be simulated precisely with 2 tosses, corresponding with the solutions of $\frac{\frac{0.5}p-p}{1-p}=0$ and $\frac{\frac{0.5-p}{1-p}-p}{1-p}=0$. There are 4 special values of $p$ for which the fair coin would be simulated with no more than 3 tosses, etc.

Similar sequences would more commonly be used

when you have a "fair coin", and want to simulate some other probability, or for the mathematical basis of Huffman coding or Arithmetic coding.

It can be generalised too, for example to simulate the roll of a fair die

use an array of starting thresholds from 1/6 to 5/6 instead of a single threshold at 0.5, and iterate until all thresholds are either $\le 0$ or $\ge 1$. Similar techniques can convert any known-probability event to any other.
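A sketch of that die generalisation, reusing the same update rule (the six-sided demo, bias, and names are assumptions):

```java
import java.util.Random;

public class FairDie {
    static final Random RNG = new Random();

    // Fair d-sided die from a coin of known bias p: track d-1 thresholds
    // (1/d, 2/d, ..., (d-1)/d) under the same update rule, and iterate until
    // every threshold has escaped below 0 or above 1.
    static int fairDie(double p, int d) {
        double[] t = new double[d - 1];
        for (int i = 0; i < d - 1; i++) t[i] = (i + 1) / (double) d;
        while (true) {
            boolean heads = RNG.nextDouble() < p;
            boolean undecided = false;
            for (int i = 0; i < t.length; i++) {
                t[i] = heads ? t[i] / p : (t[i] - p) / (1 - p);
                if (t[i] > 0 && t[i] < 1) undecided = true;
            }
            if (undecided) continue;
            int below = 0;
            for (double v : t) if (v <= 0) below++; // thresholds below the outcome
            return below + 1; // die face in 1..d
        }
    }

    public static void main(String[] args) {
        int[] counts = new int[6];
        for (int i = 0; i < 600_000; i++) counts[fairDie(0.3, 6) - 1]++;
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```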

If you want to guarantee that an arbitrary (but known) $p$ will still generate a result within a finite number of rolls, given the practical constraints (e.g. of floating point numbers, or of the required precision for the particular simulation):

Replace the calculations involving $t_1 = 0.5$ with (e.g.) $t_{1h} = 0.500000000001$ and $t_{1l} = 0.499999999999$. The difference between the "upper" and "lower" values of $t_n$ would increase with each iteration, guaranteeing a result within a given tolerance of a fair coin and within a well-defined bounded number of tosses.

Steve
  • Wow that's very cool! – Dmitry Kamenetsky May 17 '21 at 16:21
  • Steve, there seems to be a problem with this method. I have implemented it in Java and compared it to the original method that doesn't know the value of $p$. For some $p$ your method does not give an unbiased result, and sometimes the simple method gives fewer steps. I am not sure what the problem is. Here is my code: https://ghostbin.co/paste/ax6f5 – Dmitry Kamenetsky May 18 '21 at 06:24
  • @DmitryKamenetsky I found an issue with the formulae - I'd got them wrong at first, and then hurriedly made a couple of incorrect "corrections" that happened to have almost the right effect for the first 5 steps or so of the worked example! Re-checking the working, it should be that the new value of t goes below 0 for "tails" rather than above 1. – Steve May 18 '21 at 07:46
  • Yes that works now! And it gives fewer steps than the simple method. I also had a bug in my code. Here is the new one: https://ghostbin.co/paste/8zzj57 – Dmitry Kamenetsky May 18 '21 at 08:38
  • @DmitryKamenetsky it looks like your code is using something like $p = 0.499999999...$ for the final check due to accumulated floating point rounding errors in the for loop. (If it was calling with $p = 0.5$, which is represented precisely in binary, it's a special case and the loop within generateKnowingP would always terminate in the first iteration). – Steve May 18 '21 at 09:23
  • The code is correct and it does terminate in one step, but I forgot to update the output at the bottom. – Dmitry Kamenetsky May 18 '21 at 09:29
  • @DmitryKamenetsky the workaround || Math.abs(t)<1e-9 || Math.abs(t-1)<1e-9 in your code doesn't change the fact that $p \ne 0.5$ by a very small amount! In fact the $steps \approx 2.000$ is correct behaviour for the algorithm for $p \approx 0.5$ (but not exactly equal). – Steve May 18 '21 at 09:32
  • I think the problem is in the outer loop where p accumulates small errors. – Dmitry Kamenetsky May 18 '21 at 13:22
1

My friend Adam came up with a new solution while we were hiking up a steep mountain. I want to post it here, because it attacks the problem from a different angle.

Toss the biased coin 100 times and count how many heads came up; call it $a$. Now toss it another 100 times and count how many heads came up; call it $b$. If $a>b$ then output "H", if $a<b$ then output "T", and if $a=b$ do nothing. Both outcomes ($a>b$ and $a<b$) are equally likely, so "H" and "T" will be generated with the same probability. Now if each batch is just a single toss, this method becomes identical to the accepted answer - "HH" and "TT" will be skipped as the counts are the same, while "HT" will produce "H" and "TH" will produce "T" with equal probability. I thought it was quite neat how a method that initially looks completely different ends up being the same.
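A short sketch of Adam's batch-comparison method (batch size 100 as in the answer; the demo bias is an assumption):

```java
import java.util.Random;

public class BatchCompareExtractor {
    static final Random RNG = new Random();

    // Compare head counts of two batches of 100 biased tosses; by symmetry
    // P(a > b) = P(a < b), and ties are discarded.
    static boolean fairToss(double p) {
        while (true) {
            int a = 0, b = 0;
            for (int i = 0; i < 100; i++) if (RNG.nextDouble() < p) a++;
            for (int i = 0; i < 100; i++) if (RNG.nextDouble() < p) b++;
            if (a != b) return a > b; // a > b => "H", a < b => "T"
        }
    }

    public static void main(String[] args) {
        int heads = 0, trials = 100_000;
        for (int i = 0; i < trials; i++) if (fairToss(0.8)) heads++;
        System.out.printf("fraction of heads: %.4f%n", heads / (double) trials);
    }
}
```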

Dmitry Kamenetsky
0

I'm pretty simple; the other answers seem similar but don't make much sense to me.

I believe my solution is simple and should work:

Flip the biased coin; if it's heads then add one to heads. Flip the biased coin again; if it's heads then add one to tails. Repeat. So just invert the result of every second toss.

Aequitas
  • Welcome to PSE! Your answer is not clear enough to me: it seems like you are just counting the outcomes of the biased coin rather than simulating an unbiased one. In your solution, what are the outcomes of the simulated unbiased coin? – melfnt May 19 '21 at 07:41
  • This answer generates Head with average probability 0.5, but the different coins aren't identically distributed - if the biased coin has probability 0.01 for H you will likely generate THTHTHTH. It's fast and has correct average, but not random. – Hans Olsson May 19 '21 at 09:11
  • @melfnt no, I meant that every second dice throw you count the opposite of whatever the dice shows, therefore I think it should negate the bias – Aequitas May 19 '21 at 14:00
  • @HansOlsson that was just an example, I meant you just inverse every second dice throw, if you roll heads then heads again, you would count heads and tail, if you rolled heads and then tails you would count two heads. So assuming the biased die is 0.1 Heads and 0.9 tails; the first roll would be 0.1H0.9T, the 2nd roll would be 0.9H and 0.1T. so (0.9+0.1)/2 = 0.5 for each outcome. – Aequitas May 19 '21 at 14:03
  • @Aequitas it does not, see Hans Olsson's comment above – melfnt May 19 '21 at 14:03
  • The reason this answer doesn't work is a really interesting peep into what probability 0.5 really means: the fact that you can achieve an overall proportion of H around 0.5 doesn't mean that H has a probability of 0.5 to come out at any given toss. If the coin is initially biased as 0.99 vs 0.01 then your method will output the sequence HHHH very rarely while it should occur about 1/16th of the time. – Arnaud Mortier May 19 '21 at 14:49
  • Ah, thanks everyone I think I now understand the flaw in my solution, and also it helped me understand why the accepted solution works. It's similar to mine in that it takes two opposite results, the difference is I force the 2nd result to be opposite, whereas the other one just waits until there happens to be different results in a row, though it also means that the biased and unbiased result can be in either order, so just take the 2nd (or I guess first) and you get your random generation. – Aequitas May 19 '21 at 23:07