19

Given:

  1. A coin with unknown bias $p$ (Head).
  2. A strictly positive real $a > 0$.

Problem:

Generate a random Bernoulli variate with bias $p^{a}$.

Does anyone know how to do this? For instance, when $a$ is a positive integer, one can flip the coin $a$ times and check whether all the outcomes were Heads: if they were, issue '1', otherwise issue '0'. The difficulty lies in the fact that $a$ is not necessarily an integer. Also, if I knew the bias $p$, I could simply build another coin with the desired bias.
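For concreteness, the integer-$a$ baseline can be simulated in a few lines of R; this sketch cheats by calling rbinom with the true $p$, which the actual problem forbids, and only illustrates what the $a$-fold flip computes:

p <- 1/3; a <- 4                     # hypothetical values for illustration
tosses <- rbinom(a, 1, p)            # a flips of the Ber(p) coin
y <- as.integer(all(tosses == 1))    # y = 1 with probability p^a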

Pedro A. Ortega
  • @gung: I think what's wanted is an algorithm to generate a Bernoulli variate given a coin. – Neil G Feb 18 '13 at 21:31
  • I think the point here is, when $a>1$ you only keep an average of 1 out of every $a$ heads that pop up, and when $a < 1$, you duplicate each of the heads an average of $1/a$ times. – Macro Feb 18 '13 at 21:31
  • @Macro, could you expand on the idea? – Pedro A. Ortega Feb 18 '13 at 21:36
  • Dear Pedro, (+1) for your post, which is the kind of question that makes CV very invigorating and stimulating, at least to me. Could I ask what the origin of this question is? – cardinal Feb 19 '13 at 14:11
  • @cardinal: Thanks again for your answer! This problem is a part of a sampler for solving stochastic control problems that I'm working on. The reason why $p$ is unknown is because it would require knowing the normalizing constant (which in this case is a nasty partition function), but we can still sample from it using rejection sampling. Btw, it would be nice to cite you by name, not just the link to CV ;-) . – Pedro A. Ortega Feb 19 '13 at 16:53
  • What do you mean by "coin with bias $p^a$"? – kjetil b halvorsen Apr 10 '17 at 12:21

4 Answers

20

We can solve this via a couple of "tricks" and a little math.

Here is the basic algorithm:

  1. Generate a Geometric random variable with probability of success $p$.
  2. The outcome of this random variable determines a fixed known value $f_n \in [0,1]$.
  3. Generate a $\mathrm{Ber}(f_n)$ random variable using fair coin flips generated from blockwise paired flips of our $\mathrm{Ber}(p)$ coin.
  4. Complementing the resulting outcome yields a $\mathrm{Ber}(p^a)$ variate for any $a \in (0,1)$, which is all we need.

To make things more digestible, we'll break things into pieces.

Piece 1: Without loss of generality assume that $0 < a < 1$.

If $a \geq 1$, then we can write $p^a = p^n p^b$ for some positive integer $n$ and some $0 \leq b < 1$. But, for any two independent Bernoulli variables $X_i \sim \mathrm{Ber}(p_i)$, we have $$\renewcommand{\Pr}{\mathbb P} \Pr(X_1 = X_2 = 1) = p_1 p_2 \>. $$ We can generate a $\mathrm{Ber}(p^n)$ variable from our coin in the obvious way (flip $n$ times and check that all flips were heads). Hence, we need only concern ourselves with generating a $\mathrm{Ber}(p^a)$ variable when $a \in (0,1)$.

Piece 2: Know how to generate an arbitrary $\mathrm{Ber}(q)$ from fair coin flips.

There is a standard way to do this. Expand $q = 0.q_1 q_2 q_3 \ldots$ in its binary expansion and then use our fair coin flips to "match" the digits of $q$. The first match determines whether we declare a success ("heads") or a failure ("tails"): if $q_n = 1$ and our coin flip is heads, declare heads; if $q_n = 0$ and our coin flip is tails, declare tails. Otherwise, compare the subsequent digit against a new coin flip. This works because the chance of first stopping at digit $n$ is $2^{-n}$ and we declare heads exactly when $q_n = 1$, so $\Pr(\text{heads}) = \sum_n q_n 2^{-n} = q$.
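A minimal R sketch of this digit-matching rule (the function name and the fair-bit argument are mine, not part of the original answer; the floating-point binary expansion is used for simplicity):

rbern_from_fair <- function(q, fair_bit) {
  repeat {
    q <- 2 * q                              # shift: next binary digit of q
    digit <- if (q >= 1) 1L else 0L
    q <- q - digit
    if (fair_bit() == digit) return(digit)  # first match decides the outcome
    # no match: move on to the next digit with a fresh fair bit
  }
}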

Piece 3: Know how to generate a fair coin flip from unfair ones with unknown bias.

This is done, assuming $p \in (0,1)$, by flipping the coin in pairs. If we get $HT$, declare a heads; if we get $TH$, declare a tails; otherwise repeat the experiment until one of the two aforementioned outcomes occurs. The two outcomes are equally probable (each pair occurs with probability $p(1-p)$), so, conditional on stopping, each must have probability $1/2$.
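In R, this von Neumann trick might look as follows (a sketch; flip stands for any function returning 1 for heads and 0 for tails):

fair_bit_from_coin <- function(flip) {
  repeat {
    a <- flip(); b <- flip()
    if (a == 1 && b == 0) return(1L)  # HT -> fair "heads"
    if (a == 0 && b == 1) return(0L)  # TH -> fair "tails"
    # HH or TT: discard the pair and flip again
  }
}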

Piece 4: Some math. (Taylor to the rescue.)

By expanding $h(p) = p^a$ around $p_0 = 1$, Taylor's theorem asserts that $$ p^a = 1 - a(1-p) - \frac{a(1-a)}{2!} (1-p)^2 - \frac{a(1-a)(2-a)}{3!} (1-p)^3 \cdots \>. $$ Note that because $0 < a < 1$, each term after the first is negative, so we have $$ p^a = 1 - \sum_{n=1}^\infty b_n (1-p)^n \>, $$ where $0 \leq b_n \leq 1$ are known a priori. Hence $$ 1 - p^a = \sum_{n=1}^{\infty} b_n (1-p)^n = \sum_{n=1}^\infty b_n \Pr(G \geq n) = \sum_{n=1}^\infty f_n \Pr(G = n) = \mathbb E f(G), $$ where $G \sim \mathrm{Geom}(p)$, $f_0 = 0$ and $f_n = \sum_{k=1}^n b_k$ for $n \geq 1$.
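The coefficients are easy to compute: from the series above, $b_1 = a$ and $b_{n+1} = b_n (n-a)/(n+1)$. A small R sketch (the function names are mine):

b_coef <- function(a, nmax) {
  b <- numeric(nmax)
  b[1] <- a                     # b_1 = a
  if (nmax > 1)
    for (n in 1:(nmax - 1)) b[n + 1] <- b[n] * (n - a) / (n + 1)
  b
}
f_partial <- function(a, nmax) cumsum(b_coef(a, nmax))  # f_n = b_1 + ... + b_n
# f_partial(0.5, 10)   # increases towards 1, as claimed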

And, we already know how to use our coin to generate a Geometric random variable with probability of success $p$.
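Concretely (a sketch): flip until the first head and count the tails.

rgeom_from_coin <- function(flip) {
  g <- 0L
  while (flip() == 0) g <- g + 1L   # count tails before the first head
  g                                 # Pr(G = n) = (1 - p)^n p
}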

Piece 5: A Monte Carlo trick.

Let $X$ be a discrete random variable taking values in $[0,1]$ with $\Pr(X = x_n) = p_n$. Let $U \mid X \sim \mathrm{Ber}(X)$. Then $$ \Pr(U = 1) = \sum_n x_n p_n. $$

But, taking $p_n = p(1-p)^n$ and $x_n = f_n$, we now see how to generate a $\mathrm{Ber}(1-p^a)$ random variable, and complementing its outcome gives the desired $\mathrm{Ber}(p^a)$ one.
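Putting the pieces together, here is a minimal end-to-end sketch in R that reuses the helper sketches above (all function names are mine, and flip is the only source of randomness):

rbern_pow <- function(a, flip) {
  stopifnot(a > 0)
  n <- floor(a); b <- a - n
  # Piece 1: p^a = p^n * p^b, so first demand n raw heads in a row
  for (i in seq_len(n)) if (flip() == 0) return(0L)
  if (b == 0) return(1L)
  g <- rgeom_from_coin(flip)                 # Step 1: G ~ Geom(p)
  if (g == 0) return(1L)                     # f_0 = 0, so U = 0 and 1 - U = 1
  f_g <- f_partial(b, g)[g]                  # Step 2: the fixed value f_G
  # Step 3 and Piece 5: U ~ Ber(f_G) from fair bits; Pr(U = 1) = 1 - p^b
  u <- rbern_from_fair(f_g, function() fair_bit_from_coin(flip))
  1L - u                                     # complement: Ber(p^a)
}
# Monte Carlo check, with a hypothetical stand-in for the unknown coin:
# flip <- function() rbinom(1, 1, 0.3)
# mean(replicate(2e4, rbern_pow(0.5, flip)))   # should be near 0.3^0.5 = 0.548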

cardinal
  • How can I cite you (or your solution)? – Pedro A. Ortega Feb 19 '13 at 10:04
  • @Pedro: I suppose you can click on the "share" link at the bottom of this answer. It should be a stable link. Math.SE has a citation mechanism, which doesn't appear to be enabled on this site, but you may be able to adapt it. – cardinal Feb 19 '13 at 14:10
  • Now, this is a brilliant answer! – Zen Feb 19 '13 at 17:48
  • I wrote this up in the General Discussion forum of the Coursera class on Analytic Combinatorics since this was a nice use of power series related to some of the material covered there. https://class.coursera.org/introACpartI-001/forum/thread?thread_id=108 – Douglas Zare Feb 21 '13 at 04:34
  • @Douglas: Thanks! Is there a publicly viewable version of that thread or would I need to sign up for the course to see it? Pedro and I have been discussing (via email) possible avenues for including this approach in some of his research. – cardinal Feb 21 '13 at 14:34
  • I'll post it here as community wiki. – Douglas Zare Feb 21 '13 at 16:14
7

Is the following answer silly?

If $X_1,\dots,X_n$ are independent $\mathrm{Ber}(p)$ and $Y_n$ has distribution $\mathrm{Ber}\left(\left(\sum_{i=1}^n X_i/n \right)^a\right)$, then $Y_n$ will be approximately distributed as $\mathrm{Ber}(p^a)$ for large $n$.

Hence, if you don't know $p$, but you can toss this coin a lot of times, it is possible to sample (approximately) from a $\mathrm{Ber}(p^a)$ random variable.

Example R code:

n <- 1000000                   # number of coin tosses used to estimate p
p <- 1/3                       # works for any 0 <= p <= 1
a <- 4
x <- rbinom(n, 1, p)           # n tosses of the Ber(p) coin
y <- rbinom(n, 1, mean(x)^a)   # plug the estimate of p into p^a
cat("p^a =", p^a, "\n")
cat("est =", mean(y))

Results:

p^a = 0.01234568 
est = 0.012291 
Zen
  • I like this answer but I suspect it misses the point of the question, which I interpreted as asking for an algorithm that generates from the requested distribution without knowing $p$ (or empirical information about $p$). But, the problem does presuppose that you can generate ${\rm Bernoulli}(p)$ random variables, so this is a perfectly reasonable answer and isn't silly at all! +1 – Macro Feb 19 '13 at 01:18
  • +1: I like it. I assume you mean $Y_n$ is distributed…? – Neil G Feb 19 '13 at 01:39
  • Much better! Tks, @Neil G! – Zen Feb 19 '13 at 02:21
  • This is cute (+1), but we can do it exactly in an almost surely finite number of flips (and, on average, that number will be relatively small). – cardinal Feb 19 '13 at 05:34
6

I posted the following exposition of this question and cardinal's answer to the General Discussion forum of the current Analytic Combinatorics class on Coursera, "Application of power series to constructing a random variable." I'm posting a copy here as community wiki to make this publicly and more permanently available.


There was an interesting question and answer on stat.stackexchange.com related to power series: "How to generate a non-integer amount of consecutive Bernoulli successes?" I'll paraphrase the question and the answer by cardinal.

Suppose we have a possibly unfair coin which is heads with probability $p$, and a positive real number $\alpha$. How can we construct an event whose probability is $p^\alpha$?

If $\alpha$ were a positive integer, we could just flip the coin $\alpha$ times and let the event be that all tosses were heads. However, if $\alpha$ is not an integer, say $1/2$, then this doesn't make sense, but we can use this idea to reduce to the case that $0 \lt \alpha \lt 1$. If we want to construct an event whose probability is $p^{3.5}$, we take the intersection of independent events whose probabilities are $p^3$ and $p^{0.5}$.

One thing we can do is construct an event with any known probability $p' \in [0,1]$. To do this, we can construct a stream of fair bits by repeatedly flipping the coin twice, reading $HT$ as $1$ and $TH$ as $0$, and ignoring $HH$ and $TT$. We compare this stream with the binary expansion $p' = (0.a_1a_2a_3\ldots)_2$. The event that the first disagreement occurs at a position where $a_i=1$ has probability $p'$. We don't know $p^\alpha$, so we can't use this directly, but it will be a useful tool.

The main idea is that we would like to use the power series for $p^\alpha = (1-q)^\alpha = 1 - \alpha q - \frac{\alpha(1-\alpha)}{2} q^2 - \frac{\alpha (1-\alpha)(2-\alpha)}{3!}q^3 -...$ where $p=1-q$. We can construct events whose probabilities are $q^n$ by flipping the coin $n$ times and seeing if they are all tails, and we can produce an event with probability $p' q^n$ by comparing the binary digits of $p'$ with a fair bit stream as above and checking whether $n$ tosses are all tails.

Construct a geometric random variable $G$ with parameter $p$. This is the number of tails before the first head in an infinite sequence of coin tosses. $P(G=n) = (1-p)^np = q^n p$. (Some people use a definition which differs by $1$.)

Given a sequence $t_0, t_1, t_2, ...$, we can produce $t_G$: Flip the coin until the first head, and if there are $G$ tails before the first head, take the element of the sequence of index $G$. If each $t_n \in [0,1]$, we can compare $t_G$ with a uniform random variable in $[0,1]$ (constructed as above) to get an event with probability $E[t_G] = \sum_n t_n P(G=n) = \sum_n t_n q^n p $.

This is almost what we need. We would like to eliminate that $p$ to use the power series for $p^\alpha$ in $q$.

$$1 = p + qp + q^2p + q^3p + ...$$

$$q^n = q^np + q^{n+1}p + q^{n+2}p + ...$$

$$\begin{aligned} \sum_n s_n q^n & = \sum_n s_n (q^n p + q^{n+1}p + q^{n+2}p + \cdots) \\ & = \sum_n (s_0 + s_1 + \cdots + s_n) q^n p \end{aligned}$$

Consider $1-p^\alpha = \alpha q + \frac{\alpha(1-\alpha)}{2} q^2 + ... $. Let $t_n$ be the sum of the coefficients of $q$ through $q^n$. Then $1-p^\alpha = \sum_n t_n q^n p$. Each $t_n\in [0,1]$ since the coefficients are positive and sum to $1-0^\alpha = 1$, so we can construct an event with probability $1-p^\alpha$ by comparing a fair bit stream with the binary expansion of $t_G$. The complement has probability $p^\alpha$ as required.
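As a quick sanity check (my own sketch, not part of the original exposition), the truncated series can be compared numerically against $1 - p^\alpha$ in R:

check_series <- function(p, alpha, nmax = 5000) {
  n <- 1:nmax
  b <- alpha / n * c(1, cumprod(1 - alpha / n[-nmax]))  # coefficients of q^n
  t <- cumsum(b)                                        # t_n = partial sums
  q <- 1 - p
  c(series = sum(t * q^n * p), direct = 1 - p^alpha)
}
# check_series(1/3, 0.5)   # the two entries should agree closely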


Again, the argument is due to cardinal.

Douglas Zare
  • (+1) Thanks for going to the trouble to post this. The differences in exposition, while relatively slight, help make the approach more clear. – cardinal Feb 21 '13 at 17:11
4

The very complete answer by cardinal and subsequent contributions inspired the following remark/variant.

Let PZ stand for "Probability of Zero" and let $q := 1-p$. If $X_1, X_2, \dots$ is an iid Bernoulli sequence with PZ $q$, then $M_n := \max(X_1,\,X_2,\,\dots, X_n)$ is a Bernoulli r.v. with PZ $q^n$. Now making $n$ random, i.e., replacing it by an integer rv $N \geq 1$, leads to a Bernoulli rv $M_N$ with $$ \mathrm{Pr}\{M_N =0\} = \sum_{n=1}^\infty \mathrm{Pr}\{M_N =0 \,\vert\, N =n\} \mathrm{Pr}\{N =n\} = \sum_{n=1}^\infty \mathrm{Pr}\{N =n\} \, q^n. $$ So if $0 < a < 1$ and we take $\mathrm{Pr}\{N =n\} = b_n$ from cardinal's answer, we find $\mathrm{Pr}\{M_N =0\} = 1- p^a$, and $1-M_N$ is $\mathrm{Ber}(p^a)$ as wanted. This is indeed possible since the coefficients $b_n$ satisfy $b_n \geqslant 0$ and sum to $1$.

The discrete distribution of $N$ depends only on $a$ with $0 < a < 1$; recall that $$ \mathrm{Pr}\{N =n\} = \frac{a}{n}\,\prod_{k=1}^{n-1}\left(1 - a/k\right) \qquad (n \geq 1). $$ It has interesting features: it turns out to have an infinite expectation and a heavy-tailed behaviour, $n \,b_n \sim c/n^a$ with $c = -1/\Gamma(-a) >0$.
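A quick numerical look at the tail claim (my own sketch; gamma is the base-R gamma function):

a <- 0.5
n <- c(10, 100, 1000, 10000)
b_n <- sapply(n, function(m) a / m * prod(1 - a / seq_len(m - 1)))
cbind(exact = n * b_n, asymptotic = -n^(-a) / gamma(-a))  # columns converge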

Though $M_N$ is the maximum of $N$ rvs, its determination needs at most $N$ of the $X_k$, since the result is known as soon as some $X_k$ equals $1$. The number of $X_k$ actually computed is the minimum of $N$ and a geometric variable.
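A minimal R sketch of this variant (the function name is mine; $N$ is drawn by an inverse-CDF search over the $b_n$ using a uniform variate, which, as cardinal's comment below notes, could itself be built from fair coin flips):

rbern_pow_max <- function(a, flip) {     # assumes 0 < a < 1
  u <- runif(1)
  n <- 1L; b <- a; f <- b                # b = b_1, f = f_1
  while (u > f) {                        # stop at the first n with f_n >= u
    b <- b * (n - a) / (n + 1)           # b_{n+1} from b_n
    n <- n + 1L; f <- f + b
  }                                      # now Pr(N = n) = b_n
  for (i in seq_len(n)) if (flip() == 1) return(0L)  # a head: M_N = 1
  1L                                     # all N flips were tails: 1 - M_N = 1
}
# flip <- function() rbinom(1, 1, 0.3)             # hypothetical coin
# mean(replicate(2e4, rbern_pow_max(0.5, flip)))   # should be near 0.3^0.5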

Yves
  • A related idea would be to make the rvs $X_k$ dependent with extremal index $\theta$ $(0 < \theta < 1)$, meaning that $M_n$ has PZ $q^{n\theta}$ rather than $q^n$. Taking $n\theta = a$ would do the job for any $a>0$. Given a sequence of iid rvs $X_n$ following a standard Fréchet, there are known methods to generate a dependent sequence $X_n^\star$ with standard Fréchet margin and the prescribed extremal index $\theta$. However, what happens if we replace "standard Fréchet" by "Bernoulli"? – Yves Feb 24 '13 at 10:11
  • (+1) Very nice, @Yves. A few remarks: (1) The first part can be viewed as the complement of the approach I've taken. In fact, when I first got the series in $b_n q^n$, while I immediately saw the connection to the geometric, I first tried something more direct and didn't come up with a natural way to do it. Your answer solves that problem. (2) Your approach can also be implemented using only a $\mathrm{Ber}(p)$ coin. In particular, $N$ can be generated by descending down a full binary tree based on fair coin flips, where the left nodes are leaves and the decision is made by (...) – cardinal Feb 24 '13 at 17:13
  • (...) comparing the number in $(0,1)$ constructed from the (partial) sequence of coin flips to the partial sums $f_n = \sum_{i=1}^n b_i$. The termination depth gives $N$. (3) I believe that $n b_n \sim c n^{-(1+a)}$, which will change your conclusion regarding the finiteness of the mean. (4) In both your approach and mine, it seems we cannot escape computing $f_n = \sum_{i=1}^n b_i$, even if we allow for uniform random variates in addition to our $\mathrm{Ber}(p)$ coin for the purposes of sampling. Finding a way to avoid that would seem to be the most obvious way to improve efficiency. – cardinal Feb 24 '13 at 17:21
  • Thank you @cardinal. I agree with all your comments except perhaps (3). I actually made an error since $c$ is $-1/\Gamma(-a)$ (edited), but the exponent of $n$ seems the right one. I used the representation of $\Gamma(z)$ as found e.g. on the Wikipedia page on infinite products and took $z:=-a$, which gives an equivalent for the product $\prod_{k=1}^{n-1}$. I would be more confident if you could check this. – Yves Feb 24 '13 at 18:51
  • Dear @Yves, (+1) You are correct about the constant and about (3). My apologies. Somehow, when I went to transcribe things to paper, I ended up focusing on the asymptotics of $b_n$ instead of $n b_n$. :-) – cardinal Feb 24 '13 at 19:18