
I was solving a puzzler and came across an identity I've never seen before. I consulted a couple of stats experts and textbooks and have come up empty. I'm wondering if anyone has seen this before.

Let $f(k, n, p)$ and $F(k, n, p)$ be the PMF and CDF of the binomial distribution, respectively. Then it appears that:

$F(k, 2k+1, p) = F(k, 2k+2, p) + \frac{1}{2} f(k+1, 2k+2, p)$
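A quick numerical sanity check (plain Python; the helpers `f` and `F` below just compute the binomial PMF and CDF directly from the definition) bears this out on a grid of $k$ and $p$ values:

```python
from math import comb

def f(k, n, p):
    """Binomial PMF: P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def F(k, n, p):
    """Binomial CDF: P(X <= k) for X ~ Bin(n, p)."""
    return sum(f(i, n, p) for i in range(k + 1))

# Check F(k, 2k+1, p) == F(k, 2k+2, p) + (1/2) f(k+1, 2k+2, p)
for k in range(8):
    for p in (0.1, 0.25, 0.5, 0.9):
        lhs = F(k, 2 * k + 1, p)
        rhs = F(k, 2 * k + 2, p) + 0.5 * f(k + 1, 2 * k + 2, p)
        assert abs(lhs - rhs) < 1e-12, (k, p)
```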

I haven't formally proved this. I'm guessing it's not too difficult; it probably needs some of the binomial coefficient recurrence identities. But I wanted to ask whether anyone has come across this before, or whether it's a well-known identity.

Thanks!

2 Answers


What a lovely opportunity to one-up some other statistics experts! The formula you are looking at is a special case of a more general identity for the binomial distribution, in the box below. Taking $n=2k+1$ gives the specific result you are looking at.

$$\boxed{\quad F(k,n,p) = F(k,n+1,p) + \frac{k+1}{n+1} \cdot f(k+1,n+1,p). \quad}$$

Incidentally, this identity is extremely useful for examining the stochastic ordering properties of a binomial random variable. Specifically, since the identity implies that $F(k,n+1,p) \leqslant F(k,n,p)$ (with strict inequality when $0<p<1$), it shows that the sequence of random variables $X_n \sim \text{Bin}(n,p)$ is stochastically nondecreasing in $n$ (and stochastically increasing in $n$ when $0<p<1$).


Derivation: Using Pascal's recurrence identity we can establish the general identities:

$$\begin{align} F(k,n+1,p) &= \sum_{i=0}^k {n+1 \choose i} p^i (1-p)^{n+1-i} \\[6pt] &= \sum_{i=0}^k \bigg[ {n \choose i} + {n \choose i-1} \bigg] p^i (1-p)^{n+1-i} \\[6pt] &= \sum_{i=0}^k {n \choose i} p^i (1-p)^{n+1-i} + \sum_{i=1}^k {n \choose i-1} p^i (1-p)^{n+1-i} \\[6pt] &= (1-p) \sum_{i=0}^k {n \choose i} p^i (1-p)^{n-i} + p \sum_{i=1}^k {n \choose i-1} p^{i-1} (1-p)^{n-(i-1)} \\[6pt] &= (1-p) \sum_{i=0}^k {n \choose i} p^i (1-p)^{n-i} + p \sum_{i=0}^{k-1} {n \choose i} p^{i} (1-p)^{n-i} \\[12pt] &= (1-p) F(k,n,p) + p F(k-1,n,p), \\[20pt] f(k+1,n+1,p) &= {n+1 \choose k+1} p^{k+1} (1-p)^{n-k} \\[6pt] &= \frac{n+1}{k+1} {n \choose k} p^{k+1} (1-p)^{n-k} \\[6pt] &= \frac{n+1}{k+1} \cdot p {n \choose k} p^{k} (1-p)^{n-k} \\[6pt] &= \frac{n+1}{k+1} \cdot p f(k,n,p). \\[6pt] \end{align}$$
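These two recurrences are easy to spot-check numerically; here is a small plain-Python sketch (the helpers `f` and `F` simply compute the binomial PMF and CDF from the definition):

```python
from math import comb

def f(k, n, p):
    """Binomial PMF: P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def F(k, n, p):
    """Binomial CDF: P(X <= k) for X ~ Bin(n, p)."""
    return sum(f(i, n, p) for i in range(k + 1))

for n in range(1, 10):
    for k in range(1, n + 1):
        for p in (0.2, 0.5, 0.8):
            # F(k, n+1, p) = (1-p) F(k, n, p) + p F(k-1, n, p)
            assert abs(F(k, n + 1, p)
                       - ((1 - p) * F(k, n, p) + p * F(k - 1, n, p))) < 1e-12
            # f(k+1, n+1, p) = (n+1)/(k+1) * p * f(k, n, p)
            assert abs(f(k + 1, n + 1, p)
                       - (n + 1) / (k + 1) * p * f(k, n, p)) < 1e-12
```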

We can combine these results to get:

$$\begin{align} F(k,n+1,p) &= (1-p) F(k,n,p) + p F(k-1,n,p) \\[12pt] &= F(k,n,p) - p [ F(k,n,p) - F(k-1,n,p) ] \\[12pt] &= F(k,n,p) - p f(k,n,p) \\[12pt] &= F(k,n,p) - \frac{k+1}{n+1} \cdot f(k+1,n+1,p), \\[12pt] \end{align}$$

which can be rewritten as the boxed formula above.
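In fact, the boxed identity holds exactly in rational arithmetic, which a short plain-Python check with `fractions.Fraction` makes vivid (again, the helpers `f` and `F` compute the PMF and CDF directly):

```python
from fractions import Fraction
from math import comb

def f(k, n, p):
    """Binomial PMF, computed exactly when p is a Fraction."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def F(k, n, p):
    """Binomial CDF, computed exactly when p is a Fraction."""
    return sum(f(i, n, p) for i in range(k + 1))

# The boxed identity holds as an exact equality of rationals,
# not merely to floating-point precision.
for n in range(1, 12):
    for k in range(n + 1):
        for p in (Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)):
            assert F(k, n, p) == (F(k, n + 1, p)
                                  + Fraction(k + 1, n + 1) * f(k + 1, n + 1, p))
```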

Ben
  • Thank you! I had a suspicion it was a special case of a more general identity. Definitely don't remember coming across this in my undergrad probs and stats class. :-) – Bill Woessner Jan 27 '22 at 20:56
  • It only really comes up if you look at stochastic dominance properties of the distribution, which is something that people rarely examine. – Ben Jan 27 '22 at 21:17

There is a simple probability argument. It minimizes the math and shows intuitively why such an equality holds and why it does not depend on $p.$

In the following I will employ just two basic rules of probability: conditional probabilities multiply and mutually exclusive probabilities add. You all know what these vague (but memorable) phrases mean, so I won't belabor the details.

For any $n\ge 0,$ consider a sequence of $n+1$ Bernoulli$(p)$ observations $X_1, X_2, \ldots, X_{n+1}$ ("coin flips"). By definition, each random variable has a chance $p$ to equal $1$ and otherwise equals $0.$

$F(k,n)$ is the chance that $k$ or fewer of the first $n$ outcomes equal $1.$ This can happen in two ways:

  1. $k$ or fewer of all $n+1$ outcomes equal $1.$ The chance of this is $\color{blue}{F(k,n+1)}.$

  2. Exactly $k+1$ of all $n+1$ outcomes equal $1$ (the chance of this is written $\color{red}{f(k+1,n+1)}$) and the final outcome is $1$ (making $k$ of the first $n$ outcomes equal to $1$).

When the $X_i$ are independent (or just exchangeable), all re-orderings of their outcomes are equally likely. In such cases we may therefore reconstruct event $(2)$ by starting with a fixed sequence of $k+1$ ones and $n-k$ zeros and then distributing the $X_i$ randomly within this sequence. Any given $X_i$ thereby (obviously!) has a chance $(k+1)/(n+1)$ of landing on a $1.$ Consequently, the chance of $(2)$ is $\color{red}{f(k+1,n+1)}$ times $(k+1)/(n+1).$

$(1)$ and $(2)$ are mutually exclusive because you cannot simultaneously observe at most $k$ and exactly $k+1$ ones among all $n+1$ outcomes. They are also exhaustive because there is no other way for $k$ or fewer of the first $n$ outcomes to equal $1.$ Thus $F(k,n)$ must equal the sum of the chances of $(1)$ and $(2):$

$$F(k,n) = \color{blue}{F(k,n+1)} + \frac{k+1}{n+1} \color{red}{f(k+1,n+1)}.$$

The particular value $n=2k+1$ answers the question, showing that the "$1/2$" in its formula comes from $(k+1)/(2k+1+1) = 1/2.$

The analysis of case $(2)$ did not need to refer to the probability $p.$ This is why $p$ does not appear in the formula.
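This decomposition can also be checked by simulation. The plain-Python sketch below (parameters chosen arbitrarily) confirms two things: events $(1)$ and $(2)$ partition the left-hand event sample-by-sample, and, given exactly $k+1$ ones, the chance the last flip is a $1$ is close to $(k+1)/(n+1)$:

```python
import random

random.seed(1)

k, n, p, trials = 3, 7, 0.4, 200_000   # arbitrary test parameters
lhs = ev1 = ev2 = total_kp1 = 0
for _ in range(trials):
    flips = [random.random() < p for _ in range(n + 1)]
    ones = sum(flips)
    if sum(flips[:n]) <= k:
        lhs += 1          # k or fewer ones among the first n flips
    if ones <= k:
        ev1 += 1          # event (1): k or fewer ones among all n+1 flips
    if ones == k + 1:
        total_kp1 += 1
        if flips[-1]:
            ev2 += 1      # event (2): exactly k+1 ones AND the last flip is 1

# Events (1) and (2) partition the left-hand event, sample by sample:
assert lhs == ev1 + ev2

# Given exactly k+1 ones, the chance the last flip is 1 is about (k+1)/(n+1):
assert abs(ev2 / total_kp1 - (k + 1) / (n + 1)) < 0.02
```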

whuber