6

A coin with probability of heads $p \in (0, 1) $ is tossed $n$ times. What is the joint probability distribution of the number of runs and the longest run?

Schilling (1990) discusses the distribution of the longest run but I haven't found a good source for the joint distribution of the longest run and the number of runs. I also found this post at Mathematics Stack Exchange.

As an illustration, I've randomly generated 10 coin tosses with a fair coin ($p=0.5$) and got the following sequence: $\text{TTHTHHTTHT}$. The longest run is 2 and the number of runs is 7.

Here is a simulation with 5000 repeats using $n=100$ and $p=0.5$:

Simulation

Here is the R code I used for the simulations and the graphic:

library(ggplot2)
library(grid)

res.frame <- data.frame( noruns = numeric(0) , longestrun = numeric(0) )

n_sims <- 5000 set.seed(142857)

for (i in seq_len(n_sims)) { x <- rbinom(100, 1, 0.5) y <- rle(x) res.frame[i, "noruns"] <- length(y$lengths[y$length]) res.frame[i, "longestrun"] <- max(y$lengths) }

Plot the data

theme_set(theme_bw()) p <- ggplot(res.frame, aes(x = noruns, y = longestrun)) + geom_point(position = position_jitter(width = 0.2, height = 0.27), size = 0.5) + labs( x = "Number of runs" , y = "Longest run" ) + scale_y_continuous(breaks = seq(2, 20, 2)) + scale_x_continuous(breaks = seq(20, 100, 5)) + theme( plot.title=element_text(size=30), axis.title.y=element_text(colour = "black", size = 17, hjust = 0.5, margin = margin(0,12,0,0)), axis.title.x=element_text(colour = "black", size = 17, margin = margin(10,0,0,0)), axis.text.x=element_text(size=17, angle=0, hjust=0.5, vjust=1), axis.text.y=element_text(size=17), legend.position="none", plot.margin=unit(c(0.2, 0.2, 0.7, 0.2),"cm"), strip.text.x = element_text(size = 20),

panel.grid.minor=element_blank(),

panel.grid.major=element_blank(),

strip.background=element_rect(fill="white") )

p

COOLSerdash
  • 30,198

2 Answers2

2

For the special case of $p=0.5$, the distribution is described by restricted integer compositions (see this post). $$ F(n,k,w)=2^{1-n}\sum_{j=0}^k(-1)^j\binom{k}{j}\binom{n-jw-1}{k-1} $$ where $n$ is the sequence length, $k$ is the number of runs, and $w$ is the maximum run length.

An R implementation:

fn <- function(n) {
  # using formulas for restricted integer compositions, returns the distribution
  # of maximum run length (rows) and number of runs (columns) of
  # Bernoulli sequences of length n with p = 0.5
  # (p. 441, formula E, http://www.fq.math.ca/Scanned/14-5/abramson.pdf)
  if (n < 3L) return(diag(1/n, n)[,n:1, drop = FALSE])
  n1 <- n - 1L
  idx <- 2:n1
  m <- matrix(0, n, n)
  ones <- c(1, -1)
  coeff <- outer(1:n, 0:n, choose)
  lens <- outer(idx, idx, function(w, k) pmin(k + 1L, ((n - k) %/% w) + 1L))
  for (w in idx) {
    for (k in idx) {
      w1 <- w - 1L
      k1 <- k - 1L
      m[w, k] <- sum(coeff[k, seq_len(lens[w1, k1])]*coeff[seq.int(n1, by = -w, length.out = lens[w1, k1]), k]*rep(ones, ceiling(lens[w1, k1]/2))[seq_len(lens[w1, k1])])
    }
  }
  m[idx, idx] <- m[idx, idx] - m[1:(n - 2L), idx]
  m[1, n] <- m[n, 1] <- 1L
  m/2^n1
}
jblood94
  • 1,459
1

We can calculate this distribution for any $p$ in terms of the function $T(k,m,n)$ which counts the positive integer sequences of length $k$, maximum at most $m$, and sum $n$.

The probability of having $2g+h$ runs (with $h\in\{0,1\}$), of maximum length $m$ in a sequence of $N$ coin flips with probability $p$ on each flip is:

$$ \sum_{j=1}^n \begin{cases} \ \ \ \ T(g+h,m\phantom{-1}\ ,j)\,T(g,m\phantom{-1}\ ,N-j)\\ -\ T(g+h,m-1,j)\,T(g,m-1,N-j)) \end{cases} {\Large\}}\left(p^j(1-p)^{N-j}+p^{N-j}(1-p)^j\right) $$

For the probability mentioned in the question, this would be summed over $1\le g \le N/2$ and $0\le h\le 1$.

The expression can be broken down as follows:

  • $T(g+h,m,j)$ counts the sequences of run lengths for whichever side came up first, if those $g+h$ lengths are all at most $m$ and sum to $j$.
  • $T(g,m,N-j)$ counts the sequences of run lengths for whichever side came up second, if those $g$ lengths are all at most $m$ and sum to $N-j$.
  • Their product counts sequences of all run lengths with $2g+h$ runs, all at most $m$ and summing to $N$.
  • To count sequences of run lengths with maximum of exactly $m$, we subtract the product for $m-1$.
  • Then we multiply by the probability of getting a sequence with the heads and tails specified according to those run lengths.

Finally, we can calculate $T(k,m,n)$ by a recursion: \begin{align} T(1,m,n) &= 1 \text{ if } n \le m\\ T(1,m,n) &= 0 \text{ if } n > m\\ T(k,m,n) &= \sum_{i=1}^{\min(m,n-1)} T(k-1,m,n-i)\ \ \text{ if }k > 1 \end{align} The last equation counts all the possibilities with a last term of $i$.

So the overall probability comes from calculations of $T$'s with $1\le k\le \lceil N/2\rceil$, two values of $m$ and $1\le n\le N$, and involves at most ~$N^2$ calculations of $T$'s.

Matt F.
  • 4,726