Detection of 'unfair' Bernoulli sequences using run lengths?

Question

I'm rather confused about the following question:

Which of the following vectors do you think is actually a random sample of $\mathcal{B}(1, 0.5)$ (Bernoulli) of length 10? $[0,1,1,0,1,0,1,0,1,1]$ and $[0,0,1,1,1,1,1,1,1,0]$.

The solution argues that the second sequence is highly unlikely because its longest run (a stretch of identical values) has length 7.

I disagree with this answer as the likelihood of both sequences is exactly the same and in fact any sequence (including the constant sequence) has the same likelihood under $\mathcal{B}(1, 0.5)$. However, this clearly clashes with my intuition as I would not think that $[1,1,1,1,1,1,1,1,1,1]$ is a random sample of $\mathcal{B}(1, 0.5)$.

Edit: I guess their solution does make sense if we consider the length of the longest subsequence as a summary statistic, and reject if the p value becomes too small. Still feels weird.

Does this answer your question? Why is a 100 heads run surprising? — Arya McCarthy, Mar 20 '22 at 17:10
If a Bernoulli sequence has $p=1/2$, then a run of 1s as long as seven does seem unlikely. Have you studied distributions of runs? Also, at first glance it may seem unlikely to get seven or more heads in ten tosses of a fair coin, but the actual probability of that is about $0.17 > 0.05 = 5\%$. [In R, where pbinom is a binomial CDF, the code 1 - pbinom(6, 10, .5) returns $0.171875$.] You are correct to observe that each of the $2^{10}$ possible outcomes from ten tosses is equally likely. In testing a null hypothesis, it is best to agree, before taking data, about the criterion for rejection. — BruceET, Mar 20 '22 at 17:22
The link does answer my question partially, thanks, although I'm still not confident enough to give a definite answer to the question posed in the exercise. I think that one of my core problems is that the criterion is chosen after taking the data, as Bruce mentioned. — bernoulliproblem, Mar 20 '22 at 18:00
I think that weird feeling is legitimate, because the solution you quote is arbitrary. Given any pair of finite sequences, one can construct ex post facto a relatively simple-looking criterion that makes either one of them seem more likely than the other according to that criterion. For instance, having a single 01 subsequence is three times more likely than having four such subsequences, allowing one to arrive at the opposite conclusion. — whuber, Mar 20 '22 at 18:25
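
The counting claim in the last comment can be checked by brute force. The following is a minimal R sketch, not part of the original thread: it enumerates all $2^{10}$ sequences and counts how often a 0 is immediately followed by a 1. The names outcomes, count01, k, s1 and s2 are illustrative choices, and "01 subsequence" is read here as two adjacent entries.

```r
# All 2^10 equally likely binary sequences of length 10, one per row
n <- 10
outcomes <- as.matrix(expand.grid(rep(list(0:1), n)))

# Illustrative helper: number of positions where a 0 is immediately followed by a 1
count01 <- function(x) sum(x[-length(x)] == 0 & x[-1] == 1)

# Distribution of the number of adjacent "01" pairs over all sequences
k <- apply(outcomes, 1, count01)
table(k)
sum(k == 1) / sum(k == 4)   # 165 / 55 = 3

# The two sequences from the question
s1 <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 1)
s2 <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 0)
count01(s1)   # 4
count01(s2)   # 1
```

Under this after-the-fact criterion the first sequence, not the second, falls in the rarer category, which is the point of the comment.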

BruceET · Accepted Answer · 2022-03-20T18:54:27.143

Suppose you decide ahead of time that you will reject the null hypothesis that a coin is fair if it has a run of Heads or Tails that is $7$ or longer among ten tosses. What is the significance level of that test?

Put another way, what is the probability that a fair coin will have a run of $7$ or more among ten tosses?

In R, the procedure rle (for Run Length Encoding) provides a way to approximate this probability by simulation.

Consider one experiment with ten tosses:

set.seed(2022)
x = rbinom(10, 1, .5)
x
 [1] 1 1 0 1 0 1 0 0 0 1
rle(x)
Run Length Encoding
  lengths: int [1:7] 2 1 1 1 1 3 1
  values : int [1:7] 1 0 1 0 1 0 1

We see that there are seven runs, the longest of which has length $3.$
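
As a small illustration, the same encoding can be applied to the two sequences from the question. This is only a sketch: the names s1, s2 and the helper longest_run are introduced here for convenience and are not part of the original answer.

```r
# The two sequences from the question
s1 <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 1)
s2 <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 0)

# Illustrative helper: length of the longest run of identical values
longest_run <- function(x) max(rle(x)$lengths)

longest_run(s1)   # 2
longest_run(s2)   # 7
```

So the statistic of interest takes the value 2 for the first sequence and 7 for the second.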

Now we look at $100\,000$ ten-toss experiments to get an idea of the distribution of the length $W$ of the longest run in ten independent tosses of a fair coin.

set.seed(320)
w = replicate(10^5, 
      max(rle(rbinom(10,1,.5))$lengths))
table(w)
w
    1     2     3     4     5     6     7     8     9    10 
  191 17251 36152 24492 12338  5590  2426   998   360   202 
mean(w >= 7)
[1] 0.03986

Thus it seems that the probability of getting a run of length seven or longer is about $0.04$, or 4%. So, according to the run-length criterion, we would reject $H_0$ at the 5% level of significance.
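
Because ten tosses give only $2^{10} = 1024$ equally likely outcomes, the simulated value can also be checked by exhaustive enumeration. The following is a rough sketch rather than part of the original answer; the names outcomes, w_exact and p_exact are chosen here for illustration.

```r
# Enumerate all 2^10 equally likely outcomes of ten fair tosses, one per row
n <- 10
outcomes <- as.matrix(expand.grid(rep(list(0:1), n)))

# Longest run (of either 0s or 1s) in each outcome
w_exact <- apply(outcomes, 1, function(x) max(rle(unname(x))$lengths))

# Exact null distribution of W and the tail probability P(W >= 7)
table(w_exact) / 2^n
p_exact <- mean(w_exact >= 7)
p_exact   # 40/1024 = 0.0390625
```

Forty of the 1024 sequences contain a run of seven or more, so the exact significance level is $40/1024 \approx 0.039$, in line with the simulated $0.03986$.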

There are theoretical results about run lengths derived for use in such runs tests. Searching for "runs test" will turn up discussions of distributions of run lengths.

The histogram below shows the approximate distribution of $W.$ The area of the bars to the right of the vertical orange line is about $0.04.$

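cutp = (0:10) + 0.5   # bin edges at half-integers, so each bar is centered on an integer
hist(w, prob=TRUE, breaks=cutp, col="skyblue2")   # density-scale histogram of W
abline(v = 6.5, col="orange")   # bars to the right of this line correspond to W >= 7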

Note on R code for simulation: The numeric vector w contains maximum run lengths in 100,000 ten-toss experiments. The logical vector w >= 7 has 100,000 TRUEs and FALSEs, and mean(w >= 7) gives the proportion of TRUEs.

The question as stated does not posit such a prior hypothesis -- and that's the crux of the matter. Testing hypotheses constructed by observing the data is HARKing. The example in the question is an extreme form of that error. — whuber, Mar 20 '22 at 18:26
Seems to me OP had already targeted that flaw. My answer shows that in a proper context of a runs test, a run of length seven is marginally good evidence that $p\ne 0.5.$ — BruceET, Mar 20 '22 at 18:37
A comment left by the OP before your post expressly states otherwise: "I think that one of my core problems is that the criterion is chosen after taking the data as Bruce mentioned." — whuber, Mar 20 '22 at 19:58
Reading it all again (trying for time order), I admit it is difficult to discern whether OP is justifiably being critical of the claim in the text, whether OP is still hoping/imagining there may be some justification for it, or whether OP is too reluctant (or 'polite') to criticize the text as clearly wrong. — BruceET, Mar 20 '22 at 20:11
I was indeed not sure if there might be some justification, but the counterexample by @whuber makes it clear that this is indeed faulty reasoning if done after observing the data. — bernoulliproblem, Mar 20 '22 at 22:12
