Suppose there is a coin where the probability of success is 0.63. Say I flip this coin 100 times and record the results (e.g. the sample proportion, its standard error, and a 95% Confidence Interval based on the Wald Interval).
Below is the R code I wrote to generate data for these coin tosses:
# ACTUAL DATA (one sample of 100 tosses from a Binomial with p = 0.63)
set.seed(123)
real_data = rbinom(100, size = 1, prob = 0.63)
mean_prop = mean(real_data)                           # sample proportion (p-hat)
se_prop = sqrt(mean_prop * (1 - mean_prop) / 100)     # standard error of p-hat
lower = mean_prop - 1.96 * se_prop                    # 95% Wald interval
upper = mean_prop + 1.96 * se_prop
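As a quick sanity check on the Wald construction, I believe the hand-computed interval can be compared against an off-the-shelf implementation. This is just a sketch on my part: it assumes the binom package is installed, and binom.confint() with methods = "asymptotic" is (as far as I understand) the Wald interval.
# Optional cross-check of the hand-computed Wald interval
# (assumes the "binom" package is installed: install.packages("binom"))
c(lower, upper)   # hand-computed 95% Wald interval
binom::binom.confint(x = sum(real_data), n = 100,
                     conf.level = 0.95, methods = "asymptotic")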
My Question: Through simulation, I would like to learn how to correctly interpret the Coverage Probability of the 95% Wald Confidence Interval.
As I understand it, the Coverage Probability answers the following question: for a fixed true value of the model parameter of interest (here, the probability of success), what is the probability that a 95% Confidence Interval computed from a random sample actually contains that true value?
In other words:
- Step 1: Suppose we fix a value of "p" (e.g. set p = 0.34)
- Step 2: We then generate a random sample of 100 coin tosses from a Binomial Distribution with p = 0.34
- Step 3: From this random sample of coin tosses, we calculate the sample proportion and its standard error (note: the sample proportion will not necessarily be equal to 0.34)
- Step 4: We then see if a 95% Confidence Interval calculated from this random sample contains the true value of the parameter (i.e. is 0.34 within the 95% Confidence Interval)
- Step 5: We then repeat Step 2 - Step 4 many times (e.g. 1000 times), and calculate the proportion of times the 95% Confidence Interval contained the true value (we then plot this proportion on the graph; a compact R sketch of Steps 1-5 for a single "p" is shown right after this list)
- Step 6: Now, we fix a new value of "p" (e.g. set p = 0.35) and repeat Step 2 - Step 5.
- Step 7: And finally, we repeat this for many values of "p" (e.g. over a grid of values from p = 0 to p = 1)
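To make sure I understand Steps 1 - 5, here is a compact, vectorized sketch of the procedure for a single fixed value of "p" (just a sketch of my understanding, using the same n = 100 tosses and 1000 replications as below):
# Estimated coverage of the 95% Wald interval for one fixed p
p_true = 0.34
n      = 100
n_sims = 1000
covered = replicate(n_sims, {
  x     = rbinom(1, size = n, prob = p_true)   # number of successes in n tosses
  p_hat = x / n                                # sample proportion
  se    = sqrt(p_hat * (1 - p_hat) / n)        # Wald standard error
  (p_true >= p_hat - 1.96 * se) && (p_true <= p_hat + 1.96 * se)
})
mean(covered)   # estimated coverage probability at p = 0.34
Steps 6 and 7 then just wrap this in a loop over a grid of "p" values.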
Below, I tried to write the R code to represent the above process:
# Set the values of p to loop through
prop = seq(0.0, 1, by = 0.005)

# Initialize empty list to store results
my_list_2 = list()

# Start loop over the values of p
for (j in 1:length(prop)) {
  p_j = prop[j]
  # Initialize empty list to store results for each iteration of the inner loop
  my_list = list()
  for (i in 1:1000) {
    sample_i = rbinom(100, size = 1, prob = p_j)
    mean_i = mean(sample_i)
    se_i = sqrt(mean_i * (1 - mean_i) / 100)
    lower_i = mean_i - 1.96 * se_i
    upper_i = mean_i + 1.96 * se_i
    counter_i = ifelse(p_j >= lower_i && p_j <= upper_i, 1, 0)
    a_i = data.frame(id = i, lower = lower_i, upper = upper_i, mean = mean_i, counter = counter_i)
    my_list[[i]] = a_i
  }
  # Bind data frames in my_list into a single data frame
  sims_j = do.call(rbind.data.frame, my_list)
  # Calculate mean of counter column (estimated coverage at p_j)
  mean_counter = mean(sims_j$counter)
  # Create data frame b_j
  b_j = data.frame(id = j, p = p_j, mean = mean_counter)
  my_list_2[[j]] = b_j
  print(b_j)
}

# Bind data frames in my_list_2 into a single data frame
result = do.call(rbind.data.frame, my_list_2)

plot(result$p, result$mean, type = "b",
     main = "95% Coverage Probability Plot Using Wald Intervals",
     xlab = "Proportion Value",
     ylab = "Average Coverage Probability (95% CI Wald Interval)")
abline(h = mean(result$mean), col = "green", lty = 3)  # average observed coverage
abline(h = 0.95, col = "red", lty = 2)                 # nominal 95% coverage
Based on the above graph, I would make the following conclusions:
- The "red line" shows what 95% Coverage Probability looks like in theory - but the "green line" shows you the average observed Coverage Probability from the simulation (0.927)
- Here, we can see that the average observed Coverage Probability is less than the 95% expected Coverage Probability
- What this means is that (contrary to popular belief) the 95% Wald Confidence Interval does not necessarily contain the true value of the parameter 95% of the time
- This is particularly true when the true value of "p" is very small or very large (i.e. close to 0 or close to 1): in such cases, the Coverage Probability can drop well below 0.95, and even below 0.5. If we are studying some natural phenomenon in which we believe the true value of "p" is extreme (e.g. the proportion of a population having some rare disease), we should be careful when trusting the results of a 95% Wald Confidence Interval
- Finally, it would be interesting to compare the Coverage Probability of the Wald Confidence Interval to that of alternative Confidence Intervals (e.g. Clopper-Pearson, Wilson, etc.). It is possible that some of these options might be better suited for very small or very large values of "p", or might display better coverage in general (at the cost of some computational complexity, for instance). A sketch of how to obtain these intervals in R is shown right after this list
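As a starting point for that comparison (only a sketch; I have not yet re-run the full coverage simulation with these intervals), the alternative intervals can, if I recall correctly, be obtained in base R: prop.test() without continuity correction gives the Wilson score interval, and binom.test() gives the "exact" Clopper-Pearson interval.
# Alternative 95% intervals for one sample (x successes out of n tosses)
x = sum(real_data); n = 100
prop.test(x, n, correct = FALSE)$conf.int   # Wilson score interval (no continuity correction)
binom.test(x, n)$conf.int                   # Clopper-Pearson "exact" interval
# These could be swapped into the inner loop above in place of the Wald formula
# to compare coverage across the same grid of p values.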
Is my understanding of the above correct?
Thanks!
