Suppose there is a coin where the probability of success is 0.63. Say I flip this coin 100 times and record the results (e.g. the sample proportion, its standard error, and a 95% Confidence Interval based on the Wald Interval).
Below is the R code I wrote to generate data for these coin tosses:
# ACTUAL DATA (one sample of 100 tosses from a Binomial with p = 0.63)
set.seed(123)
real_data = rbinom(100, size = 1, prob = 0.63)
mean_prop = mean(real_data)                           # sample proportion (p-hat)
se_prop = sqrt(mean_prop * (1 - mean_prop) / 100)     # standard error of p-hat
lower = mean_prop - 1.96 * se_prop                    # 95% Wald interval
upper = mean_prop + 1.96 * se_prop
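As a quick sanity check on the Wald construction, I believe the hand-computed interval can be compared against an off-the-shelf implementation. This is just a sketch on my part: it assumes the binom package is installed, and binom.confint() with methods = "asymptotic" is (as far as I understand) the Wald interval.
# Optional cross-check of the hand-computed Wald interval
# (assumes the "binom" package is installed: install.packages("binom"))
c(lower, upper)   # hand-computed 95% Wald interval
binom::binom.confint(x = sum(real_data), n = 100,
                     conf.level = 0.95, methods = "asymptotic")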
My Question: Through simulation, I would like to learn how to correctly interpret the Coverage Probability of the 95% Wald Confidence Interval.
As I understand it, the Coverage Probability answers the following question: for a fixed true value of the model parameter of interest (here, the probability of success), what is the probability that a 95% Confidence Interval computed from a random sample actually contains that true value?
In other words:
- Step 1: Suppose we fix a value of "p" (e.g. set p = 0.34)
- Step 2: We then generate a random sample of 100 coin tosses from a Binomial Distribution with p = 0.34
- Step 3: From this random sample of coin tosses, we calculate the sample proportion and its standard error (note: the sample proportion will not necessarily be equal to 0.34)
- Step 4: We then see if a 95% Confidence Interval calculated from this random sample contains the true value of the parameter (i.e. is 0.34 within the 95% Confidence Interval)
- Step 5: We then repeat Step 2 - Step 4 many times (e.g. 1000 times), and calculate the proportion of times the 95% Confidence Interval contained the true value (we then plot this proportion on the graph; a compact R sketch of Steps 1-5 for a single "p" is shown right after this list)
- Step 6: Now, we fix a new value of "p" (e.g. set p = 0.35) and repeat Step 2 - Step 5.
- Step 7: And finally, we repeat this for many values of "p" (e.g. over a grid of values from p = 0 to p = 1)
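To make sure I understand Steps 1 - 5, here is a compact, vectorized sketch of the procedure for a single fixed value of "p" (just a sketch of my understanding, using the same n = 100 tosses and 1000 replications as below):
# Estimated coverage of the 95% Wald interval for one fixed p
p_true = 0.34
n      = 100
n_sims = 1000
covered = replicate(n_sims, {
  x     = rbinom(1, size = n, prob = p_true)   # number of successes in n tosses
  p_hat = x / n                                # sample proportion
  se    = sqrt(p_hat * (1 - p_hat) / n)        # Wald standard error
  (p_true >= p_hat - 1.96 * se) && (p_true <= p_hat + 1.96 * se)
})
mean(covered)   # estimated coverage probability at p = 0.34
Steps 6 and 7 then just wrap this in a loop over a grid of "p" values.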
Below, I tried to write the R code to represent the above process:
# Set the values of p to loop through
prop = seq(0.0, 1, by = 0.005)

# Initialize empty list to store results
my_list_2 = list()

# Start loop over the values of p
for (j in 1:length(prop)) {
  p_j = prop[j]
  # Initialize empty list to store results for each iteration of the inner loop
  my_list = list()
  for (i in 1:1000) {
    sample_i = rbinom(100, size = 1, prob = p_j)
    mean_i = mean(sample_i)
    se_i = sqrt(mean_i * (1 - mean_i) / 100)
    lower_i = mean_i - 1.96 * se_i
    upper_i = mean_i + 1.96 * se_i
    counter_i = ifelse(p_j >= lower_i && p_j <= upper_i, 1, 0)
    a_i = data.frame(id = i, lower = lower_i, upper = upper_i, mean = mean_i, counter = counter_i)
    my_list[[i]] = a_i
  }
  # Bind data frames in my_list into a single data frame
  sims_j = do.call(rbind.data.frame, my_list)
  # Calculate mean of counter column (estimated coverage at p_j)
  mean_counter = mean(sims_j$counter)
  # Create data frame b_j
  b_j = data.frame(id = j, p = p_j, mean = mean_counter)
  my_list_2[[j]] = b_j
  print(b_j)
}

# Bind data frames in my_list_2 into a single data frame
result = do.call(rbind.data.frame, my_list_2)

plot(result$p, result$mean, type = "b",
     main = "95% Coverage Probability Plot Using Wald Intervals",
     xlab = "Proportion Value",
     ylab = "Average Coverage Probability (95% CI Wald Interval)")
abline(h = mean(result$mean), col = "green", lty = 3)  # average observed coverage
abline(h = 0.95, col = "red", lty = 2)                 # nominal 95% coverage
Based on the above graph, I would make the following conclusions:
- The "red line" shows what 95% Coverage Probability looks like in theory - but the "green line" shows you the average observed Coverage Probability from the simulation (0.927)
- Here, we can see that the average observed Coverage Probability is less than the 95% expected Coverage Probability
- What this means is that (contrary to popular belief) the 95% Wald Confidence Interval does not necessarily contain the true value of the parameter 95% of the time
- This is particularly true when the true value of "p" is very small or very large (i.e. close to 0 or close to 1): in such cases, the Coverage Probability can drop well below 0.95, and even below 0.5. If we are studying some natural phenomenon in which we believe the true value of "p" is extreme (e.g. the proportion of a population having some rare disease), we should be careful when trusting the results of a 95% Wald Confidence Interval
- Finally, it would be interesting to compare the Coverage Probability of the Wald Confidence Interval to that of alternative Confidence Intervals (e.g. Clopper-Pearson, Wilson, etc.). It is possible that some of these options might be better suited for very small or very large values of "p", or might display better coverage in general (at the cost of some computational complexity, for instance). A sketch of how to obtain these intervals in R is shown right after this list
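As a starting point for that comparison (only a sketch; I have not yet re-run the full coverage simulation with these intervals), the alternative intervals can, if I recall correctly, be obtained in base R: prop.test() without continuity correction gives the Wilson score interval, and binom.test() gives the "exact" Clopper-Pearson interval.
# Alternative 95% intervals for one sample (x successes out of n tosses)
x = sum(real_data); n = 100
prop.test(x, n, correct = FALSE)$conf.int   # Wilson score interval (no continuity correction)
binom.test(x, n)$conf.int                   # Clopper-Pearson "exact" interval
# These could be swapped into the inner loop above in place of the Wald formula
# to compare coverage across the same grid of p values.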
Is my understanding of the above correct?
Thanks!
