
Nuclear storage containers were placed in storage some 20 years ago. Some may still have a special plastisol coating in place in areas that are to be welded. The plastisol was applied for temporary protection during storage and was intended to be completely removed prior to welding, but we have no confirmation that it was completely removed.

We have a population of 29 containers. What sample size will we need to determine (yes or no) whether any plastisol was left in place on any of the 29?

  • The sample size is quite small, which will make it difficult to get any real confidence, particularly since you are concerned with the probability that there is any coating left. The problem with dividing the problem into one of square inches is that you lose a basic assumption of independence (each square inch is very much related to its neighbours) -- you can still do something without the independence assumption but your bounds get much wider. If it's an option I would consider checking all of the containers rather than use a statistical approach. – David Kozak Nov 27 '17 at 22:49
  • Thanks David for your advice. First, I think the reason we are looking at a statistical approach is that we are never going to convince the customer to look at all 29; we have better luck convincing them to look at a smaller subset. Also, I think we would be happy to check not merely the probability of any coating remaining, but rather the probability of more than a small amount of coating remaining. In that way, using a statistically projected (small) coating surface area, we could show there would be no adverse aging effects from another 20 years of storage. If this were the case, what would be the best method to use? – CLB Nov 27 '17 at 23:13
  • This looks to me to be a situation where you would really want to take a Bayesian approach. The prior will be fairly important. – Glen_b Nov 28 '17 at 06:57

1 Answer


This problem is not amenable to null hypothesis significance testing, because a test cannot tell you the probability that all 29 containers are free of the coating. This is an estimation problem. Sample size for an estimate is calculated from a target margin of error: the more you sample, the tighter the margin of error. That is how a statistician controls the confidence of their estimates.
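For intuition, the familiar normal-approximation margin of error for a proportion, with a finite-population correction for sampling $n$ of the $N = 29$ containers without replacement, is

\begin{equation} \text{ME} = z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\,\sqrt{\frac{N-n}{N-1}}, \end{equation}

which already hints at the trouble: if no coating is found in the sample, $\hat{p} = 0$ and the estimated margin of error collapses to zero.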

If no coating is found in the sample, however, 95% CIs based on the normal approximation to the maximum likelihood estimator will all be a degenerate 0–0. A way around this is to use a median unbiased estimator. The median unbiased estimator is any value $\tilde{p}$ such that:

\begin{equation} Pr(Y \ge n\tilde{p}) \ge 0.5 \quad \text{and} \quad Pr(Y \le n\tilde{p}) \ge 0.5 \end{equation}
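As a toy illustration (hypothetical numbers, a plain binomial model that ignores the finite population, and the mid-p weighting described next), such an estimate can be found by root-finding:

# Toy example: mid-p median unbiased estimate for y = 2 positives in n = 10
# draws (hypothetical numbers); solves P(Y < y) + 0.5*P(Y = y) = 0.5 over p.
midp_centered <- function(p, y, n) pbinom(y - 1, n, p) + 0.5 * dbinom(y, n, p) - 0.5
uniroot(midp_centered, c(1e-9, 1 - 1e-9), y = 2, n = 10)$root  # roughly 0.21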

The small-sample correction for discrete data suggests using a "mid-point cumulative probability function," in which the point mass $Pr(Y=n\tilde{p})$ is weighted by 0.5. That way a single value, rather than a range of values, satisfies the criterion. Using a hypergeometric probability function accounts for the finite population. The sample size calculation is too complex to solve analytically, so simulating and eyeballing the results is warranted. Using the code below, I generate MUEs with the following halfwidths:

[Figure: halfwidths of the interval estimates as a function of the number of containers sampled]

As expected, the estimates remain highly variable until you sample a very sizable fraction of all the containers. This is probably a problem better suited to logistics than to statistics.

hyperprob <- function(y, nd, nt, p, lower.tail = TRUE, offset = 0) {
  # Mid-point ("mid-p") hypergeometric cumulative probability of seeing y coated
  # containers in a sample of nd drawn from nt containers, a proportion p of
  # which are coated; the point mass P(Y = y) receives weight 1/2.
  # phyper is used because dhyper does not deal with fractional gamma well.
  (phyper(y - 1, nt * p, nt * (1 - p), nd, lower.tail = lower.tail) +
     phyper(y, nt * p, nt * (1 - p), nd, lower.tail = lower.tail)) / 2 +
    offset
}

## Illustrative values (not given in the post): nt = 29 containers in total,
## nd = 10 inspected, y = 1 found with coating remaining.
nt <- 29; nd <- 10; y <- 1

uniroot(hyperprob, c(0, 1), y = y, nd = nd, nt = nt, lower.tail = FALSE, offset = -0.5)$root   # median unbiased estimate
uniroot(hyperprob, c(0, 1), y = y, nd = nd, nt = nt, lower.tail = FALSE, offset = -0.025)$root # lower 95% limit
# the estimate and lower limit are 0 if every sampled container is clean (y = 0)
upper <- uniroot(hyperprob, c(0, 1), y = y, nd = nd, nt = nt, lower.tail = TRUE, offset = -0.025)$root
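As a minimal sketch (my assumptions, not necessarily how the figure above was produced): if no coating is observed in any sampled container (y = 0), the estimate and lower limit are both 0, so the interval width is driven entirely by the upper limit, which can be traced as a function of sample size like this:

## Upper 95% mid-p limit versus sample size, assuming y = 0 observed throughout.
sizes <- 1:29
upper_by_n <- sapply(sizes, function(nd)
  uniroot(hyperprob, c(0, 1), y = 0, nd = nd, nt = 29,
          lower.tail = TRUE, offset = -0.025)$root)
plot(sizes, upper_by_n, type = "b",
     xlab = "Number of containers inspected",
     ylab = "Upper 95% limit for proportion coated (y = 0 observed)")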
AdamO