I know it is less elegant, but I had to simulate it. The simulation I built is simple, inelegant, and slow to run, but it is good enough. One advantage is that, as long as the basics are right, it will tell me when the elegant approach falls down.
The required sample size is going to vary as a function of the prevalence values hard-coded below.
So here is the code:
# main code
# want the 95% CI to be no more than 3% from the prevalence
# expect prevalence around 15% to 30%
# think sample size is ~1000
my_prev    <- seq(from = 0.15, to = 0.30, by = 0.002)
samp_sizes <- seq(from = 400, to = 800, by = 1)
N_loops <- 2000
store <- matrix(0, nrow = length(my_prev) * length(samp_sizes), ncol = 3)
count <- 1

# for each prevalence
for (i in 1:length(my_prev)) {
  # for each sample size
  for (j in 1:length(samp_sizes)) {
    temp <- numeric(N_loops)
    for (k in 1:N_loops) {
      # draw samples
      y <- rbinom(n = samp_sizes[j], size = 1, prob = my_prev[i])
      # compute prevalence, store
      temp[k] <- mean(y)
    }
    # width between the 5% and 95% quantiles of the simulated prevalences
    width <- diff(quantile(x = temp, probs = c(0.05, 0.95)))
    # store prevalence, samp_size, and CI width
    store[count, 1] <- my_prev[i]
    store[count, 2] <- samp_sizes[j]
    store[count, 3] <- width[[1]]
    count <- count + 1
  }
}

store2 <- numeric(length(my_prev))

# go through store
for (i in 1:length(my_prev)) {
  # for each prevalence, find the widest CI that still stays within
  # +/- 3% (full width <= 6%) and store the corresponding samp_size
  idx_p <- which(store[, 1] == my_prev[i])
  temp  <- store[idx_p, ]
  idx_2 <- which(temp[, 3] <= 0.03 * 2)
  temp2 <- temp[idx_2, , drop = FALSE]  # keep matrix shape even if only one row matches
  idx_3 <- which(temp2[, 3] == max(temp2[, 3]))
  store2[i] <- temp2[idx_3[1], 2]
}

# plot it
plot(x = my_prev, y = store2,
     xlab = "prevalence",
     ylab = "sample size")
lines(smooth.spline(x = my_prev, y = store2), col = "Red")
grid()
And here is the plot of sample size vs. prevalence, where the sample size is chosen so that the uncertainty in the 95% CI for prevalence is as close as possible to $\pm$3% without going over it.

Away from a prevalence of 50%, "somewhat less observations" seem to be required, as kjetil suggested.
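For intuition, here is a minimal sketch of the closed-form normal-approximation rule, n = z^2 * p * (1 - p) / d^2 (presumably the "elegant approach" mentioned above): since p(1 - p) peaks at p = 0.5, the required n falls as the prevalence moves away from 50%. The z, d, and p values below are my own illustrative choices for the $\pm$3% target, not outputs of the simulation.

# sketch: normal-approximation sample size n = z^2 * p * (1 - p) / d^2,
# so n scales with p * (1 - p), which peaks at p = 0.5
z <- qnorm(0.975)  # two-sided 95%
d <- 0.03          # target half-width of +/- 3%
p <- c(0.15, 0.20, 0.25, 0.30, 0.50)
data.frame(prevalence   = p,
           n_approx     = ceiling(z^2 * p * (1 - p) / d^2),
           rel_to_p_0.5 = round(p * (1 - p) / 0.25, 2))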
I think you can get a decent estimate of the prevalence well before 400 samples, and adjust your sampling strategy as you go. I don't think there should be a jog in the middle of the curve; it is likely simulation noise, so you might bump N_loops up to 10e3 and drop the "by" in my_prev down to 0.001.
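If you want to adapt as you go, a rough two-stage sketch of that idea is below. The pilot size of 200, the true_prev used to fake the pilot data, and the closed-form rule standing in for a lookup in store2 are all assumptions for illustration, not part of the simulation above.

# toy two-stage sketch (all numbers here are illustrative assumptions)
set.seed(1)
true_prev <- 0.22                       # unknown in practice
pilot     <- rbinom(n = 200, size = 1, prob = true_prev)
p_hat     <- mean(pilot)                # interim estimate of prevalence
# pick the final n for the interim estimate, e.g. via the closed form
# (or by looking it up in store2 from the simulation above)
z <- qnorm(0.975)
n_final <- ceiling(z^2 * p_hat * (1 - p_hat) / 0.03^2)
n_more  <- max(n_final - length(pilot), 0)  # additional samples still needed
c(p_hat = p_hat, n_final = n_final, n_more = n_more)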