Suppose we have a one-sample binomial proportions problem with a small number of trials (say 5–10). I want to conduct inference and come up with a p-value for testing $H_0$: $p = p_0$ against $H_a$: $p \neq p_0$ for some prescribed proportion $p_0$. What are some ways to do this? Is there a one-sample analogue of Fisher's exact test? What about re-randomization tests?
-
Because tests and confidence intervals are very nearly equivalent, please search for posts on binomial confidence intervals. – whuber Apr 03 '23 at 01:17
-
Before you go far down this path, examine your power curve at various $p - p_0$ values. Consider whether this will be useful for your purposes (it might be, but often people are surprised by how large an $n$ they need for good power). – Glen_b Apr 03 '23 at 05:52
-
@Glen_b Thanks, could you suggest resources for me to construct such a power curve? – user321627 Apr 03 '23 at 06:57
-
It's just a matter of identifying the rejection rule and then calculating the rejection rate for any set of values of $p$ that are of interest. For any given $p$ it's essentially a binomial tail probability (which can be evaluated fairly easily), but simulation is also straightforward. – Glen_b Apr 03 '23 at 13:23
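A minimal R sketch of such a power curve, assuming the exact binomial test with a 0.05 cutoff (the function name power_at and the grid of alternatives are illustrative):
# Power of the two-sided exact binomial test of H0: p = p0:
# find the rejection region in y at level alpha, then sum Bin(n, p) mass over it.
power_at <- function(p, n, p0, alpha = 0.05) {
  pvals <- sapply(0:n, function(y) binom.test(y, n, p = p0)$p.value)
  reject <- (0:n)[pvals < alpha]           # counts y that reject H0
  sum(dbinom(reject, size = n, prob = p))  # P(reject | true p)
}
p_grid <- seq(0.05, 0.95, by = 0.05)
plot(p_grid, sapply(p_grid, power_at, n = 8, p0 = 0.4),
     type = "b", xlab = "true p", ylab = "power")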
-
@Glen_b Yes, but OP can't identify a rejection region or evaluate a power curve until they know how to conduct a statistical test in the first place. Which is OP's question. Note also that identifying the rejection region for a two-sided binomial test is a non-elementary problem that has several different solutions in use in the literature. – Gordon Smyth Apr 03 '23 at 21:50
-
@user321627 It's noticeable that you haven't accepted either of the answers. What has not been addressed? IMO there is a clear-cut answer to your question because the binomial test with the "smallp" rejection region is the direct analogue of Fisher's exact test for contingency tables. – Gordon Smyth Apr 10 '23 at 22:53
2 Answers
Just use the binomial distribution directly: evaluating one-sided p-values is simply a matter of binomial tail probabilities. If you observe $y$ successes out of $n$ trials and you want to test $H_0$: $p=p_0$ vs $H_a$: $p>p_0$, then the one-sided p-value is $P(Y \ge y)$, where $Y$ follows the Bin($n$, $p_0$) binomial distribution. The one-sided p-value for testing $H_0$ vs $H_a$: $p<p_0$ is $P(Y \le y)$.
For example, suppose $n=8$, you observe $y=7$ and $p_0=0.4$. Then the one-sided upper-tail p-value is
> y <- 7
> n <- 8
> p0 <- 0.4
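> # P(Y > y-0.5) = P(Y >= y) since Y is integer-valued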
> pbinom(y-0.5, size=n, prob=p0, lower.tail=FALSE)
[1] 0.00851968
which is 0.0085. This is an exact binomial-test p-value.
Two-sided tests of $H_0$: $p=p_0$ vs $H_a$: $p\ne p_0$ are a little more complicated because there are several competing ways to construct the rejection region when the null distribution is asymmetric. If $p_0=0.5$, then the null distribution is symmetric and you can simply multiply the smaller of the two one-sided p-values by 2. If the null distribution is not symmetric, you can still multiply the smaller one-sided p-value by 2 to get a valid two-sided p-value; this approach is simple but not the most popular because it is somewhat conservative. The method most closely analogous to Fisher's exact test is to take the sum of all the binomial probabilities that are less than or equal to $P(Y=y)$ given that $p=p_0$.
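As a sketch, both constructions can be evaluated directly for the example above (continuing with y, n and p0 from the code; the small tolerance factor guards against floating-point ties):
> # double the smaller of the two one-sided p-values
> 2 * min(pbinom(y, n, p0), pbinom(y-0.5, n, p0, lower.tail=FALSE))
[1] 0.01703936
> # Fisher-style: sum all null probabilities no larger than that of the observed y
> d <- dbinom(0:n, size=n, prob=p0)
> sum(d[d <= dbinom(y, n, p0) * (1 + 1e-07)])
[1] 0.00851968
Here the Fisher-style value happens to equal the one-sided p-value because every lower-tail probability exceeds $P(Y=7)$.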
My function exactTest implements four different ways to compute the two-sided p-value (see https://rdrr.io/bioc/edgeR/man/exactTest.html).
The method of doubling the smallest one-sided p-value is called "doubletail" and the method of adding up the smallest probabilities is called "smallp".
Each method corresponds to a different rejection region.
If $p_0=0.5$, so the null distribution is symmetric, then all four methods are the same.
My function assumes negative binomial counts but reduces to binomial tests when dispersion=0.
The binomial test is also implemented in the binom.test function in R.
It uses the "smallp" method for two-sided probabilities, which is the same method used by Fisher's exact test to construct a two-sided rejection region.
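Continuing the numerical example, binom.test reproduces this "smallp" p-value, matching the Fisher-style sum computed above:
> binom.test(y, n, p=p0)$p.value
[1] 0.00851968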
There have been many related answers on this forum:
- Exact Binomial Test p-values? (explains the "smallp" method)
- How to properly calculate the exact p-value for binomial hypothesis testing (e.g. in Excel)?
- p-values different for binomial test vs. proportions test. Which to report?
- How do you calculate an exact two-tailed P-value using binomial distribution? (AdamO advocates the "doubletail" method; whuber notes there are at least five different reasonable ways to define the two-sided rejection region.)
- Why are p-values from the binomial test in R non-monotonic in trials?
- Understanding 2-sided p-values in a binomial distribution
Finally, note that the binomial test is exactly equivalent to a re-randomization test: a one-sided re-randomization test with infinitely many re-randomizations gives exactly the one-sided binomial p-values.
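A minimal Monte Carlo sketch of this equivalence, reusing y, n and p0 from the example (the seed and number of draws are arbitrary):
set.seed(1)                               # arbitrary seed
ysim <- rbinom(1e5, size=n, prob=p0)      # re-randomized success counts under H0
mean(ysim >= y)                           # approximates the exact P(Y >= y) = 0.0085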
-
And yet, for small sample sizes, exact distributions may under-perform. See, for example, Agresti, A., & Coull, B. A. (1998). Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician, 52(2), 119–126. – Alexis Apr 03 '23 at 23:26
-
@Alexis No I don't agree. Approximate p-values are not better than exact. You are jumping to incorrect conclusions. Agresti's paper deals with confidence intervals rather than p-values. The reason why approximate CIs can do better than exact is that CIs assume a preset alpha level and, with discrete data, exact tests can't match the alpha level exactly. P-values on the other hand do not depend on preset alpha levels and exact p-values are always optimal given the observed data. Agresti does not recommend converting Wilson intervals to p-values by function inversion, and neither do I. – Gordon Smyth Apr 04 '23 at 08:10
-
TY for educating me, Gordon Smyth! (BTW, my comment was accompanied by a +1 for your answer.) – Alexis Apr 04 '23 at 15:36
-
Can you help me understand what you mean by "preset alpha levels"? As I understand it, the decision to reject or not reject a null hypothesis is predicated on an a priori choice of Type I error rate under the assumption that the null hypothesis is true. How is this different than an a priori level of confidence (i.e. $1 - \alpha$)? Or by "preset alpha level" do you mean the p-value itself is calculated independently of the choice of $\alpha$? – Alexis Apr 04 '23 at 22:01
-
@Alexis You understand what I mean by "preset alpha level". It is just math-speak for 1 minus confidence level (for a CI) or the p-value cutoff used in a hypothesis-testing-decision-making (HTDM) framework. However p-values have a much more general interpretation as a summary of the strength of evidence against the null without requiring a decision-making setup or preset p-value cutoff. Fisher himself (who invented p-values) was very much opposed to the HTDM framework proposed by Neyman and Pearson. – Gordon Smyth Apr 05 '23 at 06:51
-
@Alexis The idea that decisions will be made on the basis of one p-value alone in isolation to other evidence is a bit of a fiction propagated by introductory statistics courses (of which I have taught many). Here are a couple of talks that I have given to professional audiences about p-values that discuss the issues: https://gksmyth.github.io/talks/210629-WEHI-PValues.pdf and https://gksmyth.github.io/talks/140328-FOAM14-PValues.pdf. – Gordon Smyth Apr 05 '23 at 07:02
-
Thank you for taking time with me! I better understand where you are coming from. :) – Alexis Apr 05 '23 at 17:53
Rather than doing a hypothesis test, it may be more informative to derive a confidence interval for the probability parameter using the Wilson score interval. You can do this using the CONF.prop function in the stat.extend package. Here is an example where we generate a 95% confidence interval from a small amount of data:
stat.extend::CONF.prop(alpha = 0.05, n = 10, sample.prop = 4/10)
Confidence Interval (CI)
95.00% CI for proportion parameter for infinite population
Interval uses 10 binary data points with sample proportion = 0.4000
[0.168180329706236, 0.687326230266342]
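As a cross-check (assuming CONF.prop returns the plain Wilson score interval), base R's prop.test without continuity correction gives the same interval:
> prop.test(x=4, n=10, correct=FALSE)$conf.int
[1] 0.1681803 0.6873262
attr(,"conf.level")
[1] 0.95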
-
If my original goal was to get a p-value, how can I get it from the Wilson score interval? And what advantage does doing the Wilson score interval have over an exact test? – user321627 Apr 03 '23 at 05:46
-
+1 (See the citation I dropped in a comment to Gordon Smyth's answer, which also talks about Wilson's intervals.) @user321627 You can convert CIs to p-values through function inversion: notice that both the test statistic and the CI rely on functions of $\hat{p} - p_0$ and the SE of that difference. – Alexis Apr 03 '23 at 23:29
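A sketch of that inversion for the Wilson interval, which corresponds to the normal-approximation score test (reusing the answer's data with a hypothetical $p_0 = 0.25$):
x <- 4; n <- 10; p0 <- 0.25
z <- (x/n - p0) / sqrt(p0 * (1 - p0) / n)          # score statistic under H0
2 * pnorm(-abs(z))                                 # two-sided p-value, about 0.273
prop.test(x, n, p=p0, correct=FALSE)$p.value       # same test via its chi-square form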