I'm studying a Bernoulli random variable $X$ with success probability $p$, which is unknown but satisfies $|p - a| < \epsilon$ for some constants $a$ and $\epsilon$. Given some confidence level $C$, I'd like to know how many samples $n$ I need to be confident that the estimate $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$ lies within the interval $\left[a - \epsilon, a + \epsilon\right]$. In particular, I'd like to find an $n$ that is independent of $\hat{p}$.

Hoeffding's inequality seems relevant, but it bounds the probability of leaving an interval centered on the population mean of the random variable. In this case, the population mean could be anywhere within the interval. I also found a related question, but here I have slightly more information (I know the probability lies within an interval).

Edit: I originally said "Given $p$", but I just meant given a $p$ satisfying the inequality. I don't have access to $p$.

Germ

1 Answer

To deal with the asymmetry, you can use the one-sided Hoeffding inequality (see the top of the Wikipedia page) instead of the two-sided one, applying it separately to each tail.

Let $S_n$ denote $X_1+\cdots+X_n$.

$$P(S_n/n \notin [a-\epsilon, a+\epsilon]) = P(S_n/n < a-\epsilon) + P(S_n/n > a+\epsilon) \le e^{-2n(a-p-\epsilon)^2} + e^{-2n(a-p+\epsilon)^2}.$$
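As a sanity check, here is a minimal Python sketch that compares this bound with the exact binomial probability. The function names and the numbers used ($n=200$, $p=0.5$, $a=0.55$, $\epsilon=0.1$) are illustrative assumptions, not values from the question.

```python
import math


def hoeffding_bound(n, p, a, eps):
    """Sum of the two one-sided Hoeffding bounds.

    Assumes a - eps < p < a + eps, as in the question.
    """
    if not (a - eps < p < a + eps):
        raise ValueError("p must lie strictly inside (a - eps, a + eps)")
    lower_tail = math.exp(-2 * n * (a - p - eps) ** 2)
    upper_tail = math.exp(-2 * n * (a - p + eps) ** 2)
    return lower_tail + upper_tail


def exact_outside_prob(n, p, a, eps, tol=1e-12):
    """Exact P(S_n/n not in [a - eps, a + eps]) for S_n ~ Binomial(n, p).

    tol guards against floating-point noise at the interval endpoints.
    """
    total = 0.0
    for k in range(n + 1):
        mean = k / n
        if mean < a - eps - tol or mean > a + eps + tol:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


# Hypothetical example values.
n, p, a, eps = 200, 0.5, 0.55, 0.1
print("exact probability:", exact_outside_prob(n, p, a, eps))
print("Hoeffding bound:  ", hoeffding_bound(n, p, a, eps))
```

The exact probability should never exceed the bound; how loose the bound is depends on $n$ and on how close $p$ sits to an endpoint.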

When $a=p$, you recover the usual two-sided bound $2e^{-2n\epsilon^2}$; for this to be smaller than $0.05$, you need $n \ge \frac{\ln 40}{2 \epsilon^2}$.
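Spelling out where that threshold comes from (the value $\epsilon = 0.1$ in the last step is just an illustrative assumption):

$$2e^{-2n\epsilon^2} \le 0.05 \iff e^{-2n\epsilon^2} \le \tfrac{1}{40} \iff 2n\epsilon^2 \ge \ln 40 \iff n \ge \frac{\ln 40}{2\epsilon^2}.$$

With $\epsilon = 0.1$, $\ln 40 \approx 3.689$, so $n \ge 3.689 / 0.02 \approx 184.4$, i.e. $n \ge 185$.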

More generally, when $a \ne p$, both exponents are at least $2n(\epsilon - |a-p|)^2$, so we can crudely bound the asymmetric expression by $$e^{-2n(a-p-\epsilon)^2} + e^{-2n(a-p+\epsilon)^2} \le 2e^{-2n(\epsilon - |a-p|)^2},$$ which is smaller than $0.05$ when $n \ge \frac{\ln 40}{2(\epsilon - |a-p|)^2}$. This bound is a bit too conservative when $|a-p|$ is close to $\epsilon$.
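A small sketch of that sample-size formula in Python, generalising $0.05$ to a failure level `alpha` (so $\ln 40$ becomes $\ln(2/\alpha)$). The function name and the `margin` parameter, which stands for $\epsilon - |a-p|$, are hypothetical; note that `margin` still depends on the unknown $p$, so this only evaluates the formula for an assumed value.

```python
import math


def samples_needed(margin, alpha=0.05):
    """Smallest integer n with 2 * exp(-2 * n * margin**2) <= alpha.

    margin is assumed to equal eps - |a - p| and must be positive.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    return math.ceil(math.log(2 / alpha) / (2 * margin**2))


# Hypothetical case: eps = 0.1 and |a - p| = 0.05, so margin = 0.05.
print(samples_needed(0.05))  # -> 738
```

Halving the margin roughly quadruples the required $n$, which is why the requirement blows up as $|a-p|$ approaches $\epsilon$.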

angryavian
  • Thanks for the asymmetric bound, but what I don't get here is that these expressions depend on $p$. However, $p$ is unknown but lies in a known interval. How then can I use these results? – Germ Feb 21 '24 at 13:39