I want to estimate the true runtime of a class of programs that run on a platform that introduces virtualization-related variance into the runtime. Quantitatively, my goal is to be able to state the true runtime of any of the programs in the class, at a 95% confidence level and with a margin of error of 2%, based on an average of the given program's sampled runtimes.
To be able to state this, I need to know how many times I need to measure the runtime of a given program in this class to get an average runtime that is within 2% of the true runtime at a 95% confidence level. Note that the programs under test are completely deterministic, i.e. if there were no variance, each run would take the same amount of time.
My approach is to analyze a representative example of the class of programs I'm working with and to extrapolate from there. To that end, I've measured the runtime of a representative program 7500 times and plotted the results as a histogram (the x-axis is runtime in seconds) -- the results have a roughly normal distribution.
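For concreteness, a minimal sketch of the kind of measurement loop I mean (the program path, trial count, and bin count here are placeholders, not my exact setup):

```python
import subprocess
import time

import matplotlib.pyplot as plt

PROGRAM = ["./representative_program"]  # placeholder path to the program under test
TRIALS = 7500

# Time each run of the (deterministic) program with a wall-clock timer.
runtimes = []
for _ in range(TRIALS):
    start = time.perf_counter()
    subprocess.run(PROGRAM, check=True, capture_output=True)
    runtimes.append(time.perf_counter() - start)

# Histogram of the sampled runtimes.
plt.hist(runtimes, bins=50)
plt.xlabel("runtime (seconds)")
plt.ylabel("count")
plt.show()
```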
I know the formula for the confidence interval for a normal distribution is:
X ± Z * s/√n
Where:
X is the mean,
Z is the chosen Z-value,
s is the standard deviation, and
n is the number of observations
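To keep the notation concrete, here is that formula evaluated with the sampled values from my 7500 trials (a small sketch with the numbers hard-coded; the Z-value for 95% is taken as 1.96):

```python
import math

x_bar = 7.27  # sampled mean (seconds)
s = 0.11      # sampled standard deviation (seconds)
n = 7500      # number of observations
z = 1.96      # Z-value for a 95% confidence level

# Half-width of the confidence interval: Z * s / sqrt(n)
half_width = z * s / math.sqrt(n)
print(f"95% CI: {x_bar:.4f} ± {half_width:.4f} s")  # ± ~0.0025 s
```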
My question is this -- can I use algebra to solve for n to determine the number of trials I would need to run to get an average runtime that is within 2% of the true runtime at a 95% confidence level? Obviously, here the sampled mean and sampled standard deviation would stand in for the true values. My approach is the following:
1) Run a "sufficient" number of trials to get a sampled mean and standard deviation that is "close enough" to the true values.
2) Plug the results into the formula and solve for n. For example, using the sampled mean of 7.27 s and the sampled standard deviation of 0.11 s obtained from the 7500 trials, we find that we would only need to run three trials and average their runtimes to obtain the desired result:
Z * s/√n <= 0.02 * sampled mean
1.96 * 0.11/√n <= 0.02 * 7.27 = 0.1454
√n >= (1.96 * 0.11)/0.1454 ≈ 1.4828
n >= 1.4828^2 ≈ 2.1987
n >= 3 trials (rounding up to a whole number of trials)
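The same algebra as a sketch in code (again with my sampled values; the 2% margin is applied to the sampled mean):

```python
import math

x_bar = 7.27       # sampled mean (seconds)
s = 0.11           # sampled standard deviation (seconds)
z = 1.96           # Z-value for a 95% confidence level
rel_margin = 0.02  # desired margin of error, as a fraction of the mean

# Z * s / sqrt(n) <= rel_margin * x_bar  =>  n >= (Z * s / (rel_margin * x_bar))^2
n_required = (z * s / (rel_margin * x_bar)) ** 2
print(n_required)             # ~2.1987
print(math.ceil(n_required))  # 3 trials
```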

Comments:

n = [1.96(0.11)/7.27(.02)]^2 ≈ 117, right? – Adam Oct 01 '19 at 15:45

≈ 3, not ≈ 117 – Adam Oct 01 '19 at 16:58

ME = .02 holds here. Doesn't the margin of error have to be relative to the measured value? I keep looking at the equality and thinking "OK, 2% of what?" – Adam Oct 01 '19 at 21:00
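To spell out the "2% of what" question raised in the comments: treating the margin of error as an absolute 0.02 seconds gives roughly 117 trials, while treating it as 2% of the sampled mean (0.02 * 7.27 ≈ 0.145 s) gives 3, which appears to be where the ≈117 vs ≈3 discrepancy comes from. A small sketch of both calculations:

```python
import math

x_bar = 7.27  # sampled mean (seconds)
s = 0.11      # sampled standard deviation (seconds)
z = 1.96      # Z-value for a 95% confidence level

# Margin of error read as an absolute 0.02 seconds:
n_absolute = (z * s / 0.02) ** 2
print(math.ceil(n_absolute))  # 117

# Margin of error read as 2% of the sampled mean (~0.145 seconds):
n_relative = (z * s / (0.02 * x_bar)) ** 2
print(math.ceil(n_relative))  # 3
```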