Number of times to run a lengthy experiment

Question

I have this long running experiment. Each time I run it I get a new goodness value, since the algorithm has random variables in it. So I need to report the mean and the std of some n runs. What should n be?

I need to be able to defend n based on some statistical ideas. Some kind of scientific reference (a book, a paper) would be wonderful, too.

I provide more details as you say, thanks for the answers:

In computer vision, an important challenge is to recognize objects from images. Different computer algorithms are developed for this purpose. To see how good a new algorithm is, one sometimes constructs a test and a training set of images, say 1000 images for each, train the algorithm with the training images, and produce a success rate using the test set. If out of the 1000 objects in the test images, 800 is recognized by the algorithm, the success rate is said to be 80 percent.

Now, my algorithm analyses, say, 1000 RANDOM points in the image, and using that analysis, tries to recognize the objects in the image. Each time I run the algorithm, I get a different success rate, since 1000 points are produced RANDOMLY. So I think its best to report some kind of summary statistics (e.g. the main and std deviation) of the success rate.

Also, one sometimes needs to say, "well in addition to my algorithm, I tried these, say, 10 algorithms on the same dataset, and this table shows that mine is the best in this and this way..." Some of these algorithms may need to run more than once, too. So one can really have a long experiment.

So, as I said before, at least how many times should I run the long running experiment?

Thx.

To get a useful answer, you'll need to provide more information. What does your experiment consist of? — cardinal, May 16 '11 at 15:10
From the question, I am guessing that the interest is in the distribution of the "goodness" value, that the runs are independent simulations and that we need to repeat the runs $n$ times where $n$ is large enough to estimate the mean and standard variation sufficiently well. Is that the question? — NRH, May 17 '11 at 00:04
@cardinal: I gave the info in the question. @NRH: Even though I am not good enough with the language of statistics, I believe your guess is right to the point. — user4629, May 17 '11 at 12:34

Thies Heidecke · Answer 1 · 2011-05-18T03:09:31.003

If we assume an underlying success probability $\theta$ for a specific training set and number of points used for image analysis, we can expect a binomial distribution for the number of successful object recognitions $k$ out of $n$ runs:

$P(k|\theta,n)=\left(n\atop k\right)\theta^k (1-\theta)^{n-k}$

What we're actually interested in, is the value and uncertainty of $\theta$ for given $n,k$. We get a probability distribution for $\theta$ by applying Bayes Theorem:

$p(\theta|k,n) \propto p(k|\theta,n)*p(\theta|n)$

The first term on the right side is the likelihood, for which we can use the formula from above. The second term is the prior distribution for $\theta$. If we want to be neutral about $\theta$ before any experiment is done we can choose the Jeffreys prior

$p(\theta|n) \propto \theta^{-1/2}(1-\theta)^{-1/2}.$

Putting our likelihood and prior distributions together and normalizing the resulting distribution we end up with

$p(\theta|k,n) = \frac{\Gamma(n+1)}{\Gamma(k+\frac{1}{2})\Gamma(n-k+\frac{1}{2})} \theta^{k-1/2}(1-\theta)^{n-k-1/2},$

which is the Beta distribution with parameters $k+\frac{1}{2}$ and $n-k+\frac{1}{2}$.

If we e.g. have $n=5$ runs and $k\in\{1,3\}$ successes, the distribution for probability of different underlying $\theta$ values looks like this:

true theta pdf

From this probability distribution we can get all the information we need about $\theta$ in order to decide to do further runs or not, e.g. we can compute credible intervals for the true value of $\theta$. By computing expected values for $\theta$ and $\theta^2$ we can get formulas for the mean $\theta$ and the variance of $\theta$:

$E\left[\theta\right]=\int_{\theta=0}^1 \theta \; p(\theta|k,n) = \frac{2k+1}{2n+2}$

$E\left[\theta^2\right]=\int_{\theta=0}^1 \theta^2 p(\theta|k,n) = \frac{(2k+1)(2k+3)}{4(n+1)(n+2)}$

$V\left[\theta\right]=E\left[\theta^2\right] - E\left[\theta\right]^2=\frac{4k(n-k)+2n+1}{4(n+1)^2(n+2)}=\frac{(k+\frac{1}{2})(n-k+\frac{1}{2})}{(n+1)^2(n+2)}$

Now we could run experiments and compute the mean and variance of the expected $\theta$ and stop when the variance is smaller than a predefined value that reflects our desired certainty for $\theta$. For example, let's say we want the standard deviation of our $\theta$-distribution be smaller than $\sigma$, then we should do runs until the condition

$\frac{(k+\frac{1}{2})(n-k+\frac{1}{2})}{(n+1)^2(n+2)} < \sigma^2$

is satisfied, with a resulting estimate for ${\tilde \theta}=\frac{2k+1}{2n+2}$.

If e.g. $\sigma=0.05$ we have to do runs until we end up in the orange region in this plot:

We can see that we need a lot more runs to be as certain about $\theta$ in case of $\theta \approx 0.5$ (about a hundred in this example) than if we have a $\theta$ near zero or one (where perhaps 30 runs could be sufficient).

A nice introductory textbook which contains examples like this is Data Analysis: A Bayesian Tutorial. It also contains a small chapter on experiment design.

Nice answer (+1) - just one small point, the prior you describe is actually the improper haldene prior, not the jeffreys prior, jeffreys prior is proportional to the square root of your prior $p(\theta)\propto \sqrt{\theta^{-1}(1-\theta)^{-1}}$. And in order for your prior to work in all cases, you need to have at least one success and one failure observed (these remove the infinity at $0$ and $1$ in the prior). Possibly also worth mentioning that posterior is a Beta distribution with parameters $k$ and $n-k$ — probabilityislogic, May 18 '11 at 01:22
Oh, thanks for the correction! I didn't know about the haldane prior. I'll update my calculations according to the real jeffreys prior then ;) — Thies Heidecke, May 18 '11 at 01:49
I will read (study?) the rest, but I have a problem with the construction of the first equation: $P(k|\theta,n)=\left(n\atop k\right)\theta^k (1-\theta)^{n-k}$ implies that you think each one of the $n$ runs results in a success (with probability $\theta$) or a failure, am I right? But in reality each run of this experiment/simulation results in a success rate between 0 and 1. Because, in each run the algorithm is faced with, say, 1000 objects of the test set,and if it successfully recognizes, say, 800 of them, it gets a success-rate of 0.80. Also, probably, $k$ will be much larger than $n$. — user4629, May 18 '11 at 11:38
Ok, i misread your question then. So you have $n$ runs, and for each run you have $M$ test cases and a randomly picked subset of pixels you use for classification, which results in $k_i\in{0,...,M}$ successful classifications for the $i$th run (or a success rate of $\theta_i=k_i/M$), right? Then you basically want to estimate the mean and variance of the unknown probability distribution of $\theta$, and choose $n$ so that the uncertainty in both gets sufficiently small. I'm not sure, but perhaps a frequentist approach would be easier here. — Thies Heidecke, May 19 '11 at 08:12

Number of times to run a lengthy experiment

1 Answers1