I have a long-running experiment. Each time I run it, I get a new goodness value, since the algorithm contains random variables. So I need to report the mean and standard deviation over some n runs. What should n be?
I need to be able to defend the choice of n on statistical grounds. Some kind of scientific reference (a book or a paper) would be wonderful, too.
As requested, here are more details. Thanks for the answers so far:
In computer vision, an important challenge is recognizing objects in images, and different algorithms are developed for this purpose. To evaluate a new algorithm, one sometimes constructs a test set and a training set of images, say 1000 images each, trains the algorithm on the training images, and measures a success rate on the test set. If 800 of the 1000 objects in the test images are recognized by the algorithm, the success rate is said to be 80 percent.
Now, my algorithm analyzes, say, 1000 RANDOM points in the image and, using that analysis, tries to recognize the objects in it. Each time I run the algorithm, I get a different success rate, since the 1000 points are chosen RANDOMLY. So I think it's best to report some kind of summary statistics (e.g. the mean and standard deviation) of the success rate.
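To make this concrete, here is a minimal sketch of what I mean by summarizing repeated runs. The function `run_algorithm` is a hypothetical stand-in (it just returns a noisy rate around 0.80); the real algorithm would sample the 1000 random points and score the test set:

```python
import random
import statistics

def run_algorithm(seed=None):
    """Hypothetical stand-in for one run of the recognition algorithm.

    The real version would sample 1000 random points per image and
    return the fraction of the 1000 test objects it recognizes; here
    we just simulate a success rate fluctuating around 0.80.
    """
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0, 0.01)

# Repeat the experiment n times and summarize the success rates.
rates = [run_algorithm(seed=i) for i in range(30)]
print("mean:", statistics.mean(rates))
print("std: ", statistics.stdev(rates))
```

So the open question is only how large the number of runs (30 above) needs to be.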
Also, one sometimes needs to say, "well, in addition to my algorithm, I tried these, say, 10 algorithms on the same dataset, and this table shows that mine is the best in such and such a way..." Some of those algorithms may need to be run more than once, too, so the whole experiment can really take a long time.
So, as I asked before: at least how many times should I run the long-running experiment?
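One idea I had (I don't know if it is statistically defensible, which is exactly what I'm asking) is to choose n adaptively: keep running until the 95% confidence-interval half-width of the mean, z * s / sqrt(n), drops below some tolerance. A sketch, again with a hypothetical `run_experiment` stand-in:

```python
import random
import statistics

def run_experiment(seed):
    """Hypothetical stand-in: one run of the algorithm, one success rate."""
    return 0.80 + random.Random(seed).gauss(0, 0.01)

def runs_needed(tolerance=0.005, z=1.96, min_runs=5, max_runs=1000):
    """Run until the 95% CI half-width of the mean success rate
    (z * s / sqrt(n)) falls below `tolerance`, up to max_runs."""
    rates = [run_experiment(seed=i) for i in range(min_runs)]
    n = min_runs
    while n < max_runs:
        half_width = z * statistics.stdev(rates) / n ** 0.5
        if half_width < tolerance:
            break
        rates.append(run_experiment(seed=n))
        n += 1
    return n, statistics.mean(rates), statistics.stdev(rates)

n, mean, std = runs_needed()
print(f"stopped after {n} runs: mean={mean:.4f}, std={std:.4f}")
```

Is this kind of stopping rule a reasonable way to justify n, or is there a more standard approach?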
Thanks.

