
In one of my experiments for measuring the performance of a specific type of software, I measure execution time and memory usage. I know that measuring these quantities can be quite problematic, since the operating system interferes with how the program runs, and other programs/processes may run during the execution of the benchmark.

I remember the following best practices:

  • Run multiple times
  • Take the smallest time it took to perform the benchmark, not the average (see the sketch after this list).
  • Take the memory usage with a huge grain of salt: only talk about it when the amount of memory used is large (if it's in the kilobytes or a few megabytes you can't say much about it) and when the difference is significant (at least a couple of tens of megabytes).
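
For concreteness, this is roughly what the first two points look like in practice. The sketch below is a minimal Python harness; `workload` is a hypothetical stand-in for the code under test, and the repetition count of 10 is an arbitrary choice:

```python
import timeit
import tracemalloc

def workload():
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(100_000))

# Run the benchmark several times and keep the minimum:
# the minimum is the run least disturbed by the OS and
# by other processes active during the measurement.
times = timeit.repeat(workload, number=1, repeat=10)
print(f"best of {len(times)} runs: {min(times):.6f} s")

# Peak memory of a single run, to be interpreted with care
# (only meaningful when the usage and the differences are large).
tracemalloc.start()
workload()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak traced memory: {peak / 1e6:.1f} MB")
```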

While it is nice that I can remember these best practices and can argue for the correctness of each of them, I'd rather tie them to some literature. However, I have not yet found any that speaks specifically about speed/memory benchmarks. Does anybody know of papers or other material I can cite to substantiate these best practices?

(I'm not sure if this question belongs on Cross Validated, but since I asked a related question here I thought it would be best to start here. If this question fits better on Academia, CS Theory, or Stack Overflow, please help me move it to the correct site.)

Roy T.

1 Answer


I can share two publications about the evaluation of algorithms:

  1. Coffin, M. & Saltzman, M. J. (2000). Statistical analysis of computational tests of algorithms and heuristics. INFORMS Journal on Computing, 12, 24-44.
  2. Hoos, H. H. & Stützle, T. (1998). Evaluating Las Vegas algorithms: Pitfalls and remedies. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 238-245.

Of these two publications, the latter is somewhat specific to SAT solvers based on stochastic local search, since in that case modelling the runtime distribution as exponential fits very well.
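
As an illustration of that modelling step, here is a minimal sketch (assuming NumPy and SciPy are available; the runtimes are made-up numbers, not real measurements) of fitting an exponential distribution to observed runtimes and checking the fit:

```python
import numpy as np
from scipy import stats

# Hypothetical runtimes in seconds -- illustrative only, not real measurements.
runtimes = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 3.5, 0.6, 1.7])

# Fit an exponential run-time distribution with the location fixed at zero.
loc, scale = stats.expon.fit(runtimes, floc=0.0)
print(f"estimated mean runtime (1/lambda): {scale:.2f} s")

# Kolmogorov-Smirnov test of how well the exponential model fits.
stat, p_value = stats.kstest(runtimes, "expon", args=(loc, scale))
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.3f}")
```

Hoos & Stützle discuss run-time distributions and the appropriate goodness-of-fit analysis in far more detail; the above is only meant to show the general idea.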

ziggystar