How to easily reproduce published results in my own articles using my own code

Question

I wrote a program/library which I used to obtain results in an article. (Here it is, but my question is general.) I have tests that I run regularly using ctest (they take a few minutes to run). In order to reproduce some tables or figures in the article, I have to write a script or a simple driver program that runs for maybe 10 minutes, sometimes longer, so I don't want it to be part of the regular test suite. At the same time, I want to make sure that the results from the article can be:

reproduced later, and
checked to still give the same (correct) results as I keep developing the library.
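The second requirement can be made mechanical: store the numbers the article reports next to the code and compare freshly computed values against them with a relative tolerance, since bit-for-bit equality rarely survives compiler or library changes. A minimal Python sketch; `compare_to_reference` and the quantity names are hypothetical, and the values in the `__main__` block are toy numbers, not results from any article.

```python
import math


def compare_to_reference(computed, reference, rel_tol=1e-8, abs_tol=0.0):
    """Return a list of human-readable mismatches (empty if everything agrees).

    computed, reference: dicts mapping a quantity name (e.g. one table cell)
    to a float. Every quantity in `reference` must also appear in `computed`.
    """
    mismatches = []
    for name, expected in reference.items():
        if name not in computed:
            mismatches.append(f"{name}: missing from computed results")
            continue
        actual = computed[name]
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(f"{name}: got {actual!r}, article has {expected!r}")
    return mismatches


if __name__ == "__main__":
    # Toy values for illustration only.
    reference = {"table1_energy": -1.5, "table1_iterations": 12.0}
    computed = {"table1_energy": -1.5 + 1e-12, "table1_iterations": 12.0}
    problems = compare_to_reference(computed, reference)
    print("OK" if not problems else "\n".join(problems))
```

Choose `rel_tol` per quantity if some table entries are known to be less stable than others.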

Currently, I keep a small driver program that runs as part of the regular test suite, and when I want to reproduce results from the article, I uncomment some lines in it. Of course, I never know exactly which lines, or whether I also have to tweak other parameters to get precisely the same results as in the article.

I also tried writing a Python script that computes the exact figures/tables from the article. Such a script typically stops working after an update to the library, because it is not run on a regular basis (it takes too much time).

The best approach I have come up with is a Fortran (or C/C++) example that is compiled regularly (as part of the library) but not run in the regular test suite. That way, I at least know that it compiles (and thus, hopefully, also runs). I would also test a simpler (smaller) example as part of the regular test suite.
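A variation that also removes the "which lines do I uncomment" problem is to keep the exact parameters for each table or figure as data the driver reads, plus a rule that shrinks them for a quick variant. The regular test suite runs the quick variant of every entry, so the article code path is actually executed (not just compiled) on every run, while the full variant reproduces the article on demand. A minimal Python sketch; the table names, the parameters `n_grid`, `max_iter` and `tol`, their values, and the scaling rule are all hypothetical.

```python
import argparse
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical solver parameters; pin whatever your code actually needs.
    n_grid: int
    max_iter: int
    tol: float


# Exact settings for each table/figure in the article (hypothetical values).
ARTICLE_RUNS = {
    "table1": RunConfig(n_grid=400, max_iter=200, tol=1e-10),
    "figure3": RunConfig(n_grid=800, max_iter=500, tol=1e-12),
}


def quick_variant(cfg: RunConfig, factor: int = 10) -> RunConfig:
    """Shrink a full-size configuration so it runs in seconds in the test suite."""
    return replace(cfg,
                   n_grid=max(cfg.n_grid // factor, 2),
                   max_iter=max(cfg.max_iter // factor, 1))


def select_runs(names, quick):
    """Return (name, config) pairs, either at article size or reduced size."""
    return [(name, quick_variant(ARTICLE_RUNS[name]) if quick else ARTICLE_RUNS[name])
            for name in names]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Reproduce article results.")
    parser.add_argument("runs", nargs="*", default=sorted(ARTICLE_RUNS))
    parser.add_argument("--quick", action="store_true",
                        help="reduced sizes, for the regular test suite")
    args = parser.parse_args(argv)
    for name, cfg in select_runs(args.runs, args.quick):
        # Replace this print with a call into your library's solver.
        print(name, cfg)


if __name__ == "__main__":
    main()
```

The quick variant will not reproduce the article's numbers, so it only guards against code rot; the comparison with published values belongs to the full run. Because the parameters live in version control next to the code, the exact settings behind each table are never lost in commented-out lines.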

What are the best ways to handle this problem?

Great question. My first reaction is that you should divide your tests into quick regressions, which run swiftly and are performed before every commit, and longer regressions that you would run as part of a continuous integration effort. Are you specifically in the situation where you only have tests of the former variety and have not yet divided them? — Aron Ahmadia, Dec 23 '12 at 21:38
I have a lot of tests that run quickly, see here: https://github.com/certik/hfsolver/tree/master/src/tests, but I don't know how to handle the actual calculations for the article (e.g. 10 minutes for each table/figure easily adds up to a couple of hours in total). — Ondřej Čertík, Dec 23 '12 at 21:41
Run the long tests automatically nightly (or weekly, monthly, etc.) using a continuous integration server. Since you don't have to pay attention to anything but the results, you won't care how long they take. — David Ketcheson, Dec 28 '12 at 20:05

Wolfgang Bangerth · Accepted Answer · 2014-11-13T00:16:38.143

In deal.II, we have a testsuite that is driven by a regular Unix Makefile. It has a default target that runs all the usual tests and a separate target for the expensive tests. Each test is run through the same generic rule; the default target invokes that rule only for the usual tests, and the expensive target invokes it for the expensive ones. Because everything goes through a single generic rule, that rule is necessarily up to date at all times; the only things that can get out of date are the lists of test names.
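The same structure (one generic rule, several lists of test names) is easy to mimic outside of Make. The Python sketch below is an analogue for illustration, not deal.II's actual implementation; the test names and helper functions are hypothetical, and each test is assumed to be an executable that exits with status 0 on success.

```python
import subprocess

# Hypothetical test names. Only these lists can go stale;
# the rule that actually runs a test is shared by all targets.
QUICK_TESTS = ["test_basis", "test_integrals"]
EXPENSIVE_TESTS = ["article_table1", "article_figure3"]

TARGETS = {"default": QUICK_TESTS, "expensive": EXPENSIVE_TESTS}


def default_command(name):
    """Assume each test is an executable of the same name in the current directory."""
    return ["./" + name]


def run_one(command):
    """The single generic rule: run one test command and report success."""
    try:
        return subprocess.run(command).returncode == 0
    except OSError:  # e.g. the test executable was never built
        return False


def run_target(target, make_command=default_command):
    """Apply the generic rule to every test listed under `target`; return the failures."""
    return [name for name in TARGETS[target] if not run_one(make_command(name))]
```

Calling `run_target("expensive")` from a small command-line wrapper plays the role of the expensive Makefile target; adding a new expensive test means appending one name to a list.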

Update: The text above was correct in 2012. Since 2014, the deal.II testsuite has been based on CTest, but the general idea remains valid.
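With CTest, one common way to get the same split is to attach a label to the expensive tests in CMake, e.g. `set_tests_properties(article_table1 PROPERTIES LABELS expensive)`, and then include or exclude that label when invoking `ctest`: `-L <regex>` runs only tests whose labels match, and `-LE <regex>` excludes them. The Python sketch below only builds the two command lines, one for the pre-commit run and one a CI server could run nightly, as suggested in the comments on the question; the label name `expensive` and the tier names are assumptions.

```python
import shlex

EXPENSIVE_LABEL = "expensive"  # assumed label, set via the LABELS test property


def ctest_command(tier, jobs=1):
    """Build a ctest invocation for the given tier ('quick' or 'nightly')."""
    cmd = ["ctest", "--output-on-failure", "-j", str(jobs)]
    if tier == "quick":
        cmd += ["-LE", EXPENSIVE_LABEL]  # everything except the expensive tests
    elif tier == "nightly":
        cmd += ["-L", EXPENSIVE_LABEL]   # only the expensive tests; quick ones ran per commit
    else:
        raise ValueError(f"unknown tier: {tier!r}")
    return cmd


if __name__ == "__main__":
    for tier in ("quick", "nightly"):
        print(shlex.join(ctest_command(tier, jobs=4)))
```

In practice the list would be passed to `subprocess.run(cmd, cwd=build_dir)`, with `build_dir` pointing at the CMake build tree.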

edited Nov 13 '14 at 00:16

answered Dec 24 '12 at 21:41

Wolfgang Bangerth

Thanks! Here is the link to the docs: http://www.dealii.org/7.2.0/development/testsuite.html#regression_tests and here are the results of the "usual tests": http://www.dealii.org/cgi-bin/regression_quick.pl and of the "expensive tests": http://www.dealii.org/cgi-bin/regression.pl. Did I get that right? So you run the "usual tests" on each revision and the "expensive tests" only every couple of revisions? – Ondřej Čertík Dec 26 '12 at 16:56
Not quite. Results for expensive tests are not usually posted to the website. – Wolfgang Bangerth Dec 26 '12 at 18:34
So you run them manually, say, before each release? How long do they take to run? I like your approach. – Ondřej Čertík Dec 26 '12 at 22:21
Yes, manually. Every once in a while, and certainly before releases. For some projects they take an hour or more, but because they test only a small part of the library (they mostly test add-on projects to deal.II), it's not always worthwhile, or even possible, to run them for every revision. – Wolfgang Bangerth Dec 27 '12 at 04:39
Do you have any parallel tests that can only run on supercomputers, for example, any large-scale tests against p4est? – Aron Ahmadia Jan 02 '13 at 22:03
We do have parallel tests, but most of our tests are just for correctness, and there 10 processors are as good as 1,000 once you have identified the bug or want to test a particular feature. I don't think we have tests that we run with more than 10 MPI processes. (The tests are run on a machine with 16 cores, so even running with 100 MPI processes wouldn't be impossible.) – Wolfgang Bangerth Jan 02 '13 at 22:43
