
This question is a follow-up to "Fortran: Best way to time sections of your code?".

If I want to time functions in my code, I know I could use gprof or kcachegrind. I also know that the results from these tools can be skewed (see http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html and https://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343).

I know I could add manual timers to each function for which I want data, but that quickly becomes tedious, and it is impractical for libraries if I want data for everything.
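For concreteness, here is a minimal sketch of the manual-timer approach in standard Fortran, using system_clock (do_work is a hypothetical stand-in for whatever section is being measured):

    ! Minimal sketch of a manual timer built on system_clock.
    ! do_work is a hypothetical stand-in for the section being measured.
    program manual_timer
      implicit none
      integer :: t_start, t_end, rate
      real :: elapsed

      call system_clock(count_rate=rate)   ! clock ticks per second
      call system_clock(t_start)
      call do_work()
      call system_clock(t_end)

      elapsed = real(t_end - t_start) / real(rate)
      print '(a, f8.3, a)', 'do_work took ', elapsed, ' seconds'

    contains

      subroutine do_work()
        integer :: i
        real :: s
        s = 0.0
        do i = 1, 10000000
          s = s + sqrt(real(i))
        end do
        print *, 'checksum =', s   ! a use of s, so the loop is not optimized away
      end subroutine do_work

    end program manual_timer

Multiplying this boilerplate across every routine in a large code, or in a third-party library I don't control, is where the approach stops scaling.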

Unfortunately, I deal with communities that want this timing data as evidence when arguing for the performance of their methods (to demonstrate performance improvements, to point out spots where performance is bad, for scientific papers, and so on). This seems to be popular with management types and some academic types. Is there a better way to get reliably accurate timing data than inserting timers? Should I be using a combination of imperfect tools and sifting through the performance data in some way?

(Note: This question isn't about performance tuning, even though it's related. You can do performance tuning without timing things by using random pausing. It also isn't about whether or not timing is worthwhile, because these communities want timing data, and I don't have the power to change their minds easily. Any comments about these topics make for great discussion, but they're not helpful in answering my question, because the reality is that the people I answer to want timing data that somehow reflects performance.)

Geoff Oxberry
  • Hi Geoff. You can get fairly accurate time fractions if you can get a large number of stack samples, as you might with oprofile. The fraction of samples $f$ in which your function appears gives its inclusive fraction. With $n$ samples, the standard error of that estimate is $\sqrt{f(1-f)/n}$, so for a 1% relative standard error you need around $10^4$ to $10^5$ samples. – Mike Dunlavey Oct 17 '16 at 15:14
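(As a worked example of that formula: if a routine appears in $f = 0.1$ of $n = 10^4$ samples, the standard error is $\sqrt{0.1 \times 0.9 / 10^4} = 0.003$, or 3% of the estimate itself; driving the relative error down to 1% for that routine takes $n = 9 \times 10^4$ samples, which is where the $10^4$ to $10^5$ range comes from.)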

3 Answers


You might consider a stack-sampling profiler like HPCToolkit or VTune, or the system profiler for Linux, perf.
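A typical perf workflow, for reference, is perf record ./a.out followed by perf report; compiling with -g lets the report attribute samples to source lines rather than just symbols.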

Also, I don't see what's objectionable about wanting to know how long things take. If you want to demonstrate that your implementation of an algorithm has the asymptotic performance you derived, actually measuring the running time is the best way to do so.
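As a minimal sketch of that kind of measurement (kernel is a hypothetical $O(n^2)$ stand-in, and in practice you would repeat runs to average out noise), you can time the routine at doubling problem sizes and check that the ratio of successive times approaches the expected factor of 4:

    ! Minimal sketch: check asymptotic scaling by direct timing.
    ! kernel is a hypothetical O(n**2) routine, so doubling n should
    ! drive the ratio of successive times toward 4.
    program scaling_check
      implicit none
      integer :: n, t0, t1, rate
      real :: elapsed, prev

      call system_clock(count_rate=rate)
      prev = -1.0
      n = 1000
      do while (n <= 16000)
        call system_clock(t0)
        call kernel(n)
        call system_clock(t1)
        elapsed = real(t1 - t0) / real(rate)
        if (prev > 0.0) then
          print '(a, i6, a, f8.3, a, f6.2)', 'n = ', n, '  t = ', &
                elapsed, ' s  ratio = ', elapsed / prev
        else
          print '(a, i6, a, f8.3, a)', 'n = ', n, '  t = ', elapsed, ' s'
        end if
        prev = elapsed
        n = 2 * n
      end do

    contains

      subroutine kernel(n)
        integer, intent(in) :: n
        integer :: i, j
        real :: s
        s = 0.0
        do i = 1, n
          do j = 1, n
            s = s + real(i) * real(j)
          end do
        end do
        if (s < 0.0) print *, s   ! a use of s, to prevent dead-code elimination
      end subroutine kernel

    end program scaling_check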

Bill Barth
  • I don't either, but the random pausing crowd likes to object to timing in favor of stack samples. – Geoff Oxberry Apr 15 '13 at 00:33
  • Isn't random pausing proto-stack-sampling? – Bill Barth Apr 15 '13 at 01:00
  • Never mind, I misread your comment. A better point would be that they may be objecting to timing because of the known issues with timing profilers. Explicit timing around routines of interest can't be objected to on these grounds, though you may be looking in the wrong place without good sampling. Depends on your purpose, I guess. – Bill Barth Apr 15 '13 at 01:11
  • @Geoff: Stack samples tell you percentages of total time, where total time itself is trivial to measure. If you want statistical precision of those percentages, you need lots of samples. Then if you want average time per call, you also need invocation counts of the suspect routines. That's how I would proceed (a worked example of this arithmetic follows these comments). – Mike Dunlavey Apr 19 '13 at 01:12
  • @Geoff: I forgot to mention Zoom. The bad part is it is not free software, and I've been admonished not to recommend it for that reason :) The good part is it may do what you need. – Mike Dunlavey Apr 19 '13 at 01:21
  • @BillBarth: Here's the long explanation. The short explanation is that it depends on your objective: measuring the code versus speeding it up. When the goal is to speed up the code, applying full attention to a small number of maximally informative samples finds speedup opportunities that are not found simply by making high-precision measurements. – Mike Dunlavey Apr 22 '13 at 15:27
  • @BillBarth: "Isn't random pausing proto-stack-sampling?" There's an observer-bias issue. Even good profilers tend to say there is no way to speed up the code when there actually is. So if that's what someone wants to hear, they will like it. On the other hand, if somebody really needs to squeeze cycles, each stack sample is rich in information about why a moment is spent, and seeing a nugget on two samples nails it, while mushing a large number of samples into statistics loses that explanatory information. It's quality vs. quantity. – Mike Dunlavey Aug 22 '13 at 15:19
  • @MikeDunlavey: If you have a good stack-sampling profiler, then you have all the data disaggregated and can look at it in that form and ignore the summaries. My experience with your random-pausing method was that it led me down a rabbit hole of a routine that wasn't all that important (less than 5% of time spent) because I got unlucky with a couple of pauses. Why not let the profiler do the pausing and then flip through its logs? – Bill Barth Aug 22 '13 at 16:50
  • @BillBarth: Can you flip through its logs, and see the actual raw samples (with line numbers preferably)? I've done that with the R profiler, and though it didn't have line numbers, it was still useful for the issues I had. I am curious about your experience. Usually 10-20 samples is unambiguous. It's a distribution - a low percent is possible, and so is a high percent. I have heard of people doing things like a) taking samples while it waits for user input, or b) not really looking at the stack, just the PC. Maybe it takes a bit of practice. – Mike Dunlavey Aug 22 '13 at 17:24
  • @MikeDunlavey: Off the top of my head, I don't know. I'd have to look into the data formats of HPCToolkit and VTune in order to figure it out. Both collect massive data files in my experience. – Bill Barth Aug 22 '13 at 19:04
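As a worked example of the average-time-per-call arithmetic referenced in the comments above: if a routine is on the stack in 30% of samples during a 50-second run and the invocation counters say it was called 1,000 times, its average inclusive cost is $0.3 \times 50\ \mathrm{s} / 1000 = 15$ ms per call.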