9

I would like to predict runtimes for dense linear algebra operations on a specific architecture using a specific library. That is, I would like to learn a model that approximates the function

$F_{op} : \text{input sizes} \rightarrow \text{runtime}$

for operations like matrix multiply, element-wise add, triangular solve, etc.

I suspect that these runtimes are mostly predictable due to the regularity of the operations once you get beyond problem sizes that fit comfortably in cache.

Questions:

  1. Is this assumption realistic? Is the runtime function likely to be nearly deterministic?
  2. Can I assume that this function will be polynomial in the sizes of the inputs? (i.e. I expect dense matrix multiply to look something like $\alpha\, nkm$ for $A_{n\times k} B_{k\times m}$, with $\alpha$ some scalar coefficient)
  3. Is there preexisting work on this somewhere?
  4. My current plan is to do least squares regression with an $L_1$ regularizer. Any other suggestions?

Edit: To be clear I'm looking for runtimes, not FLOPs or any other common performance metric. I'm willing to restrict myself to one particular architecture.
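To make the plan in point 4 concrete, here is a minimal sketch of the idea: benchmark actual wall-clock matmul times at a few sizes, then fit coefficients over candidate monomials in $(n, k, m)$. Plain `numpy.linalg.lstsq` is used below to keep the sketch dependency-free; with an $L_1$ penalty (e.g. `sklearn.linear_model.Lasso`) most coefficients would be driven toward zero, ideally leaving the dominant $nkm$ term. The size grid and feature set are illustrative choices, not a recommendation.

```python
import time
import numpy as np

def time_matmul(n, k, m, reps=3):
    """Median wall-clock time of one (n x k) @ (k x m) multiply."""
    A = np.random.rand(n, k)
    B = np.random.rand(k, m)
    samples = []
    for _ in range(reps):
        t0 = time.perf_counter()
        _ = A @ B
        samples.append(time.perf_counter() - t0)
    return np.median(samples)

# Benchmark a grid of square problem sizes (illustrative grid).
sizes = [(n, n, n) for n in (100, 200, 300, 400, 500)]
runtimes = np.array([time_matmul(n, k, m) for n, k, m in sizes])

# Feature matrix of candidate monomials in (n, k, m).
X = np.array([[n * k * m,              # cubic term (expected dominant)
               n * k + k * m + n * m,  # quadratic terms
               n + k + m,              # linear terms
               1.0]                    # constant overhead
              for n, k, m in sizes])

# Ordinary least squares; swap in a lasso for the L1-regularized fit.
coef, *_ = np.linalg.lstsq(X, runtimes, rcond=None)

alpha = coef[0]  # effective seconds per scalar multiply-add
print(f"alpha ~ {alpha:.3e} s per unit of n*k*m")
```

Whether the fitted $\alpha$ is stable across the grid is exactly the question being asked here: if the cache-related jumps mentioned below matter, a single polynomial will fit poorly and the residuals will show it.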

MRocklin
  • 3,088
  • 20
  • 29

2 Answers

10

I have recently been working on exactly this topic. You may want to take a look at our paper: http://arxiv.org/abs/1209.2364.

Why are you interested in the runtime prediction of linear algebra routines? Do you intend to use the model for a certain purpose?

Elmar Peise
  • 201
  • 2
  • 2
  • Thanks for the link. I'll take a look. I'm interested in this for I suspect the same reason you are. Automated algorithm selection and scheduling for matrix expressions. A lot of otherwise impossible problems should be possible in this highly regular and predictable domain. – MRocklin Sep 26 '12 at 21:37
6

There is a lot of preexisting work. Most linear algebra library developers publish performance results in terms of floating-point rates (flop/s), which can be converted into run times.

Googling for "DGEMM performance" for example, yields the following: http://math-atlas.sourceforge.net/timing/3_5_10/index.html.

Generally, you can expect the runtime function to be non-smooth. There will be jumps or spikes in the vicinity of certain problem sizes (which relate to cache sizes). You should also expect plateaus in flop rates, and therefore linear-ish regions over a broad range of problem sizes. I don't expect polynomial fits to be very helpful.

Given a broad-based benchmarking effort, it might be easier to tabulate results and interpolate as necessary. What's your goal?

Bill Barth
  • 10,905
  • 1
  • 21
  • 39
  • 1
    A flop/s plateau in DGEMM indicates an $n^3$ region because that's the rate the flops are growing with problem size. I agree that a piecewise fit should be much better than trying to fit a single polynomial. – Jed Brown Jun 13 '12 at 03:42
  • Converting flops to runtimes is, in my experience, difficult. I really only care about runtimes in my case. I'm testing the feasibility of static scheduling. – MRocklin Jun 13 '12 at 12:27
  • In my experience spikes/plateaus only occur for small problem sizes. Once you get out beyond the cache things are pretty smooth. I agree that adding in piecewise functions would probably improve the fit. – MRocklin Jun 13 '12 at 12:29