11

The netlib BLAS implementation is an excellent reference, being mostly un-optimized and well documented (e.g. zgemm). However, it is in Fortran 77, making it somewhat inaccessible to those with a more modern programming education. Is there a reference-level implementation of BLAS, like netlib, in C/C++?

Max Hutchinson
  • 3,051
  • 16
  • 29

6 Answers

10

Have you looked at GNU Scientific Library's implementation? I find the source code to be sufficiently readable and the routines are well documented.

Juan M. Bello-Rivas
  • 3,994
  • 16
  • 24
  • Looks good to me. The documentation is a bit lacking, but the variable names are chosen well enough that I think it's clear. I'll probably prepend the opening comment of the netlib BLAS routines. What exactly do you take issue with? Do you have an alternative? – Max Hutchinson Oct 28 '13 at 10:21
6

A notable C-language implementation of BLAS is ATLAS. Among its useful features:

  1. Algebra routines implemented both in straightforward C and in highly optimized, assembler-assisted versions for multiple architectures and variants.
  2. The build system features an "auto-tuner" which compiles multiple variants of the ATLAS library to establish which one will be the fastest on the given machine.

http://math-atlas.sourceforge.net/

oakad
  • 161
  • 1
  • I looked at ATLAS but missed this. The path to the reference implementation is "src/blas/reference", with "ref" inserted between the type character and routine name and with character arguments appended. – Max Hutchinson Oct 28 '13 at 03:04
3

Netlib also produces CLAPACK, which includes BLAS, but it is just the Fortran code run through f2c and is therefore a bit clunky (e.g. zgemm).

Max Hutchinson
  • 3,051
  • 16
  • 29
1

For a high-performance implementation that is not only among the highest-performing (better than 85% of peak on 60 cores of the Intel Xeon Phi), but is also imho the most beautifully written, have a look at BLIS:

https://github.com/flame/blis

Juan M. Bello-Rivas
  • 3,994
  • 16
  • 24
0

I have implementations of some bits of BLAS/LAPACK in RNP and RNP2.

Victor Liu
  • 4,480
  • 18
  • 28
  • This is interesting, but it is definitely post-BLAS and less straightforward than GSL and the reference implementation in ATLAS. – Max Hutchinson Oct 28 '13 at 03:13
-1

We are currently working on a Massive Open Online Course, "LAFF-On High-Performance Computing" that uses dgemm as the example that leads one through different levels of parallelization: instruction level, OpenMP, MPI.

This is not a reference implementation for the BLAS, but it is a reference for how to code the BLAS (for performance). To be kept informed, visit www.ulaff.net