I'm looking for a library to exploit parallelism within a single heterogeneous computing node (possibly using accelerators such as the Xeon Phi or NVIDIA GPGPUs) in a C++ FV/DG code that uses hierarchical octree-like grids. It should
- support multiple back-ends (e.g. OpenCL, CUDA, OpenMP, OpenACC, ...)
- hopefully be generic enough to support back-ends from the future,
- be easy to install/configure,
- be easy to use.
Linear algebra would be nice, but at a minimum the library should be able to apply a simple transform with a user-defined kernel on a computing device:
auto vd = device_vector<double>{ 11., 22., 33., 44. };
transform(begin(vd), end(vd), begin(vd), [](double vd_i){ return 2. * vd_i; });
host_vector<double> vh = vd; // no-op if the device is the CPU
for (auto&& vh_i : vh) { cout << vh_i << "\n"; } // 22, 44, 66, 88
I've looked at Intel TBB, OpenMP, OpenACC, AMD's Bolt, and NVIDIA's Thrust.
Thrust seems to be the best fit for my application because:
- it provides multiple backends: CUDA, TBB, and OpenMP (but no OpenCL),
- it has a familiar STL-like interface: host/device containers, iterators, and algorithms,
- the documentation seems nice.
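For what it's worth, Thrust selects its backend at compile time via the THRUST_DEVICE_SYSTEM macro, so the same source should be able to target each backend without code changes. A sketch of the compiler invocations (file names are illustrative; the exact flags and library paths depend on your toolchain):

```shell
# CUDA backend (the default when compiling with nvcc)
nvcc -O2 kernel.cu -o app_cuda

# OpenMP backend with a host compiler
g++ -O2 -fopenmp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP kernel.cpp -o app_omp

# TBB backend
g++ -O2 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB kernel.cpp -ltbb -o app_tbb
```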
However, I have no experience at all (and don't know anyone who does) building a hybrid MPI-Thrust application.
So, to my questions:
- Is there any other library worth looking into that might fit my needs better?
- Does anyone have experience with hybrid MPI-Thrust applications and can comment on how good a fit Thrust is for such a thing?