
I need to synchronize intermediate solutions of an optimization problem solved distributively over a number of worker processors. The solution vector is known to be sparse.

I have noticed that if I use MPI_Allreduce, the performance is good compared to my own allreduce implementation.

However, I believe the performance could be improved further if the allreduce communicated only the nonzero entries of the solution vector. I could not find any such implementation of allreduce.

Any ideas?

It seems that MPI_Type_indexed cannot be used, since the indices of the nonzero entries are not known in advance.
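To make the difficulty concrete: the nonzero pattern has to be discovered at run time on each rank by a packing pass, before any indexed datatype or packed exchange could even be set up. A minimal sketch of that pass (function and variable names are illustrative, not from any MPI API):

```c
#include <stddef.h>

/* Hypothetical packing pass: scan a dense vector and record the
 * indices and values of its nonzero entries.  Returns the nonzero
 * count; idx[] and val[] must each have room for up to n entries. */
static size_t pack_nonzeros(const double *x, size_t n,
                            int *idx, double *val)
{
    size_t nnz = 0;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] != 0.0) {
            idx[nnz] = (int)i;
            val[nnz] = x[i];
            ++nnz;
        }
    }
    return nnz;
}
```

Only after this pass are the displacements known that MPI_Type_indexed would require, and the nonzero count itself differs from rank to rank and iteration to iteration, so the datatype cannot be constructed once in advance.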

Soumitra
  • Why not measure the performance of your code and see if this particular reduction operation matters at all? This seems like premature optimization to me. – Bill Barth Dec 16 '15 at 01:56
  • If you use derived data types, you have to roll your own reduction operators, which will disable all of the optimizations that go into most MPI implementations. – Jeff Hammond Dec 23 '15 at 03:42
  • Thanks Bill for your suggestion. It turns out the solution vector has 1,355,191 entries, and 60-70% of them are zero. The MPI communication is the main bottleneck. – Soumitra Dec 28 '15 at 22:30
  • Yes Jeff, that is what I am worried about with implementing my own. Thanks. – Soumitra Dec 28 '15 at 22:31

1 Answer


You could create another vector that stores only the non-zero elements of the solution vector, and then call MPI_Allreduce on that.
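One complication with this suggestion: the compacted vectors will generally have different lengths and different index patterns on each rank, so they cannot be summed with MPI_Allreduce directly. A workable variant (a sketch under that assumption, not a drop-in solution) is to exchange (index, value) pairs, e.g. with MPI_Allgatherv, and then perform the sum as a local scatter-add. The accumulation step, shown here without the MPI calls so it is self-contained:

```c
#include <stddef.h>

/* Scatter-add one rank's packed (index, value) pairs into a dense
 * result vector.  In the distributed setting each rank would
 * contribute one such packed list, received e.g. via MPI_Allgatherv
 * (the nonzero counts differ across ranks, so the "v" variant is
 * needed); the reduction itself is just this loop over every list. */
static void accumulate_packed(double *result,
                              const int *idx, const double *val,
                              size_t nnz)
{
    for (size_t k = 0; k < nnz; ++k)
        result[idx[k]] += val[k];
}
```

Whether this beats the dense MPI_Allreduce depends on the sparsity: with only 60-70% zeros, the index overhead (one int per value) and the loss of the tuned reduction algorithms inside the MPI library may well cancel the savings, so it is worth benchmarking both.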

ztdep