
I need to synchronize intermediate solutions of an optimization problem solved distributively over a number of worker processors. The solution vector is known to be sparse.

I have noticed that if I use MPI_Allreduce, the performance is good compared to my own allreduce implementation.

However, I believe the performance could be improved further if the allreduce communicated only the nonzero entries of the solution vector. I could not find any such implementation of allreduce.

Any ideas?

It seems that MPI_Type_indexed cannot be used, since the indices of the nonzero entries are not known in advance.
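To make the difficulty concrete: the nonzero pattern has to be discovered at run time on each rank by a packing pass, before any indexed datatype or packed exchange could even be set up. A minimal sketch of that pass (function and variable names are illustrative, not from any MPI API):

```c
#include <stddef.h>

/* Hypothetical packing pass: scan a dense vector and record the
 * indices and values of its nonzero entries.  Returns the nonzero
 * count; idx[] and val[] must each have room for up to n entries. */
static size_t pack_nonzeros(const double *x, size_t n,
                            int *idx, double *val)
{
    size_t nnz = 0;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] != 0.0) {
            idx[nnz] = (int)i;
            val[nnz] = x[i];
            ++nnz;
        }
    }
    return nnz;
}
```

Only after this pass are the displacements known that MPI_Type_indexed would require, and the nonzero count itself differs from rank to rank and iteration to iteration, so the datatype cannot be constructed once in advance.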

Soumitra
  • Why not measure the performance of your code and see if this particular reduction operation matters at all? This seems like premature optimization to me. – Bill Barth Dec 16 '15 at 01:56
  • If you use derived data types, you have to roll your own reduction operators, which will disable all of the optimizations that go into most MPI implementations. – Jeff Hammond Dec 23 '15 at 03:42
  • Thanks Bill for your suggestion. It turns out the solution vector has 1,355,191 entries, and 60-70% of them are zero. The MPI communication is the main bottleneck. – Soumitra Dec 28 '15 at 22:30
  • Yes Jeff, that is what I am worried about with implementing my own. Thanks. – Soumitra Dec 28 '15 at 22:31

1 Answer


You could create another vector that stores only the non-zero elements of the solution vector, and then call MPI_Allreduce on that.
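One complication with this suggestion: the compacted vectors will generally have different lengths and different index patterns on each rank, so they cannot be summed with MPI_Allreduce directly. A workable variant (a sketch under that assumption, not a drop-in solution) is to exchange (index, value) pairs, e.g. with MPI_Allgatherv, and then perform the sum as a local scatter-add. The accumulation step, shown here without the MPI calls so it is self-contained:

```c
#include <stddef.h>

/* Scatter-add one rank's packed (index, value) pairs into a dense
 * result vector.  In the distributed setting each rank would
 * contribute one such packed list, received e.g. via MPI_Allgatherv
 * (the nonzero counts differ across ranks, so the "v" variant is
 * needed); the reduction itself is just this loop over every list. */
static void accumulate_packed(double *result,
                              const int *idx, const double *val,
                              size_t nnz)
{
    for (size_t k = 0; k < nnz; ++k)
        result[idx[k]] += val[k];
}
```

Whether this beats the dense MPI_Allreduce depends on the sparsity: with only 60-70% zeros, the index overhead (one int per value) and the loss of the tuned reduction algorithms inside the MPI library may well cancel the savings, so it is worth benchmarking both.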

ztdep