I have an MPI computation with the following structure: each processor has a large region of read-only memory divided into chunks. During a compute epoch, each processor performs a (different) number of steps of the form:
- Gather several chunks of data from different processors.
- Do some computation and store the results.
Different steps can take wildly different amounts of time between different processors, although the total load is well balanced.
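Concretely, here is roughly what one step would look like on the origin side with one-sided operations; the open question is what goes at the `???` markers. This is only a sketch: `compute`, the chunk index arrays, and the window setup are hypothetical.

```c
#include <mpi.h>

void compute(double *buf, int n);  /* hypothetical per-step computation */

/* One step on the origin side: gather several remote chunks, then compute.
   Exactly what synchronization belongs at the ??? markers is the problem,
   since each processor executes a different number of such steps. */
void do_step(MPI_Win win, int nchunks, const int *target_rank,
             const MPI_Aint *target_disp, double *local_buf, int chunk_len)
{
    /* ??? open an access epoch ??? */
    for (int i = 0; i < nchunks; i++)
        MPI_Get(local_buf + (MPI_Aint)i * chunk_len, chunk_len, MPI_DOUBLE,
                target_rank[i], target_disp[i], chunk_len, MPI_DOUBLE, win);
    /* ??? close the epoch so local_buf is guaranteed valid ??? */
    compute(local_buf, nchunks * chunk_len);
}
```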
This structure seems almost perfect for one-sided communication using MPI_Get. However, I do not see how to fit this structure into any of the available synchronization models, at least using a single window. Here are the options:
MPI_Win_fence: doesn't work since different processors execute different numbers of steps.
MPI_Win_start/complete/post/wait: doesn't work since it requires a 1-1 match between access epochs on origin processes and exposure epochs on target processes. Since different origins open access epochs at different, uncoordinated times, the targets can't choose a single consistent set of exposure epochs.
MPI_Win_lock/unlock: doesn't work since each lock/unlock epoch can target only one rank at a time, while each step needs chunks from several targets.
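(For reference, the closest lock/unlock shape I can see is one shared-lock epoch per target inside a step, which serializes the transfers instead of letting them all proceed concurrently. A sketch, reusing the same hypothetical chunk bookkeeping:)

```c
#include <mpi.h>

/* One lock/unlock epoch per target: legal per the MPI-2 rules, but each
   MPI_Get is forced to complete at its MPI_Win_unlock before the next
   target is touched, so the gather is serialized across targets. */
void do_step_locked(MPI_Win win, int nchunks, const int *target_rank,
                    const MPI_Aint *target_disp, double *local_buf,
                    int chunk_len)
{
    for (int i = 0; i < nchunks; i++) {
        MPI_Win_lock(MPI_LOCK_SHARED, target_rank[i], 0, win);
        MPI_Get(local_buf + (MPI_Aint)i * chunk_len, chunk_len, MPI_DOUBLE,
                target_rank[i], target_disp[i], chunk_len, MPI_DOUBLE, win);
        MPI_Win_unlock(target_rank[i], win);  /* get completes here */
    }
}
```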
I also don't see any solution using multiple windows, since I believe windows aren't allowed to overlap, and here multiple origins are frequently accessing the same chunks of memory.
Note that this situation is easy to handle using MPI_Isend, MPI_Irecv, and MPI_Waitall, as long as the target processes poll or run a communication thread that responds to requests. When a process wants to gather data, it fires off a batch of MPI_Isend requests, posts a corresponding number of MPI_Irecv's to accept the data, and calls MPI_Waitall until the whole set of data has arrived.
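A sketch of that two-sided gather step, with hypothetical tags and a chunk-id request format; the target-side polling loop that serves the requests is not shown:

```c
#include <mpi.h>
#include <stdlib.h>

#define TAG_REQ  1  /* hypothetical tag: "send me chunk k" */
#define TAG_DATA 2  /* hypothetical tag: chunk payload */

/* One gather step: request each chunk, post the matching receive,
   then wait for everything to complete. */
void gather_step(MPI_Comm comm, int nchunks, const int *target_rank,
                 const int *chunk_id, double *local_buf, int chunk_len)
{
    MPI_Request *reqs = malloc(2 * nchunks * sizeof *reqs);
    for (int i = 0; i < nchunks; i++) {
        /* ask rank target_rank[i] for chunk chunk_id[i] */
        MPI_Isend((void *)&chunk_id[i], 1, MPI_INT, target_rank[i],
                  TAG_REQ, comm, &reqs[2 * i]);
        /* receive the chunk into the next slot of local_buf */
        MPI_Irecv(local_buf + (MPI_Aint)i * chunk_len, chunk_len, MPI_DOUBLE,
                  target_rank[i], TAG_DATA, comm, &reqs[2 * i + 1]);
    }
    MPI_Waitall(2 * nchunks, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}
```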
Is there a way to make one-sided communication work here?
Update: This seems related: http://www.cs.berkeley.edu/~bonachea/upc/mpi2.html