I have an MPI computation with the following structure: each processor has a large region of read-only memory divided into chunks. During a compute epoch, each processor performs a (different) number of steps of the form:
- Gather several chunks of data from different processors.
- Do some computation and store the results.
Different steps can take wildly different amounts of time between different processors, although the total load is well balanced.
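Concretely, here is roughly what one step would look like on the origin side with one-sided operations; the open question is what goes at the `???` markers. This is only a sketch: `compute`, the chunk index arrays, and the window setup are hypothetical.

```c
#include <mpi.h>

void compute(double *buf, int n);  /* hypothetical per-step computation */

/* One step on the origin side: gather several remote chunks, then compute.
   Exactly what synchronization belongs at the ??? markers is the problem,
   since each processor executes a different number of such steps. */
void do_step(MPI_Win win, int nchunks, const int *target_rank,
             const MPI_Aint *target_disp, double *local_buf, int chunk_len)
{
    /* ??? open an access epoch ??? */
    for (int i = 0; i < nchunks; i++)
        MPI_Get(local_buf + (MPI_Aint)i * chunk_len, chunk_len, MPI_DOUBLE,
                target_rank[i], target_disp[i], chunk_len, MPI_DOUBLE, win);
    /* ??? close the epoch so local_buf is guaranteed valid ??? */
    compute(local_buf, nchunks * chunk_len);
}
```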
This structure seems almost perfect for one-sided communication using MPI_Get. However, I do not see how to fit this structure into any of the available synchronization models, at least using a single window. Here are the options:
MPI_Win_fence: doesn't work since different processors execute different numbers of steps.
MPI_Win_start/complete/post/wait: doesn't work since it requires a 1-1 match between access epochs on origin processes and exposure epochs on target processes. Since different origins open access epochs at different, uncoordinated times, the targets can't choose a single consistent set of exposure epochs.
MPI_Win_lock/unlock: doesn't work since each lock/unlock epoch can target only one rank at a time, while each step needs chunks from several targets.
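(For reference, the closest lock/unlock shape I can see is one shared-lock epoch per target inside a step, which serializes the transfers instead of letting them all proceed concurrently. A sketch, reusing the same hypothetical chunk bookkeeping:)

```c
#include <mpi.h>

/* One lock/unlock epoch per target: legal per the MPI-2 rules, but each
   MPI_Get is forced to complete at its MPI_Win_unlock before the next
   target is touched, so the gather is serialized across targets. */
void do_step_locked(MPI_Win win, int nchunks, const int *target_rank,
                    const MPI_Aint *target_disp, double *local_buf,
                    int chunk_len)
{
    for (int i = 0; i < nchunks; i++) {
        MPI_Win_lock(MPI_LOCK_SHARED, target_rank[i], 0, win);
        MPI_Get(local_buf + (MPI_Aint)i * chunk_len, chunk_len, MPI_DOUBLE,
                target_rank[i], target_disp[i], chunk_len, MPI_DOUBLE, win);
        MPI_Win_unlock(target_rank[i], win);  /* get completes here */
    }
}
```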
I also don't see any solution using multiple windows, since I believe windows aren't allowed to overlap, and here multiple origins are frequently accessing the same chunks of memory.
Note that this situation is easy to handle using MPI_Isend, MPI_Irecv, and MPI_Waitall, as long as the target processes poll or run a communication thread that responds to requests. When a process wants to gather data, it fires off a batch of MPI_Isend requests, posts a corresponding number of MPI_Irecv's to accept the data, and calls MPI_Waitall until the whole set of data has arrived.
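A sketch of that two-sided gather step, with hypothetical tags and a chunk-id request format; the target-side polling loop that serves the requests is not shown:

```c
#include <mpi.h>
#include <stdlib.h>

#define TAG_REQ  1  /* hypothetical tag: "send me chunk k" */
#define TAG_DATA 2  /* hypothetical tag: chunk payload */

/* One gather step: request each chunk, post the matching receive,
   then wait for everything to complete. */
void gather_step(MPI_Comm comm, int nchunks, const int *target_rank,
                 const int *chunk_id, double *local_buf, int chunk_len)
{
    MPI_Request *reqs = malloc(2 * nchunks * sizeof *reqs);
    for (int i = 0; i < nchunks; i++) {
        /* ask rank target_rank[i] for chunk chunk_id[i] */
        MPI_Isend((void *)&chunk_id[i], 1, MPI_INT, target_rank[i],
                  TAG_REQ, comm, &reqs[2 * i]);
        /* receive the chunk into the next slot of local_buf */
        MPI_Irecv(local_buf + (MPI_Aint)i * chunk_len, chunk_len, MPI_DOUBLE,
                  target_rank[i], TAG_DATA, comm, &reqs[2 * i + 1]);
    }
    MPI_Waitall(2 * nchunks, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}
```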
Is there a way to make one-sided communication work here?
Update: This seems related: http://www.cs.berkeley.edu/~bonachea/upc/mpi2.html