Is busy waiting on both MPI_Iprobe and MPI_Testsome efficient?

Question

I have an MPI application that needs to asynchronously respond to both incoming messages and request completions inside a dedicated communication thread. The obvious way to do this is a busy wait that alternately calls MPI_Iprobe and MPI_Testsome. Should I be worried about performance if I do this? Is it better to just use two threads and eat the context switching time?

If an architecture-unspecific answer is unreasonable, I'm shooting for either BlueGene/Q or Cray XE6.

Unfortunately (in terms of MPI_Testsome costs), I expect to have O(100) requests active at a time.

Notes:

1. I'm memory constrained, so it's important to detect completed requests as soon as possible. A completed request might free up enough space to schedule more compute on other threads.
2. The incoming messages have varying sizes, and I don't have nearly enough memory to allocate buffers for all of them, so I can't switch to blanket MPI_Irecvs instead of the MPI_Iprobes.
3. It's a shame MPI doesn't let you do an MPI_Irecv with only an upper bound on the size of the incoming message, since this would solve my problem perfectly.

score 4 · Accepted Answer · edited Jul 23 '12 at 18:19


As Jeremiah W. mentioned in a comment on another answer, your "Note 3" is actually explicitly supported by MPI. You can always post a receive buffer larger than the message that will actually be sent; MPI_Get_count on the resulting status then tells you how many elements arrived.

So this is fine:

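#include <mpi.h>

/* Run with at least two processes (e.g. mpiexec -n 2). */
int main(int argc, char **argv)
{
    double sendbuf[5] = {0}, recvbuf[100];
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Request req;
        MPI_Status status;
        int num_received;
        /* Room for 100 doubles, although rank 1 sends only 5: this is legal. */
        MPI_Irecv(recvbuf, 100, MPI_DOUBLE, 1, 1234, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
        MPI_Get_count(&status, MPI_DOUBLE, &num_received);
        /* num_received should now contain 5 */
    }
    else if (rank == 1) {
        MPI_Send(sendbuf, 5, MPI_DOUBLE, 0, 1234, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}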

answered Jul 23 '12 at 13:40 by Dave Goodell · edited Jul 23 '12 at 18:19 by Jed Brown

Great, I'm all set then. The only downside is it means I need to add safety assertions to all my other uses of Recv/Irecv to verify that sizes match. Important to know. – Geoffrey Irving Jul 23 '12 at 16:58

score 2 · Answer 2 · edited Jul 23 '12 at 02:34


If you could use MPI_Irecv with an upper bound on the message size, why not just send that number of bytes every time (i.e. pad messages)?

It's hard to say whether one thread doing both MPI_Iprobe and MPI_Testsome is better or worse than a separate thread for each. The answer will depend heavily on the application's communication pattern.

How hard is it to implement both approaches and compare them in the wild?

answered Jul 23 '12 at 01:34 by Jeff Hammond · edited Jul 23 '12 at 02:34 by Jed Brown

I suppose in this case padding adds only around 10% extra overall bandwidth (80% of the messages hit the upper bound exactly), so it's certainly an option. I'll have to see whether I manage to be compute limited. Implementing both is definitely the best way if I get to it. – Geoffrey Irving Jul 23 '12 at 01:50
How big are these messages anyways? – Jeff Hammond Jul 23 '12 at 02:01
Most are 262144 bytes. – Geoffrey Irving Jul 23 '12 at 02:19
For context: the reason most of the messages hit the upper bound is that they're pieces of 4D blocked arrays. Only the boundary blocks don't have shape (8,8,8,8), and each element is 64 bytes. – Geoffrey Irving Jul 23 '12 at 02:31

Remember that the MPI_Irecv bound is just that -- an upper bound. You can send shorter messages (without padding), then use MPI_Get_count to find the actual number of elements sent. – Jeremiah Willcock Jul 23 '12 at 04:57
Great! The link which confirms that is (unsurprisingly) http://www.mpi-forum.org/docs/mpi22-report/node46.htm#Node46. Thanks for pointing this out. – Geoffrey Irving Jul 23 '12 at 16:56
I'm tempted to down-vote my own answer... – Jeff Hammond Feb 22 '14 at 16:42
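The 262144-byte message size quoted in the comments follows directly from the block shape. A small sketch that computes a block's size from its four extents and the element size (the function name `block_bytes` is hypothetical):

```c
#include <stddef.h>

/* Bytes in one block of a 4D blocked array: the product of the block's
 * four extents times the size of one element. */
size_t block_bytes(const int shape[4], size_t elem_bytes)
{
    size_t n = elem_bytes;
    for (int i = 0; i < 4; i++)
        n *= (size_t)shape[i];
    return n;
}
```

For a full (8,8,8,8) block of 64-byte elements, 8 × 8 × 8 × 8 = 4096 elements and 4096 × 64 = 262144 bytes, which is the natural upper bound to pass as the MPI_BYTE count of a pre-posted MPI_Irecv. A boundary block with an illustrative shape of (8,8,8,3) is 1536 × 64 = 98304 bytes and, per Jeremiah Willcock's comment, can be sent without padding.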

score 0 · Answer 3 · answered Jul 25 '12 at 14:03


A further optimization that removes the need for an upper bound on the receive buffer (as described in the other answers):

You are already calling MPI_Iprobe, which also returns an MPI_Status for the pending message. Why not call MPI_Get_count() with this status and the message's MPI_Datatype to get the size of the message, then allocate a buffer of exactly that size?

answered Jul 25 '12 at 14:03 by Markus Blatt


The point was to avoid MPI_Iprobe. Eagerly posting a request for the incoming message allows replacing the nonblocking calls with blocking MPI_Waitany or MPI_Waitsome. – Jed Brown Jul 25 '12 at 16:24

Yep. And similarly, the communication thread can post a receive from its own rank, so that worker threads can wake up the blocked communication thread by sending a message to it. – Geoffrey Irving Jul 25 '12 at 19:11
Both Cray and BG are going to spin. I'm not aware of any MPI implementation + OS pair that effectively implements sleep-while-blocking. @Dave Goodell might know. – Jeff Hammond Jul 14 '13 at 19:16
