I have an MPI application that needs to asynchronously respond to both incoming messages and request completions inside a dedicated communication thread. The obvious way to do this is a busy wait that alternately calls MPI_Iprobe and MPI_Testsome. Should I be worried about performance if I do this? Is it better to just use two threads and eat the context switching time?
If an architecture-independent answer is unreasonable, I'm targeting either BlueGene/Q or Cray XE6.
Unfortunately for MPI_Testsome costs, since each call has to check every request passed to it, I expect to have O(100) requests active at a time.
Notes:
- I'm memory constrained, so it's important to detect completed requests as soon as possible. A completed request might free up enough space to schedule more compute on other threads.
- The incoming messages vary in size, and I don't have nearly enough memory to allocate buffers for all of them up front, so I can't replace the MPI_Iprobes with pre-posted MPI_Irecvs.
- It's a shame MPI doesn't let you post an MPI_Irecv with only an upper bound on the incoming message size and have the buffer allocated at the actual size when the message arrives, since this would solve my problem perfectly.