
In this code, all processes post a barrier, sleep a while for good measure, then first call MPI_Test and then MPI_Wait on the barrier. The Test reports that the barrier is not complete, yet the Wait succeeds. A Test should behave like a non-blocking Wait, so I would expect it to succeed as well. What am I missing?

  printf("[%d] posting barrier, then sleeping\n",procid);
  MPI_Request final_barrier;
  MPI_Ibarrier(comm,&final_barrier);
  sleep(2); // for good measure: every process should have posted by now

  // Non-blocking check: this reports the barrier as not complete
  int all_done_flag=0;
  MPI_Test(&final_barrier,&all_done_flag,MPI_STATUS_IGNORE);
  printf("[%d] all done: %d\n",procid,all_done_flag);

  // Blocking wait: this returns normally
  MPI_Wait(&final_barrier,MPI_STATUS_IGNORE);
  printf("[%d] concluded\n",procid);
Victor Eijkhout
  • If I do two MPI_Test calls in a row instead of one, the second one succeeds. Some googling led me to "Why does MPI_Iprobe return false when message has definitely been sent" (4 years old now); could the answer there be it? – Kirill Jun 05 '18 at 17:48
  • Do you do both tests before the wait? I tried two tests followed by a wait, and the second test is still not true.

    A test after the wait will succeed (that's what your link is doing: an Iprobe after a blocking Probe) but that's not interesting to me.

    – Victor Eijkhout Jun 05 '18 at 18:12
  • I used https://gist.github.com/ikirill/30118fc521144e0e7d71a066261cbdde. In that answer (not the question) it seems to me they used a busy-loop with only an Iprobe inside. – Kirill Jun 05 '18 at 18:22
  • So part of the answer is that Test is a local operation: it only tests locally, without any interaction with other processes, while Wait is a non-local operation. Another aspect of the problem is "progress": MPI needs to be activated every once in a while for everything to progress as it should. – Victor Eijkhout Jun 13 '18 at 13:34

0 Answers