I am doing an exercise using MPI to count frequencies of words distributed in several different files following similar steps in this instruction.
But I met a problem with step 2. In my implementation, I first sent out locally counted word-count pairs into corresponding processors according to the words' hashcode. At the same time, each processor might need to receive word-count pairs sented from other processors. As a result, one processor will have to both send and receive.
For example:
In processor 1, there are word-count pairs:
cat: 3
dog: 4
fox: 2
In processor 2, there are word-count pairs:
deer: 2
cat: 2
fox: 1
red: 3
and so on. Suppose the word "cat" is computed to be sent to processor 2, and the words "deer", "fox", "red" are supposed to be sent to processor 1. So the pseudocode for each processor would be:
for each word-count pair:
compute the word hash code
MPI_Issend it to corresponding processor according to the hash code
while (true):
MPI_Recv word-count pair
combine the counts of the same word in this processor
MPI_Waitall
Do some other things
Note that the second loop will never stop, so I need to add a termination condition. My way of implementing this is to send a signal from the manager processor with rank 0 to each worker processor. So the second loop becomes:
while (true):
MPI_Recv word-count pair
determine sender from the receive status
if (sender != manager)
combine the counts of the same word in this processor
else
break
But this algorithm turned out to be not stable. Although it will get the intended result in most runs, it can enter into deadlock. My guess for the failure is that some workers may receive the termination signal from the manager too early before receiving all the word-count pairs from other worker processors, so the corresponding MPI_Issend from those workers would have to wait indefinitely, thus a deadlock.
I wish someone could give some suggestions of improvement on my implementation. Or if someone have better algorithms for this step, that would also be appreciated! Thank you for your attention!
UPDATE:
In order to prevent the manager from sending termination signals before any workers finish receiving word-count pairs, I summed up the total number of words from all processors involved in the shuffle stage, say $N$. Then, every time a processor receives a word-count pair, it should also send a signal to the manager. Accordingly, in the manager part, a loop is added to Recv the these signals, so that the manager will only send the termination signals when all the word-count pairs are received by workers. In this way, the manager signal will never interrupt with content sent from workers and thus the recv loop in the worker can terminate correctly. Below is the final modified pseudocode for this:
Manager Part:
for (i = 1 : N)
MPI_Recv signal with tag "recvd"
MPI_Send termination signal to all processors involved in the shuffle stage
Worker Part:
for each word-count pair:
compute the word hash code
MPI_Issend it to corresponding processor according to the hash code
while (true):
MPI_Recv word-count pair
**MPI_Send signal with tag "recvd" to notify the manager that a wcp is received**
if (sender != manager)
combine the counts of the same word in this processor
else
break
MPI_Waitall
Do some other things
Although this algorithm worked successfully, I'm not sure whether I'm on the right track. Am I over complexing this problem?
while(true)loop that has nobreakor other exit statement. – Wolfgang Bangerth Jul 21 '17 at 20:38