In this PR I fix a bug in distributed reduce operations that re-use the send and receive buffers across different `MPI_Reduce` calls. This bug was first introduced with #191.

When performing the locality-aware reduce in a local leader that is not co-located with the rank that originated the reduce, we perform the following operations:

1. Receive the partial results from the other ranks co-located with the local leader.
2. Accumulate those partial results into a single buffer.
3. Send the accumulated buffer to the remote rank that originated the reduce.
Since the local leader is not expected to receive anything, the `recvBuffer` in the method's signature may be a `nullptr` (passing a null receive buffer at non-root ranks seems to be an unwritten agreement among MPI developers).
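For reference, this is the calling pattern that convention comes from (a standalone example, not code from this repo): the receive buffer is only significant at the root, so non-root ranks often pass `nullptr` for it.

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int root = 0;
    const int count = 4;

    // Each rank contributes `count` integers
    std::vector<int> sendBuf(count, rank);

    // The receive buffer is only significant at the root, so non-root ranks
    // commonly pass a null pointer for it
    std::vector<int> recvBuf;
    if (rank == root) {
        recvBuf.resize(count);
    }

    MPI_Reduce(sendBuf.data(),
               rank == root ? recvBuf.data() : nullptr,
               count,
               MPI_INT,
               MPI_SUM,
               root,
               MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```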
Because the local leader cannot rely on `recvBuffer`, in (2) we were accumulating the partial reduce into the same buffer in which the local leader's data was provided (i.e. `sendBuffer`), and then sending that buffer in (3). Unfortunately, `sendBuffer` is a pointer to the application's memory, and the application does not expect that memory to be modified. In particular, if subsequent reduces re-use the same buffers, the results they return are incorrect.
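A minimal sketch of the buffer handling, using hypothetical helper names (`accumulate`, `localLeaderReduce*`) rather than the actual functions in this code base: instead of accumulating into the application-owned `sendBuffer`, the fixed version accumulates into a scratch copy that we are free to modify and send.

```cpp
#include <cstddef>
#include <vector>

// Example reduce operation: element-wise SUM over int buffers
void accumulate(int* acc, const int* partial, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        acc[i] += partial[i];
    }
}

// Buggy pattern: the local leader accumulates the co-located ranks' partial
// results directly into sendBuffer, which points at application-owned memory
void localLeaderReduceBuggy(int* sendBuffer,
                            const std::vector<std::vector<int>>& partials,
                            size_t count)
{
    for (const auto& p : partials) {
        accumulate(sendBuffer, p.data(), count); // clobbers the app's data
    }
    // ... send sendBuffer to the rank that originated the reduce (step 3) ...
}

// Fixed pattern: seed a scratch buffer with the local leader's contribution
// and accumulate there, leaving the application's sendBuffer untouched
void localLeaderReduceFixed(const int* sendBuffer,
                            const std::vector<std::vector<int>>& partials,
                            size_t count)
{
    std::vector<int> scratch(sendBuffer, sendBuffer + count);
    for (const auto& p : partials) {
        accumulate(scratch.data(), p.data(), count);
    }
    // ... send scratch.data() to the rank that originated the reduce ...
}
```

With the scratch copy, a later reduce that passes the same `sendBuffer` still sees the application's original, unmodified data.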