nonblocking handles with RMA requests #53

Draft
wants to merge 41 commits into
base: master

@jeffhammond jeffhammond commented Oct 3, 2024

This was a long-standing omission in the implementation. ARMCI nonblocking handles are similar to MPI RMA requests, but the mapping is not 1:1: an aggregate request handle is 1:N, so a single handle can cover N MPI requests.

This implements request handles using MPI RMA requests. It replaces the prior implementation, which just did flush(_all) instead of completing individual handles. The old implementation is preserved and can still be selected via the preprocessor.
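To illustrate the 1:N relationship described above, here is a minimal sketch of the bookkeeping an aggregate handle needs. It is not the code in this PR. All names (`sketch_request_t`, `sketch_handle_t`, and the `sketch_handle_*` functions) are hypothetical, and the request type is a stand-in so the sketch builds without an MPI installation. In a real implementation the array would presumably hold `MPI_Request` values returned by calls such as `MPI_Rput` or `MPI_Rget`, and the wait would be an `MPI_Waitall` over that array.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for MPI_Request (assumption: lets the sketch compile without MPI).
 * pending == 1 means the operation has been issued but not completed. */
typedef struct { int pending; } sketch_request_t;

/* Hypothetical aggregate handle: one handle owns N outstanding requests. */
typedef struct {
    sketch_request_t *reqs;
    size_t count;     /* requests currently attached to the handle */
    size_t capacity;  /* allocated slots in reqs */
} sketch_handle_t;

/* Put the handle into a known empty state before first use. */
void sketch_handle_init(sketch_handle_t *h) {
    h->reqs = NULL;
    h->count = 0;
    h->capacity = 0;
}

/* Attach one more outstanding operation to the handle.
 * Returns 0 on success, -1 if memory could not be allocated. */
int sketch_handle_add(sketch_handle_t *h, sketch_request_t req) {
    if (h->count == h->capacity) {
        size_t new_cap = h->capacity ? 2 * h->capacity : 4;
        sketch_request_t *p = realloc(h->reqs, new_cap * sizeof *p);
        if (p == NULL) return -1;
        h->reqs = p;
        h->capacity = new_cap;
    }
    h->reqs[h->count++] = req;
    return 0;
}

/* Complete only the operations attached to this handle, then empty it.
 * With MPI this loop would be a single MPI_Waitall on h->reqs.
 * Returns the number of operations that were still pending. */
size_t sketch_handle_wait(sketch_handle_t *h) {
    size_t completed = 0;
    for (size_t i = 0; i < h->count; i++) {
        if (h->reqs[i].pending) {
            h->reqs[i].pending = 0;
            completed++;
        }
    }
    h->count = 0;
    return completed;
}

/* Release the request array and return the handle to the empty state. */
void sketch_handle_free(sketch_handle_t *h) {
    free(h->reqs);
    sketch_handle_init(h);
}
```

The point of the design is that waiting on one handle touches only that handle's requests, whereas a flush(_all) completes every outstanding operation regardless of which handle issued it.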

This also adds an option to use Rget_accumulate for atomics (all of which are blocking) and wait on the resulting request. That avoids a flush in this code path, which might be slowed down by the need to complete other outstanding operations that are more expensive and potentially not done in hardware.

This has not been tested thoroughly. It will be merged after sufficient testing.

Tested with:

  • MPICH 4.2 Ch4 OFI in shared memory
  • MPICH 4.2 Ch3 in shared memory
  • Open MPI 4.x in shared memory
  • Cray MPI on LUMI
  • HPC-X (Open MPI 4) on Mellanox IB
  • Open MPI 5 on Mellanox IB
  • MVAPICH on Mellanox IB
  • MPICH UCX on Mellanox IB
  • MPICH OFI on Mellanox IB

Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
with the separate memory model, there are scenarios where we want/need to synchronize the public and private copies of the window after all processes have completed ARMCI_AllFence. ideally, GA would call ARMCI_Barrier in GA_Sync, but since that isn't the case, we add an option to call ARMCII_Sync (which calls MPI_Win_sync on all windows) inside of armci_msg_barrier (and similar), since we know that GA always calls this after it fences.

Signed-off-by: Jeff Hammond <[email protected]>
Fetch_and_op or Compare_and_swap plus Flush(_local) might be more expensive,
so we add an option to use Rget_accumulate (yes, way more arguments)
and wait on the resulting request, which might be better in some cases.
Signed-off-by: Jeff Hammond <[email protected]>
no implementation of request-based RMA yet...

Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
this is not working for nonblocking vector ops, which fail in armci-test.
all other tests pass, at least in shared memory.

Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
Signed-off-by: Jeff Hammond <[email protected]>
running NWChem generates a huge number of assertions/warnings about bogus handles.
it would seem that GA does a bad job of initializing these.

Signed-off-by: Jeff Hammond <[email protected]>
ARMCII_Warning was called before ARMCI_GROUP_WORLD was initialized, so warnings in init were printed by every rank.

Signed-off-by: Jeff Hammond <[email protected]>