enable gpu-aware MPI by default #121

Merged: 14 commits into development from use-cuda-ipc-default on Oct 1, 2022

Conversation

@BenWibking (Collaborator) commented Sep 28, 2022

It turns out GPU-aware MPI works great, as long as managed memory is disabled and all CUDA devices are visible from each MPI process. With these conditions met, OpenMPI uses its cuda_ipc transport for on-node GPU-to-GPU communication.

On-node scaling is now ~99% efficient on Delta and Gadi.

Note: this will need to be re-tested for Setonix and Frontier.

For AMD devices, see also: https://docs.amd.com/bundle/AMD-Instinct-MI250-High-Performance-Computing-and-Tuning-Guide-v5.3/page/GPU-Enabled_MPI.html
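
For reference, one way to wire in these defaults is to set the corresponding AMReX runtime parameters before initialization completes. The sketch below is illustrative rather than the exact change in this PR; it assumes the `amrex.the_arena_is_managed` and `amrex.use_gpu_aware_mpi` ParmParse parameters:

```cpp
// Minimal sketch (not necessarily the exact change in this PR):
// disable managed memory and enable GPU-aware MPI via AMReX runtime parameters.
#include <mpi.h>
#include <AMReX.H>
#include <AMReX_ParmParse.H>

int main(int argc, char *argv[])
{
    amrex::Initialize(argc, argv, true, MPI_COMM_WORLD, []() {
        amrex::ParmParse pp("amrex");
        // only set defaults if the user has not overridden them in the inputs file
        if (!pp.contains("the_arena_is_managed")) {
            pp.add("the_arena_is_managed", 0); // do not use managed (unified) memory
        }
        if (!pp.contains("use_gpu_aware_mpi")) {
            pp.add("use_gpu_aware_mpi", 1);    // pass device buffers directly to MPI
        }
    });

    // ... application code ...

    amrex::Finalize();
    return 0;
}
```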

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author) commented Sep 28, 2022

For the HydroBlast3D problem (with max_grid_size set to $256^3$), we get:

* 1 A100: 217.6 Mupdates/s ($256^3$ problem)
* 4 A100s / 1 node: 864.4 Mupdates/s ($512^3$ problem)

for a scaling efficiency of >99%.
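
As a quick sanity check of that figure, comparing aggregate throughput per GPU against the single-GPU run:

$$\text{efficiency} = \frac{864.4 / 4}{217.6} = \frac{216.1}{217.6} \approx 0.993,$$

i.e. about 99.3% on-node parallel efficiency going from 1 to 4 GPUs.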

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author) commented Sep 29, 2022

GPU tests are failing because fextract does not work if MultiFabs are not accessible from the host.
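
For context: with managed memory disabled, MultiFab data lives only in device memory, so any host-side loop over it (e.g. via LoopConcurrentOnCpu) dereferences pointers the host cannot read. A minimal sketch of the device-side pattern (illustrative only, not the actual fextract code), assuming a MultiFab `mf`:

```cpp
// Illustrative sketch: operate on MultiFab data inside amrex::ParallelFor,
// which runs on the device, instead of touching the arrays from the host.
#include <AMReX_Gpu.H>
#include <AMReX_MultiFab.H>

void scale_component_zero(amrex::MultiFab &mf, amrex::Real factor)
{
    for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
        const amrex::Box &bx = mfi.validbox();
        auto const &arr = mf.array(mfi);
        amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k) noexcept {
            arr(i, j, k, 0) *= factor; // executed on the GPU, where the data lives
        });
    }
}
```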

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking (Collaborator, Author): /azp run

@azure-pipelines: Azure Pipelines successfully started running 4 pipeline(s).

@BenWibking merged commit 851cbdd into development on Oct 1, 2022
@BenWibking deleted the use-cuda-ipc-default branch on October 1, 2022 at 16:37
BenWibking added a commit that referenced this pull request Oct 3, 2022
* enable gpu-aware MPI by default

* updated README for GPU-aware MPI

* updated batch scripts

* change fextract to use device-allocated data

* copy reference solution to device when needed (see the sketch after this list)

* interpolate.h -> interpolate.hpp

* initialize on device

* avoid LoopConcurrentOnCpu
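
The "copy reference solution to device when needed" item corresponds to a host-to-device copy before the comparison kernel runs. A hedged sketch (illustrative names, assuming `amrex::Gpu::DeviceVector` and `amrex::Gpu::copyAsync`; not the PR's exact code):

```cpp
// Illustrative sketch: stage a host-side reference solution in device memory
// so GPU kernels can read it (not the PR's exact code).
#include <AMReX_Gpu.H>
#include <AMReX_REAL.H>
#include <AMReX_Vector.H>

void copy_reference_to_device(amrex::Vector<amrex::Real> const &ref_host,
                              amrex::Gpu::DeviceVector<amrex::Real> &ref_dev)
{
    ref_dev.resize(ref_host.size());
    amrex::Gpu::copyAsync(amrex::Gpu::hostToDevice,
                          ref_host.begin(), ref_host.end(), ref_dev.begin());
    amrex::Gpu::streamSynchronize(); // ensure the copy completes before kernels read ref_dev
}
```
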
BenWibking added a commit that referenced this pull request Oct 3, 2022