Added CUDA-aware MPI support detection for MVAPICH, MPICH and ParaStationMPI #522
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master     #522      +/-  ##
=========================================
  Coverage   96.59%   96.60%   +0.01%
=========================================
  Files          68       68
  Lines       14089    14092        +3
=========================================
+ Hits        13609    13613        +4
+ Misses        480      479        -1
Have you tested the different options?
I tested MVAPICH; it seems to work fine on our cluster. ParaStation has been tested on HDFML; it results in a permission denied error that Alex Strube is currently working on, but it definitely changes the behaviour of the MPI stack. I was unable to test MPICH as there is none available. However, it is the canonical way according to their docs.
import os
import subprocess

# check whether OpenMPI supports CUDA-aware MPI
if "openmpi" in os.environ.get("MPI_SUFFIX", "").lower():
    # ompi_info reports all build-time options; look for the CUDA flag
    buffer = subprocess.check_output(["ompi_info", "--parsable", "--all"])
    CUDA_AWARE_MPI = b"mpi_built_with_cuda_support:value:true" in buffer
else:
    CUDA_AWARE_MPI = False
# MVAPICH
Is there any way to automatically get this? Or even better, to automatically turn on CUDA-aware MPI?
Well, the modules could set the respective environment variable on load. Programmatically, there is no way (that I know of) for Heat to tell whether the binaries were actually compiled with CUDA support. There is the hacky possibility of checking the ldd output to see whether the MPI libs attempt to dynamically load the CUDA shared objects.
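A minimal sketch of that ldd-based heuristic, assuming the location of the MPI shared library is known (the default path and the function name below are purely illustrative, not part of this PR):

import subprocess

def mpi_links_against_cuda(libmpi_path="/usr/lib/libmpi.so"):
    # Run ldd on the MPI library and scan its dynamic dependencies;
    # the default path is a placeholder and differs between systems.
    try:
        output = subprocess.check_output(["ldd", libmpi_path], text=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    # libcuda/libcudart among the dependencies suggests, but does not
    # guarantee, that the MPI build will actually use CUDA-aware paths.
    return "libcuda" in output or "libcudart" in output

Note that this only shows what the library links against, not whether CUDA support is enabled at runtime, which is exactly why it is hacky.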
Description
Added CUDA-aware MPI support detection for the three other MPI implementations that support it: MVAPICH, MPICH and ParaStationMPI.
NOTE: even if your MPI installation is compiled with CUDA-aware MPI support, it may still not use it by default. For most implementations a specific environment variable, which our code checks for, needs to be set (see the sketch below).
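As an illustration, a check along the same lines as the existing OpenMPI one might look as follows. MV2_USE_CUDA (MVAPICH2) and PSP_CUDA (ParaStationMPI) are the documented switches; the MPICH branch is omitted here because the exact variable depends on the MPICH version and build.

import os

suffix = os.environ.get("MPI_SUFFIX", "").lower()
# MVAPICH2 only stages GPU buffers through CUDA when MV2_USE_CUDA=1
if "mvapich" in suffix:
    CUDA_AWARE_MPI = os.environ.get("MV2_USE_CUDA") == "1"
# ParaStationMPI enables its CUDA path via PSP_CUDA=1
elif "parastation" in suffix:
    CUDA_AWARE_MPI = os.environ.get("PSP_CUDA") == "1"
else:
    CUDA_AWARE_MPI = False

With MVAPICH2, for example, a job would then typically be launched with MV2_USE_CUDA=1 exported in the job script.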
Issue/s resolved: #438
Type of change
New feature (non-breaking change which adds functionality)
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no