Added CUDA-aware MPI support detection for MVAPICH, MPICH and ParaStationMPI #522
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master     #522      +/-  ##
=========================================
  Coverage   96.59%   96.60%   +0.01%
=========================================
  Files          68       68
  Lines       14089    14092        +3
=========================================
+ Hits        13609    13613        +4
+ Misses        480      479        -1
Have you tested the different options?
I tested MVAPICH; it seems to work fine on our cluster. ParaStation has been tested on HDFML; it results in a permission denied error that Alex Strube is currently working on, but it definitely changes the behaviour of the MPI stack. I was unable to test MPICH as there is none available. However, it is the canonical way according to their docs.
import os
import subprocess

# check whether OpenMPI supports CUDA-aware MPI
if "openmpi" in os.environ.get("MPI_SUFFIX", "").lower():
    # ompi_info reports all build-time options; look for the CUDA flag
    buffer = subprocess.check_output(["ompi_info", "--parsable", "--all"])
    CUDA_AWARE_MPI = b"mpi_built_with_cuda_support:value:true" in buffer
else:
    CUDA_AWARE_MPI = False
# MVAPICH
Is there any way to automatically get this? Or even better, to automatically turn on CUDA-aware MPI?
Well, the modules could set the respective environment variable on load. Programmatically, there is no way (that I know of) for Heat to tell whether the binaries were actually compiled with CUDA support. There is the hacky possibility of checking the ldd output to see whether the MPI libs attempt to dynamically load the CUDA shared objects.
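A minimal sketch of that ldd-based heuristic, assuming the location of the MPI shared library is known (the default path and the function name below are purely illustrative, not part of this PR):

import subprocess

def mpi_links_against_cuda(libmpi_path="/usr/lib/libmpi.so"):
    # Run ldd on the MPI library and scan its dynamic dependencies;
    # the default path is a placeholder and differs between systems.
    try:
        output = subprocess.check_output(["ldd", libmpi_path], text=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    # libcuda/libcudart among the dependencies suggests, but does not
    # guarantee, that the MPI build will actually use CUDA-aware paths.
    return "libcuda" in output or "libcudart" in output

Note that this only shows what the library links against, not whether CUDA support is enabled at runtime, which is exactly why it is hacky.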
Description
Added CUDA-aware MPI support detection for the three other MPI implementations that support it: MVAPICH, MPICH and ParaStationMPI.
NOTE: even if your MPI installation is compiled with CUDA-aware MPI support, it may still not use it by default. For most implementations a specific environment variable, which our code checks for, needs to be set (see the sketch below).
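As an illustration, a check along the same lines as the existing OpenMPI one might look as follows. MV2_USE_CUDA (MVAPICH2) and PSP_CUDA (ParaStationMPI) are the documented switches; the MPICH branch is omitted here because the exact variable depends on the MPICH version and build.

import os

suffix = os.environ.get("MPI_SUFFIX", "").lower()
# MVAPICH2 only stages GPU buffers through CUDA when MV2_USE_CUDA=1
if "mvapich" in suffix:
    CUDA_AWARE_MPI = os.environ.get("MV2_USE_CUDA") == "1"
# ParaStationMPI enables its CUDA path via PSP_CUDA=1
elif "parastation" in suffix:
    CUDA_AWARE_MPI = os.environ.get("PSP_CUDA") == "1"
else:
    CUDA_AWARE_MPI = False

With MVAPICH2, for example, a job would then typically be launched with MV2_USE_CUDA=1 exported in the job script.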
Issue/s resolved: #438
Type of change
New feature (non-breaking change which adds functionality)
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no