Optimize GPU boundary exchanges via NVSHMEM #1

Open
bentsherman opened this issue Sep 14, 2021 · 1 comment

Comments

@bentsherman

NVSHMEM is an implementation of OpenSHMEM for Nvidia GPUs:

https://developer.nvidia.com/nvshmem
https://docs.nvidia.com/hpc-sdk/nvshmem/api/docs/index.html

It is essentially an alternative to MPI that lets the GPUs initiate communication directly over the interconnect, instead of staging every transfer through the CPU as with traditional MPI. The API is very similar to MPI but with slightly different terminology (init, finalize, PEs, teams, put/get, collective ops). The memory model is also different: communication is one-sided, going through a "symmetric heap" of remotely accessible buffers rather than matched send/receive pairs.
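
For reference, here is a minimal sketch of what an NVSHMEM program looks like, loosely modeled on the ring-shift example in the NVSHMEM docs (not code from this repo): each PE allocates a symmetric buffer, and a device kernel puts its PE id into the right-hand neighbor's buffer.

```cuda
/* Minimal NVSHMEM sketch. Typically compiled with nvcc -rdc=true and linked
 * against the NVSHMEM library, then launched with nvshmrun or an MPI launcher. */
#include <stdio.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

/* Device-initiated one-sided put: each PE writes its id into the
 * symmetric buffer of the next PE in the ring. */
__global__ void ring_shift(int *destination) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;
    nvshmem_int_p(destination, mype, peer);
}

int main(void) {
    nvshmem_init();                                         /* like MPI_Init     */
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);                               /* one GPU per PE    */

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    /* Symmetric allocation: same size on every PE, remotely accessible. */
    int *destination = (int *)nvshmem_malloc(sizeof(int));

    ring_shift<<<1, 1, 0, stream>>>(destination);
    nvshmemx_barrier_all_on_stream(stream);                 /* complete all puts */

    int msg;
    cudaMemcpyAsync(&msg, destination, sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    printf("PE %d received %d\n", nvshmem_my_pe(), msg);

    nvshmem_free(destination);
    nvshmem_finalize();                                     /* like MPI_Finalize */
    return 0;
}
```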

This would be a great way to optimize the boundary exchanges, which currently represent the majority of communication overhead in the multi-GPU scenario. A big downside is that you probably can't have MPI and NVSHMEM in the same program. You might be able to have a wrapper library that defers to either MPI or NVSHMEM based on whether or not GPUs are enabled (see the sketch below), but more likely you will need separate binaries for CPU and GPU.
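
The wrapper idea could look something like this sketch. It is entirely hypothetical (`exchange_halo`, `USE_NVSHMEM`, and the buffer names are made up, not from this codebase): the same halo-exchange entry point compiles down to either a one-sided NVSHMEM put or a paired MPI send/receive depending on a build flag.

```cuda
#ifdef USE_NVSHMEM
#include <nvshmem.h>
#else
#include <mpi.h>
#endif

/* Send our boundary strip to a neighbor and receive theirs.
 * halo_len: number of floats in one boundary strip
 * send_buf: local boundary values (device memory under NVSHMEM)
 * recv_buf: where the neighbor's boundary lands (on the symmetric heap
 *           under NVSHMEM, so the peer's copy can be written remotely) */
void exchange_halo(float *send_buf, float *recv_buf, size_t halo_len, int peer) {
#ifdef USE_NVSHMEM
    /* One-sided: write directly into the peer's symmetric recv_buf. */
    nvshmem_float_put(recv_buf, send_buf, halo_len, peer);
    nvshmem_barrier_all();              /* or a finer-grained quiet/sync */
#else
    /* Two-sided: classic paired send/recv through the host MPI stack. */
    MPI_Sendrecv(send_buf, (int)halo_len, MPI_FLOAT, peer, 0,
                 recv_buf, (int)halo_len, MPI_FLOAT, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
#endif
}
```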

@bentsherman
Author

This repo has code examples for all the different ways to implement a multi-GPU Jacobi solver:

https://github.com/NVIDIA/multi-gpu-programming-models
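
The NVSHMEM variants there illustrate the pattern that matters for this issue: boundary rows are pushed to the neighboring PE from inside the kernel with device-initiated puts, so the halo exchange never leaves the GPU. A rough sketch of that pattern (names and indexing are illustrative, not copied from the repo):

```cuda
#include <nvshmem.h>
#include <nvshmemx.h>

/* a_new must live on the symmetric heap (nvshmem_malloc) so the neighbor's
 * copy is remotely writable. Each thread pushes one boundary element
 * directly into the corresponding halo row on the neighboring PE. */
__global__ void push_boundaries(float *a_new, int nx,
                                int iy_start, int iy_end,
                                int top_pe, int top_halo_iy,
                                int bottom_pe, int bottom_halo_iy) {
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    if (ix >= nx) return;

    /* First interior row -> top neighbor's bottom halo row. */
    nvshmem_float_p(a_new + top_halo_iy * nx + ix,
                    a_new[iy_start * nx + ix], top_pe);

    /* Last interior row -> bottom neighbor's top halo row. */
    nvshmem_float_p(a_new + bottom_halo_iy * nx + ix,
                    a_new[(iy_end - 1) * nx + ix], bottom_pe);
}
```

A barrier between iterations (e.g. nvshmem_barrier_all or nvshmemx_barrier_all_on_stream) is still needed so a PE does not read halo values before its neighbors have finished writing them.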
