stream: implement gpu stream enqueue functions #5906

hzhou · 2022-03-25T04:46:59Z

Pull Request Description

Implement the GPU stream-aware extension using the MPI Stream concept discussed in #5908. This replaces the implementation in #5905.

Reference discussions on MPIX_Stream: #5908

EDIT:
Rebased and updated on top of #5946

[skip warnings]

Notes

The enqueue operation works as long as there is no immediate GPU communication between the enqueue and the synchronize. I assume this is the common case for applications. A proper implementation needs to isolate the traffic. We will do that after PR pt2pt: add MPID_Allocate_vci #5904.
We need a way to pass a binary value as an info string. Maybe:
MPIX_Info_set_hex(info, key, &stream, sizeof(stream))
and a corresponding getter.
The current MPIR_Info has a NULL key as its first entry. This needs a fix. Ref: PR info: use dynamic array in MPIR_Info #5922
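
As a rough illustration of the MPIX_Info_set_hex idea in the notes above, here is a minimal sketch of hex-encoding an opaque value into a string and decoding it back. The helpers hex_encode and hex_decode are hypothetical names for this sketch, not MPICH functions, and the eventual MPIX_Info_set_hex interface may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode `len` raw bytes as a NUL-terminated
 * lowercase hex string. `out` must hold 2 * len + 1 characters. */
static void hex_encode(const void *buf, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char *p = buf;
    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[p[i] >> 4];
        out[2 * i + 1] = digits[p[i] & 0x0f];
    }
    out[2 * len] = '\0';
}

/* Map one hex character to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode a hex string into exactly `len` bytes.
 * Returns 0 on success, -1 on a length mismatch or an invalid digit. */
static int hex_decode(const char *str, void *buf, size_t len)
{
    unsigned char *p = buf;
    assert(str != NULL && buf != NULL);
    if (strlen(str) != 2 * len) return -1;
    for (size_t i = 0; i < len; i++) {
        int hi = hex_digit(str[2 * i]);
        int lo = hex_digit(str[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        p[i] = (unsigned char) ((hi << 4) | lo);
    }
    return 0;
}
```

A caller could encode a stream handle with hex_encode(&stream, sizeof(stream), text), store text as an ordinary info value, and recover the handle with hex_decode(text, &stream, sizeof(stream)). The string follows host memory order, so it only round-trips within one process or between identical architectures, which is sufficient for passing a local stream handle.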

Author Checklist

Provide Description
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
Passes All Tests
Whitespace checker. Warnings test. Additional tests via comments.
Contribution Agreement
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.

hzhou · 2022-03-31T13:37:41Z

test:mpich/ch4/gpu/ofi ✔️

hzhou · 2022-03-31T21:41:35Z

test:mpich/ch4/gpu/ofi ✔️

hzhou · 2022-04-01T19:31:05Z

Fixing the MPIX_Stream handling in the Fortran binding layer --
test:mpich/ch4/most
test:mpich/ch3/most

hzhou · 2022-04-01T22:25:55Z

test:mpich/ch3/tcp

hzhou · 2022-04-02T02:42:32Z

test:mpich/ch3/tcp

The PR has changed

hzhou · 2022-04-21T17:05:01Z

test:mpich/ch4/ofi ✔️

hzhou · 2022-04-25T19:21:39Z

test:mpich/ch4/gpu/ofi

hzhou · 2022-04-26T01:08:14Z

test:mpich/ch4/most

hzhou · 2022-04-26T01:08:33Z

test:mpich/ch3/most

hzhou · 2022-04-26T22:07:54Z

test:mpich/ch4/gpu/ofi
test:mpich/ch4/most
test:mpich/ch3/most

hzhou · 2022-04-27T04:45:05Z

test:mpich/ch4/gpu/ofi
test:mpich/ch4/most

hzhou · 2022-04-27T13:07:32Z

test:mpich/ch4/ofi
1 timeout - threads/pt2pt/mt_improbe_sendrecv_huge 2 -iter=64 -count=4194304 MPIR_CVAR_CH4_OFI_EAGER_MAX_MSG_SIZE=16384

hzhou · 2022-04-27T16:09:02Z

test:mpich/ch4/ofi


It is often not critical for GPU streams to use separate VCIs as long as they are separated from normal traffic. In addition, an application may commonly use many GPU streams globally, and we may not have enough dedicated VCIs to match them. This commit defaults to reusing a single dedicated VCI for GPU streams.
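
The sharing policy described above can be sketched as a small allocator. This is a toy model under stated assumptions, not the MPICH implementation: the names toy_vci_pool, toy_vci_pool_init and toy_vci_alloc, the pool size of 4, and the convention that VCI 0 carries normal traffic are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_RESERVED_VCIS 4     /* hypothetical pool size */

/* Hypothetical pool of VCIs reserved for streams. */
typedef struct {
    int next_free;  /* next unassigned VCI index */
    int gpu_vci;    /* VCI shared by all GPU streams, -1 until first use */
} toy_vci_pool;

static void toy_vci_pool_init(toy_vci_pool *pool)
{
    assert(pool != NULL);
    pool->next_free = 1;    /* assumption: VCI 0 is left to normal traffic */
    pool->gpu_vci = -1;
}

/* Return a VCI index for a new stream, or -1 if the pool is exhausted.
 * Every GPU stream after the first reuses the same dedicated VCI, so
 * GPU streams never consume more than one slot of the pool. */
static int toy_vci_alloc(toy_vci_pool *pool, int is_gpu_stream)
{
    assert(pool != NULL);
    if (is_gpu_stream && pool->gpu_vci >= 0)
        return pool->gpu_vci;
    if (pool->next_free > TOY_NUM_RESERVED_VCIS)
        return -1;
    int vci = pool->next_free++;
    if (is_gpu_stream)
        pool->gpu_vci = vci;
    return vci;
}
```

In this sketch, any number of GPU streams map to one index that is distinct from VCI 0, while each non-GPU stream still takes its own slot until the pool runs out.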


hzhou mentioned this pull request Mar 25, 2022

mpix: gpu stream aware extensions #5905

Closed

4 tasks

hzhou force-pushed the 2203_stream_2 branch 11 times, most recently from b8150a8 to f4a5202 Compare March 31, 2022 13:37

hzhou force-pushed the 2203_stream_2 branch from f4a5202 to 2c0a520 Compare March 31, 2022 21:18

hzhou requested a review from raffenet March 31, 2022 21:20

hzhou force-pushed the 2203_stream_2 branch 3 times, most recently from 3289032 to 2501347 Compare March 31, 2022 21:35

hzhou marked this pull request as ready for review March 31, 2022 21:42

hzhou force-pushed the 2203_stream_2 branch 3 times, most recently from 4d468f7 to 00ebf2b Compare April 1, 2022 14:28

raffenet previously approved these changes Apr 4, 2022


hzhou force-pushed the 2203_stream_2 branch from 00ebf2b to d97405d Compare April 20, 2022 21:27

hzhou changed the title ~~mpix: mpi stream prototype (experimental)~~ stream: implement gpu stream enqueue functions Apr 20, 2022

hzhou force-pushed the 2203_stream_2 branch 6 times, most recently from f7bedc1 to 1664db3 Compare April 25, 2022 19:21

hzhou added the ready-for-review label Apr 26, 2022

hzhou force-pushed the 2203_stream_2 branch from 1664db3 to a10e001 Compare April 26, 2022 22:07

hzhou requested a review from raffenet April 26, 2022 22:08

hzhou force-pushed the 2203_stream_2 branch from a10e001 to 5a466dc Compare April 27, 2022 04:44

raffenet approved these changes Apr 27, 2022


hzhou added 8 commits April 27, 2022 16:42

mpl: add MPL_gpu_launch_hostfn

7170ef3

We only implemented the cuda version for now, which simply calls cudaLaunchHostFunc.

mpl: add MPL_gpu_stream_is_valid

0c44158

Add a wrapper to validate the raw gpu stream value so we can alert the user as early as we can.

stream: better error handling for bad gpu stream infohint

4f3bfaa

Validate the GPU stream value right away in MPIX_Stream_create. If we go ahead with an invalid value, it will segfault later inside the CUDA API in a rather mysterious way.

datatype/typerep: add MPIR_Typerep_pack_stream

236fa12

Add MPIR_Typerep_pack_stream and MPIR_Typerep_unpack_stream as wrappers for yaksa_pack_stream and yaksa_unpack_stream.

stream: add MPIX_Send/Recv_enqueue

81cd1a8

Only supported if the communicator has a CUDA stream associated. Enqueues the operation onto the CUDA stream.
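
To make the enqueue semantics concrete, here is a toy host-side model of a stream as a FIFO of callbacks. It is only a sketch of the idea that an enqueue call records the work and returns immediately, with the work running later in stream order. All toy_* names are hypothetical and no CUDA or MPICH API is used. A real CUDA stream executes asynchronously and may run a callback well before the host synchronizes, whereas this model defers everything to the synchronize call for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STREAM_CAPACITY 16

typedef void (*toy_host_fn)(void *data);

/* Hypothetical stand-in for a GPU stream: a FIFO of pending callbacks. */
typedef struct {
    toy_host_fn fn[TOY_STREAM_CAPACITY];
    void *data[TOY_STREAM_CAPACITY];
    size_t count;
} toy_stream;

/* Record the callback and return immediately; nothing runs yet.
 * Returns 0 on success, -1 if the queue is full. */
static int toy_stream_enqueue(toy_stream *s, toy_host_fn fn, void *data)
{
    assert(s != NULL && fn != NULL);
    if (s->count == TOY_STREAM_CAPACITY) return -1;
    s->fn[s->count] = fn;
    s->data[s->count] = data;
    s->count++;
    return 0;
}

/* Run every pending callback in enqueue order, then empty the queue. */
static void toy_stream_synchronize(toy_stream *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++)
        s->fn[i](s->data[i]);
    s->count = 0;
}

/* Example work item standing in for a send or receive: it appends its
 * tag to a trace so the execution order can be inspected afterwards. */
typedef struct { int tags[TOY_STREAM_CAPACITY]; size_t n; } toy_trace;
typedef struct { toy_trace *trace; int tag; } toy_op;

static void toy_record(void *data)
{
    toy_op *op = data;
    op->trace->tags[op->trace->n++] = op->tag;
}
```

Enqueuing two toy_op items and then calling toy_stream_synchronize leaves their tags in the trace in enqueue order, and the trace stays empty until the synchronize call.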

stream: add MPIX_I{send,recv}_enqueue and MPIX_Wait{all}_enqueue

2a7b140

test: add test/mpi/impls/mpich/cuda/stream.cu

24e304f

It tests the CUDA stream enqueue functions via MPIX Stream Communicator.

hzhou force-pushed the 2203_stream_2 branch from 5a466dc to 24e304f Compare April 27, 2022 21:42

hzhou merged commit f39d5ee into pmodels:main Apr 27, 2022

hzhou deleted the 2203_stream_2 branch April 27, 2022 21:57
