stream: implement gpu stream enqueue functions #5906
We only implemented the cuda version for now, which simply calls cudaLaunchHostFunc.
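For readers unfamiliar with the host-function pattern, here is a minimal sketch of how a deferred operation can be packaged for a callback of the shape cudaLaunchHostFunc expects, void fn(void *userData). It is not the code in this commit. The names enqueue_args, deferred_store_cb, enqueue_store and launch_host_func_standin are hypothetical, and the stand-in launcher runs the callback immediately so the sketch compiles without the CUDA toolkit. In real code that call would be cudaLaunchHostFunc(stream, fn, userData), and the callback would run only after prior work in the stream completes.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as CUDA's cudaHostFn_t: void fn(void *userData). */
typedef void (*host_fn_t)(void *user_data);

/* Hypothetical argument bundle for one deferred operation. */
struct enqueue_args {
    int value;
    int *dest;
};

/* Host callback: performs the deferred work, then frees its arguments. */
static void deferred_store_cb(void *user_data)
{
    struct enqueue_args *args = user_data;
    *args->dest = args->value;
    free(args);
}

/* ASSUMPTION: stand-in for cudaLaunchHostFunc(stream, fn, user_data).
 * It calls fn right away; the CUDA call would defer it to the stream. */
static int launch_host_func_standin(host_fn_t fn, void *user_data)
{
    assert(fn != NULL);
    fn(user_data);
    return 0;
}

/* The enqueue call returns before the work runs in the real setting,
 * so the arguments are copied to the heap instead of living on the stack. */
int enqueue_store(int value, int *dest)
{
    struct enqueue_args *args = malloc(sizeof(*args));
    if (args == NULL)
        return -1;
    args->value = value;
    args->dest = dest;
    return launch_host_func_standin(deferred_store_cb, args);
}
```

The point of the heap copy is lifetime: with a real stream the callback may run long after the enqueue function has returned, so it must own everything it touches.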
Add a wrapper to validate the raw gpu stream value so we can alert the user as early as we can.
Validate the gpu stream value right away in MPIX_Stream_create. If we go ahead with an invalid value, it will segfault later inside the CUDA API in a rather mysterious way.
Add MPIR_Typerep_pack_stream and MPIR_Typerep_unpack_stream as wrappers for yaksa_pack_stream and yaksa_unpack_stream.
Only supported if the communicator has an associated cuda stream. Enqueues the operation to that cuda stream.
It is often not critical for gpu streams to use separate vcis as long as they are separated from normal traffic. In addition, it can be common for an application to use many GPU streams globally, and we may not have enough dedicated vcis to match them. This commit defaults to reusing a single dedicated vci for all gpu streams.
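The policy described above can be illustrated with a small sketch. It is not MPICH's allocator. The vci_pool structure and its functions are hypothetical, and the sketch assumes that vci 0 carries normal traffic and that non-gpu streams each reserve their own vci.

```c
#include <assert.h>
#include <stdbool.h>

enum { VCI_NONE = -1 };

/* Hypothetical pool: index 0 is assumed to carry normal traffic. */
struct vci_pool {
    int num_vcis;        /* total number of vcis, including vci 0 */
    int next_free;       /* next vci index that is not yet reserved */
    int gpu_shared_vci;  /* vci shared by all gpu streams, VCI_NONE until first use */
};

void vci_pool_init(struct vci_pool *pool, int num_vcis)
{
    assert(num_vcis >= 1);
    pool->num_vcis = num_vcis;
    pool->next_free = 1;
    pool->gpu_shared_vci = VCI_NONE;
}

/* Reserve one dedicated vci, or return VCI_NONE if the pool is exhausted. */
static int reserve_vci(struct vci_pool *pool)
{
    if (pool->next_free >= pool->num_vcis)
        return VCI_NONE;
    return pool->next_free++;
}

/* Non-gpu streams get their own vci; every gpu stream reuses one shared vci,
 * so the number of gpu streams is not limited by the pool size. */
int vci_for_stream(struct vci_pool *pool, bool is_gpu_stream)
{
    if (!is_gpu_stream)
        return reserve_vci(pool);
    if (pool->gpu_shared_vci == VCI_NONE)
        pool->gpu_shared_vci = reserve_vci(pool);
    return pool->gpu_shared_vci;
}
```

With three vcis in the pool, any number of gpu streams map to the same index, while a second non-gpu stream would find the pool exhausted.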
It tests the CUDA stream enqueue functions via MPIX Stream Communicator.
Pull Request Description
Implementing the gpu-stream-aware extension using the MPI Stream concept as discussed in #5908. This is a replacement implementation of #5905.
Reference discussions on MPIX_Stream: #5908
EDIT: Rebased and updated on top of #5946
[skip warnings]
Notes
The enqueue operation will work as long as there is no immediate GPU communication between the enqueue and the synchronize. I assume this is the common case for applications. A proper implementation needs to isolate the traffic. We will do that after #5904 (pt2pt: add MPID_Allocate_vci).
We need a way to pass a binary value as an info string. Maybe MPIX_Info_set_hex(info, key, &stream, sizeof(stream)) and a corresponding getter.
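One way such a setter and getter could work is to hex-encode the raw bytes of the handle into an ordinary info value string. The sketch below shows only that encoding step. hex_encode and hex_decode are hypothetical helpers, not existing MPICH functions, and the final format of the proposed MPIX_Info_set_hex is not settled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encode len raw bytes as a NUL-terminated lowercase hex string.
 * Returns a malloc'ed buffer that the caller must free, or NULL on failure. */
char *hex_encode(const void *data, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char *bytes = data;
    char *out;

    assert(data != NULL || len == 0);
    out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[bytes[i] >> 4];
        out[2 * i + 1] = digits[bytes[i] & 0x0f];
    }
    out[2 * len] = '\0';
    return out;
}

/* Map one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into exactly len bytes.
 * Returns 0 on success, -1 on a length mismatch or an invalid digit. */
int hex_decode(const char *hex, void *data, size_t len)
{
    unsigned char *bytes = data;

    if (strlen(hex) != 2 * len)
        return -1;
    for (size_t i = 0; i < len; i++) {
        int hi = hex_digit_value(hex[2 * i]);
        int lo = hex_digit_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        bytes[i] = (unsigned char) ((hi << 4) | lo);
    }
    return 0;
}
```

A caller would pass hex_encode(&stream, sizeof(stream)) as the value to MPI_Info_set, and the receiving side would call hex_decode with the same size to recover the handle. The length check in the decoder catches a value that was written for a different handle type.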
The current MPIR_Info has a NULL key as its first entry. This needs a fix. Ref: #5922 (info: use dynamic array in MPIR_Info)
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.