stream: implement gpu stream enqueue functions #5906

Merged 8 commits on Apr 27, 2022

@hzhou hzhou commented Mar 25, 2022

Pull Request Description

This PR implements the GPU stream-aware extension using the MPI Stream concept discussed in #5908. It replaces the implementation in #5905.

Reference discussions on MPIX_Stream: #5908

EDIT:
Rebased and updated on top of #5946

[skip warnings]

Notes

  • The enqueue operation will work as long as there is no immediate GPU communication between the enqueue and the synchronize. I assume this is the common case for applications. A proper implementation needs to isolate the traffic. We will do that after "pt2pt: add MPID_Allocate_vci" (#5904).

  • We need a way to pass a binary value as an info string. One option is
    MPIX_Info_set_hex(info, key, &stream, sizeof(stream))
    and a corresponding getter.

  • The current MPIR_Info has a NULL key as its first entry. This needs to be fixed. See PR "info: use dynamic array in MPIR_Info" (#5922).
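
The hex-string idea in the second note can be sketched in plain C. This is only an illustration of the encoding, assuming two lowercase hex digits per byte in memory order; the helper names `hex_encode` and `hex_decode` are hypothetical and are not the MPIX_Info_set_hex implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write two hex digits per byte of `value` into `out`.
 * Returns 0 on success, -1 if `out` cannot hold 2 * size digits plus NUL. */
static int hex_encode(const void *value, size_t size, char *out, size_t out_len)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char *bytes = value;

    assert(value != NULL && out != NULL);
    if (out_len < 2 * size + 1)
        return -1;
    for (size_t i = 0; i < size; i++) {
        out[2 * i] = digits[bytes[i] >> 4];
        out[2 * i + 1] = digits[bytes[i] & 0x0f];
    }
    out[2 * size] = '\0';
    return 0;
}

/* Map one hex digit to its value, or -1 if the character is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical getter: decode exactly `size` bytes from `str` into `value`.
 * Returns 0 on success, -1 on a length mismatch or an invalid digit. */
static int hex_decode(const char *str, void *value, size_t size)
{
    unsigned char *bytes = value;

    assert(str != NULL && value != NULL);
    if (strlen(str) != 2 * size)
        return -1;
    for (size_t i = 0; i < size; i++) {
        int hi = hex_digit(str[2 * i]);
        int lo = hex_digit(str[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        bytes[i] = (unsigned char) ((hi << 4) | lo);
    }
    return 0;
}
```

With helpers like these, an opaque handle such as a GPU stream could be stored as an ordinary info value and recovered by a getter that knows the expected size. Checking the string length against the size catches a mismatched type early.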

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@hzhou hzhou mentioned this pull request Mar 25, 2022
4 tasks
@hzhou hzhou force-pushed the 2203_stream_2 branch 11 times, most recently from b8150a8 to f4a5202 Compare March 31, 2022 13:37

hzhou commented Mar 31, 2022

test:mpich/ch4/gpu/ofi ✔️

@hzhou hzhou requested a review from raffenet March 31, 2022 21:20
@hzhou hzhou force-pushed the 2203_stream_2 branch 3 times, most recently from 3289032 to 2501347 Compare March 31, 2022 21:35

hzhou commented Mar 31, 2022

test:mpich/ch4/gpu/ofi ✔️

@hzhou hzhou marked this pull request as ready for review March 31, 2022 21:42
@hzhou hzhou force-pushed the 2203_stream_2 branch 3 times, most recently from 4d468f7 to 00ebf2b Compare April 1, 2022 14:28

hzhou commented Apr 1, 2022

Fixing the MPIX_Stream handling in the Fortran binding layer --
test:mpich/ch4/most
test:mpich/ch3/most


hzhou commented Apr 1, 2022

test:mpich/ch3/tcp


hzhou commented Apr 2, 2022

test:mpich/ch3/tcp

raffenet previously approved these changes Apr 4, 2022
@hzhou hzhou changed the title from "mpix: mpi stream prototype (experimental)" to "stream: implement gpu stream enqueue functions" Apr 20, 2022
@hzhou hzhou dismissed raffenet’s stale review April 20, 2022 21:33

The PR has changed


hzhou commented Apr 21, 2022

test:mpich/ch4/ofi ✔️

@hzhou hzhou force-pushed the 2203_stream_2 branch 6 times, most recently from f7bedc1 to 1664db3 Compare April 25, 2022 19:21

hzhou commented Apr 25, 2022

test:mpich/ch4/gpu/ofi


hzhou commented Apr 26, 2022

test:mpich/ch4/most


hzhou commented Apr 26, 2022

test:mpich/ch3/most


hzhou commented Apr 26, 2022

test:mpich/ch4/gpu/ofi
test:mpich/ch4/most
test:mpich/ch3/most

@hzhou hzhou requested a review from raffenet April 26, 2022 22:08

hzhou commented Apr 27, 2022

test:mpich/ch4/gpu/ofi
test:mpich/ch4/most


hzhou commented Apr 27, 2022

test:mpich/ch4/ofi
1 timeout - threads/pt2pt/mt_improbe_sendrecv_huge 2 -iter=64 -count=4194304 MPIR_CVAR_CH4_OFI_EAGER_MAX_MSG_SIZE=16384


hzhou commented Apr 27, 2022

test:mpich/ch4/ofi

hzhou added 8 commits April 27, 2022 16:42
We only implemented the cuda version for now, which simply calls
cudaLaunchHostFunc.
Add a wrapper to validate the raw gpu stream value so we can alert the
user as early as we can.
Validate the gpu stream value right away in MPIX_Stream_create. If we
go ahead with an invalid value, it will segfault later in the CUDA API
in a rather mysterious way.
Add MPIR_Typerep_pack_stream and MPIR_Typerep_unpack_stream as wrappers
for yaksa_pack_stream and yaksa_unpack_stream.
Only supported if the communicator has a cuda stream associated.
Enqueues operation to the cuda stream.
It is often not critical for gpu streams to use separate vcis as long as
they are separated from normal traffic. In addition, it can be common for
an application to use many GPU streams globally and we may not have enough
dedicated vcis to match them. This commit defaults to reusing a single
dedicated vci for gpu streams.
It tests the CUDA stream enqueue functions via MPIX Stream Communicator.
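
The VCI policy described in the commit messages above can be illustrated with a toy allocator. This is a sketch under stated assumptions, not MPICH internals: the `vci_pool` struct and both functions are hypothetical, VCI 0 is assumed to carry normal traffic, and non-GPU streams are assumed to each take their own dedicated VCI.

```c
#include <stdbool.h>

/* Toy model (hypothetical, not MPICH code): VCI 0 carries normal traffic,
 * VCIs 1 .. num_vcis - 1 can be dedicated to streams. */
struct vci_pool {
    int num_vcis;   /* total number of VCIs, including VCI 0 */
    int next_free;  /* next dedicated VCI that has not been handed out */
    int gpu_vci;    /* VCI shared by all GPU streams, -1 until first use */
};

static void vci_pool_init(struct vci_pool *pool, int num_vcis)
{
    pool->num_vcis = num_vcis;
    pool->next_free = 1;        /* skip VCI 0, reserved for normal traffic */
    pool->gpu_vci = -1;
}

/* Return the VCI for a new stream, or -1 if no dedicated VCI is left. */
static int vci_pool_assign(struct vci_pool *pool, bool is_gpu_stream)
{
    /* Every GPU stream after the first reuses the same dedicated VCI. */
    if (is_gpu_stream && pool->gpu_vci >= 0)
        return pool->gpu_vci;

    if (pool->next_free >= pool->num_vcis)
        return -1;

    int vci = pool->next_free++;
    if (is_gpu_stream)
        pool->gpu_vci = vci;    /* remember it for later GPU streams */
    return vci;
}
```

In this model an application can create any number of GPU streams while consuming only one dedicated VCI, which matches the motivation given in the commit message. GPU traffic stays off VCI 0 but is not isolated per stream.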
@hzhou hzhou merged commit f39d5ee into pmodels:main Apr 27, 2022
@hzhou hzhou deleted the 2203_stream_2 branch April 27, 2022 21:57