Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Host task implementation #1471

Merged
merged 195 commits into from
May 20, 2020

Conversation

s-kanaev
Copy link
Contributor

@s-kanaev s-kanaev commented Apr 3, 2020

This patch is part 1 in a series of patches for host-interop-task proposal by
Codeplay. See the proposal at [1].

This patch implements:

  • host-task execution mechanism;
  • enqueue of host-task without interop_handle argument;
  • spin-lock to await for host and synchronous device events completion.

This patch reimplements glue/connection of events within different contexts to
eliminate use of event callback in favor of host-task.

Host-task execution mechanism involves:

  • thread-pool for queue to execute host-task's user lambda in;
  • explicit call to event_impl::setComplete() for host events and device-side
    synchronous events;
  • helper class DispatchHostTask which wraps call to host-task's user lambda.

Thread pool's size is set via SYCL_QUEUE_THREAD_POOL_SIZE environment variable
and defaults to 1.

Even though host-task is enqueued to device queue it'll be executed on the
default host queue.
Host-task is represented via distinct ExecCGCommand paired with EmptyCommand.
Any other command, which depends on host-task will really depend on it's
EmptyCommand. The EmptyCommand is in blocked state initially.

Class DispatchHostTask awaits for host-task's dependency events, then calls to
host-task's user lambda and unblocks any dependent commands via unblocking it's
EmptyCommand and enqueueing of leaves for requirements (i.e. host accessors
required for execution of this host-task).

[1] https://github.com/codeplaysoftware/standards-proposals/blob/master/host_task/host_task.md

Sergey Kanaev added 30 commits March 4, 2020 14:42
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Ruyk
Ruyk previously approved these changes May 15, 2020
sycl/source/detail/scheduler/graph_builder.cpp Outdated Show resolved Hide resolved
sycl/source/detail/scheduler/graph_builder.cpp Outdated Show resolved Hide resolved
sycl/source/detail/scheduler/graph_builder.cpp Outdated Show resolved Hide resolved
sycl/source/detail/scheduler/graph_builder.cpp Outdated Show resolved Hide resolved
Sergey Kanaev added 2 commits May 18, 2020 17:28
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
@s-kanaev
Copy link
Contributor Author

@keryell Somehow, I can't re-request review with github's UI, so, please, review and approve.

@romanovvlad
Copy link
Contributor

@s-kanaev Please, provide a final commit message for this PR.

@s-kanaev
Copy link
Contributor Author

Please, provide a final commit message for this PR.

Done.

@bader bader merged commit ae3fd5c into intel:sycl May 20, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request May 20, 2020
* sycl: (65 commits)
  [SYCL] Host task implementation (intel#1471)
  [SYCL] Update getting dependencies documentation (intel#1699)
  [SYCL] Fix types and transparent functors recognition in reduction (intel#1709)
  [SYCL][Doc] Get started guide clean-up (intel#1697)
  Add --spirv-fp-contract={on|off|fast} option (intel#509)
  [SYCL][Doc] Fix tbb target path in Get Started Guide. (intel#1695)
  [SYCL] Add support for kernel name types templated using enums. (intel#1675)
  [Driver][SYCL] Make -std=c++17 the default for DPC++ (intel#1662)
  AllocaInst should store Align instead of MaybeAlign.
  [X86] Replace selectScalarSSELoad ComplexPattern with PatFrags to handle the 3 types of loads we currently match.
  Harden IR and bitcode parsers against infinite size types.
  Revert "[nfc] test commit"
  [nfc] test commit
  Expose IRGen API to add the default IR attributes to a function definition.
  The release notes for ObjCBreakBeforeNestedBlockParam was placed between the release note for IndentCaseBlocks and its example code
  [VectorCombine] forward walk through instructions to improve chaining of transforms
  [PhaseOrdering] add vector reduction tests; NFC
  [InstCombine] Clean up alignment handling (NFC)
  [ARM] Patterns for VQSHRN
  [VectorCombine] add reduction-like patterns; NFC
  ...
Ruyk referenced this pull request in codeplaysoftware/sycl-for-cuda May 21, 2020
Since introduction of host tasks in #1471,
`piEventCallback` and related functionality is not required
by the SYCL-RT.
Removing the implementation of this behaviour from the CUDA
backend simplifies the submission of operations to streams and
overall increases performance.

Signed-off-by: Ruyman Reyes <[email protected]>
bader pushed a commit that referenced this pull request May 29, 2020
Since introduction of host tasks in #1471,
`piEventCallback` and related functionality is not required
by the SYCL-RT.
Removing the implementation of this behaviour from the CUDA
backend simplifies the submission of operations to streams and
overall increases performance.

Signed-off-by: Ruyman Reyes <[email protected]>
Comment on lines +45 to +47
auto AccA = BufA.get_access<sycl::access::mode::read>(CGH);
auto AccB = BufB.get_access<sycl::access::mode::read>(CGH);
auto AccC = BufC.get_access<sycl::access::mode::read_write>(CGH);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to have host accessor, not device-one.

bader pushed a commit that referenced this pull request Aug 3, 2020
The tests provided here are for the fix introduced with #1471.

Use case fixed by the fix. Imagine us having two device queues (`Q1` and `Q2`) and a single buffer `B`. We initialize the buffer with host accessor. Also lets have two kernels: kernel `K1` executed at `Q1`; and kernel `K2` which executes at `Q2`. `K1` writes to `B`, `K2` reads from `B`. After submitting `K1` to `Q1` there will also be an `AllocaCommand` `A1` which allocates buffer on device. `K1` depends on `A1` via memory object `B`. After submitting `K2` to `Q2` there will be another `AllocaCommand` `A2` (for queue `Q2` and its device). `K2` will depend on `A2`. `A2`, however, should depend on both `A1` and `K1`. `A2->K1` dependency eliminates data race.

Signed-off-by: Sergey Kanaev <[email protected]>
@s-kanaev s-kanaev deleted the private/s-kanaev/event-callback branch September 2, 2020 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants