Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add faster reduction implementations using atomic or/and intel… #1615

Merged
merged 1 commit into from
May 1, 2020

Conversation

v-klochkov
Copy link
Contributor

…::reduce()

Signed-off-by: Vyacheslav N Klochkov [email protected]

@romanovvlad
Copy link
Contributor

@Pennycook could you review the patch?

@@ -761,6 +785,48 @@ class __SYCL_EXPORT handler {
#endif
}

/// Implements parallel_for() accepting nd_range and 1 reduction variable
/// having 'read_write' access mode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Suggest mentioning that reduction should support "fast atomics".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vlad, doesn't this comment already tell that (see the line 3 and 4 of this comment section)?

/// Implements parallel_for() accepting nd_range and 1 reduction variable
/// having 'read_write' access mode.
/// This version uses fast sycl::atomic operations to update user's reduction
/// variable at the end of each work-group work.

Copy link
Contributor

@Pennycook Pennycook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the names fast_atomic and fast_reduce going forward -- what we're really testing here is whether SYCL provides native atomics or reductions for those types (since a device is not required to guarantee that their implementation is fast).

I don't feel strongly enough about this to block merging this PR, but we might want to revisit this naming convention when tuning for additional platforms.

@v-klochkov
Copy link
Contributor Author

I'm not sure about the names fast_atomic and fast_reduce going forward -- what we're really testing here is whether SYCL provides native atomics or reductions for those types (since a device is not required to guarantee that their implementation is fast).

I don't feel strongly enough about this to block merging this PR, but we might want to revisit this naming convention when tuning for additional platforms.

If for some device those 'native' atomics happen to work slowly, then the good move is to exclude such 'native' atomic from 'fast' atomics list and use different algorithm not using them. Such exclusion would require some additional changes and maybe dynamic/runtime checks on HOST (I did not think much about it yet).
Taking this idea into account using the word 'fast' seems reasonable, right?

@Pennycook
Copy link
Contributor

If for some device those 'native' atomics happen to work slowly, then the good move is to exclude such 'native' atomic from 'fast' atomics list and use different algorithm not using them. Such exclusion would require some additional changes and maybe dynamic/runtime checks on HOST (I did not think much about it yet).
Taking this idea into account using the word 'fast' seems reasonable, right?

That's a good point. I agree -- as long as we continue to update the logic and use these specializations only if the features are expected to improve performance, "fast" is a good name.

Copy link
Contributor

@alexbatashev alexbatashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader bader merged commit 05625f1 into intel:sycl May 1, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request May 6, 2020
…_docs

* origin/sycl: (6482 commits)
  [SYCL][NFC] Clean formatting in Markdown documents (intel#1635)
  [SYCL][Doc] Remove obsolete parens from README (intel#1637)
  [SYCL] Fix failing ABI tests when LLVM_LIBDIR_SUFFIX is set (intel#1605)
  [SYCL] Fix warnings in libdevice (intel#1630)
  [SYCL][CUDA] Triage and clean LIT (intel#1620)
  [SYCL][NFC] Fix GCC 8 compilation warnings (intel#1631)
  [SYCL] Minor fixes in LowerWGScope
  [SYCL] PI: correct default interoperability plugin selection
  [SYCL] Add faster reduction implementations using atomic or/and intel::reduce() (intel#1615)
  [SYCL] Add sycl-ls utility for listing devices discovered/selected by SYCL RT (intel#1575)
  [SYCL] Fix getDeviceFromHandler declarations (intel#1626)
  [SPIR-V] Correct/improve declaration of SPIR-V builtins (intel#1519)
  [SYCL][USM] Improve USM allocator test and fix improper behavior. (intel#1538)
  [SYCL] Fix failing ABI LITs (intel#1622)
  [SYCL] Add support for MSVC internal math functions in device library (intel#1441)
  [SYCL] Add runtime library versioning (intel#1604)
  [SYCL] Check weak symbols in ABI dumps (intel#1609)
  [NFC][SYCL] Improve kernel metadata test (intel#1610)
  Revert "[SYCL] XFAIL LIT test due to duplicate diagnostic" (intel#1460)
  [SYCL] Move the reduction command group funcs out of handler.hpp (intel#1602)
  ...
@v-klochkov v-klochkov deleted the public_vklochkov_reduction_p3 branch May 8, 2020 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants