
[SYCL] Move bfloat support from experimental to supported. #6524

Merged Nov 28, 2022 (99 commits)

Conversation

@rdeodhar (Contributor) commented Aug 3, 2022:

This change makes bfloat16 a supported feature.

Signed-off-by: Rajiv Deodhar [email protected]

@JackAKirk (Contributor):

Looks good. Do we also want to move these bfloat16 math functions out of experimental:

std::enable_if_t<std::is_same<T, bfloat16>::value, T> fmin(T x, T y) {

since they are defined in the same extension document as the main bfloat16 class?
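
For illustration, a minimal device-code sketch of calling the quoted overload, assuming the math functions still resolve from sycl::ext::oneapi::experimental and are pulled in via <sycl/sycl.hpp> (both assumptions; this is not part of the PR):

```cpp
#include <sycl/sycl.hpp>

int main() {
  namespace exp = sycl::ext::oneapi::experimental;
  using sycl::ext::oneapi::bfloat16;  // non-experimental class after this PR

  sycl::queue q;
  float out = 0.0f;
  {
    sycl::buffer<float, 1> buf(&out, sycl::range<1>(1));
    q.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(buf, cgh, sycl::write_only);
      cgh.single_task([=] {
        bfloat16 a{1.5f}, b{2.5f};
        // The fmin overload quoted above, constrained to bfloat16 via enable_if.
        acc[0] = static_cast<float>(exp::fmin(a, b));
      });
    });
  }
  return out == 1.5f ? 0 : 1;
}
```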

By the way, there should be an accompanying PR to intel/llvm-test-suite updating the corresponding tests, otherwise there will be lots of failures, for example in this test: https://github.com/intel/llvm-test-suite/blob/intel/SYCL/BFloat16/bfloat16_type.hpp.

@JackAKirk (Contributor):

Also, FYI, there is another open PR updating the bfloat16 class, #6492, that it might be good to consider merging first.

@rdeodhar (Contributor Author) commented Aug 4, 2022:

/verify with intel/llvm-test-suite#1129

@rdeodhar marked this pull request as ready for review August 9, 2022 16:43
@rdeodhar requested review from a team as code owners August 9, 2022 16:43
@rdeodhar requested a review from steffenlarsen August 9, 2022 16:43
@steffenlarsen (Contributor) left a comment:

Overall LGTM, though I think the comment should be addressed first.

Also, should we consider leaving a deprecated version of bfloat16.hpp in the experimental folder to warn old users? Since the feature was experimental I don't think we strictly have to, but we could.
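
A minimal sketch of what such a deprecated shim could look like, assuming the moved class lives in <sycl/ext/oneapi/bfloat16.hpp> (file names, paths, and the alias are assumptions, not the actual implementation):

```cpp
// Hypothetical sycl/ext/oneapi/experimental/bfloat16.hpp left behind as a shim.
#pragma once

#warning "sycl/ext/oneapi/experimental/bfloat16.hpp is deprecated, use sycl/ext/oneapi/bfloat16.hpp"

#include <sycl/ext/oneapi/bfloat16.hpp>

namespace sycl::ext::oneapi::experimental {
// Keep old code that names the experimental class compiling, with a breadcrumb
// pointing at the supported location.
using bfloat16 = sycl::ext::oneapi::bfloat16;
} // namespace sycl::ext::oneapi::experimental
```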

@@ -408,4 +406,5 @@ Compute absolute value of a `bfloat16`.
|3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor
|4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi
|5|2022-04-05|Jack Kirk | Added section for bfloat16 math builtins
|6|2022-08-03|Alexey Sotkin |Add `operator sycl::half()`
Contributor:

I think it should be your name here. 😄

Contributor Author:

It is carried over from the previous author, but I agree that more changes have been made since, so I am changing the name.

@gmlueck (Contributor) commented Aug 23, 2022:

I also have two global comments:

operator float() const;
operator sycl::half() const;
Contributor:

I like this conversion to sycl::half. However, we should also add the opposite conversion from sycl::half to bfloat16:

bfloat16(const sycl::half &a);
bfloat16 &operator=(const sycl::half &a);

Do we also need conversion to / from double?
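
A quick host-side usage sketch of what the proposed overloads would allow (illustrative only; at this point in the thread they are only a suggestion):

```cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::half h{0.5f};
  sycl::ext::oneapi::bfloat16 b{h};  // proposed bfloat16(const sycl::half &)
  b = h;                             // proposed operator=(const sycl::half &)
  return static_cast<float>(b) == 0.5f ? 0 : 1;
}
```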

Contributor Author:

This PR is intended to move the current bfloat16 support out of experimental space. Any changes to the level of bfloat16 support can be done in future PRs.

Contributor Author:

On Intel platforms the bfloat16 to/from float conversion is done using the __spirv_ConvertBF16ToFINTEL operator. I suspect a double version of that does not exist.
Float-to-double conversion can be done in the usual C++ way, efficiently in hardware. A direct software conversion from bfloat16 to double would involve more bit twiddling than the float conversion, where only trailing zero bits of the fraction need to be inserted.
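
As a plain C++ sketch of the point about trailing zero bits, assuming the usual representation of bfloat16 as the upper 16 bits of an IEEE-754 binary32 (this is not the implementation used by the extension, which goes through __spirv_ConvertBF16ToFINTEL on Intel devices):

```cpp
#include <cstdint>
#include <cstring>

// Widening a raw bfloat16 bit pattern to float only appends 16 zero fraction
// bits: shift into the high half of a 32-bit word and reinterpret as float.
float bf16_bits_to_float(std::uint16_t raw) {
  std::uint32_t wide = static_cast<std::uint32_t>(raw) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof(f));
  return f;
}

// A direct bfloat16 -> double routine could not just append zeros; the 8-bit
// exponent would have to be re-biased into double's 11-bit field, which is the
// extra bit twiddling mentioned above.
```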

Contributor Author:

The sycl::half class includes conversions to/from float. Those kick in when bfloat16 is used with sycl::half, so conversions between bfloat16 and sycl::half are not needed.
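
For illustration, a minimal host-side sketch of the round trip being described, hopping explicitly through float (not part of the PR; as noted further down, this position was later revisited and dedicated conversions were added):

```cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::half h{1.5f};
  // Explicit hop through float, the conversion path described above.
  sycl::ext::oneapi::bfloat16 b{static_cast<float>(h)};
  sycl::half h2{static_cast<float>(b)};
  return (h2 == h) ? 0 : 1;
}
```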

Contributor:

Are you saying that we should remove this conversion from bfloat16 to sycl::half?

Contributor Author:

Yes, it's not needed.

Contributor Author:

This item was revisited and it turns out that sycl::half <-> bfloat16 conversions are needed. They have been added.

@MrSidims (Contributor) commented Sep 20, 2022:

Sorry for joining the discussion late. Maybe it's a nitpick, but should we state that the half <-> bfloat16 conversion follows IEEE 754 float <-> half conversion? In other words, what happens if a bfloat16 value overflows the half range? Also, are the last 3 fraction bits added stochastically, or are they guaranteed to be zero (or is that an implementation detail)?
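
For concreteness, a small host-side sketch of the overflow case the question is about, assuming the bfloat16 -> half path behaves like an IEEE-754 float -> half conversion (which is exactly the guarantee being asked for):

```cpp
#include <cmath>
#include <sycl/sycl.hpp>

int main() {
  sycl::ext::oneapi::bfloat16 big{1.0e30f};  // comfortably inside bfloat16 range
  sycl::half h{static_cast<float>(big)};     // far above half's max of ~65504
  // Under IEEE-754 round-to-nearest this overflows to +inf; whether that is
  // guaranteed, and whether the 3 trailing fraction bits gained when widening
  // bfloat16's 7-bit fraction to half's 10 bits are guaranteed to be zero,
  // is what the comment asks.
  return std::isinf(static_cast<float>(h)) ? 0 : 1;
}
```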

@rdeodhar (Contributor Author):

/verify with intel/llvm-test-suite#1129

@rdeodhar (Contributor Author):

/run with intel/llvm-test-suite#1129

@rdeodhar (Contributor Author):

/verify with intel/llvm-test-suite#1129

@rdeodhar (Contributor Author):

/verify with intel/llvm-test-suite#1129

@rdeodhar requested a review from gmlueck August 26, 2022 23:02
@gmlueck (Contributor) left a comment:

This is looking a lot better, just a few more comments below.

@rdeodhar requested a review from gmlueck August 31, 2022 00:15
@rdeodhar (Contributor Author):

/verify with intel/llvm-test-suite#1129

@gmlueck (Contributor) left a comment:

Added some minor doc comments below. I think the main remaining issue is that the aspect needs to be enabled.

@JackAKirk (Contributor):

Precommit CI seems to be failing after this as well. See for example #7563.

Summary of current failures:

1. esimd/regression/windows_build_test.cpp in Windows post-commit - Addressed in [[SYCL] Remove experimental from bfloat16 in Windows test #7569](https://github.com/intel/llvm/pull/7569).

2. extensions/bfloat16.cpp in no-assert Linux - No current PRs addressing this.

3. ESIMD test-suite failures in CI - Addressed in [[SYCL] Correct bfloat16 namespace in ESIMD and matrix tests llvm-test-suite#1422](https://github.com/intel/llvm-test-suite/pull/1422).

4. CUDA test-suite compilation failures - Addressed in [[SYCL] Reintroduce experimental bfloat16 math functions #7567](https://github.com/intel/llvm/pull/7567) and [[SYCL] Fix CUDA tests using bfloat16 llvm-test-suite#1421](https://github.com/intel/llvm-test-suite/pull/1421).

5. `PI_ERROR_INVALID_BINARY` in test-suite bfloat16_type.cpp on CUDA - No current PRs addressing this.

Line 5 of bfloat16_type.cpp looks like it is missing -fsycl-targets=%sycl_triple:

// RUN: %clangxx -fsycl %s -o %t.out
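
For reference, a sketch of the RUN line with the triple added, assuming the rest of the line stays the same (whether this is the right fix for the CUDA configuration is discussed in the next comment):

```cpp
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
```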

@steffenlarsen (Contributor):

Line 5 of bfloat16_type.cpp looks like it is missing -fsycl-targets=%sycl_triple:

// RUN: %clangxx -fsycl %s -o %t.out

You're absolutely right. It seems we have a separate CUDA test for this since it has additional requirements, so I think it is better to just keep these separate for now. I have opened a PR for this in intel/llvm-test-suite#1423.

@pvchupin (Contributor):

@steffenlarsen should we revert this change?

pvchupin pushed a commit to intel/llvm-test-suite that referenced this pull request Nov 29, 2022
intel/llvm#6524 moved bfloat16 out of the
experimental namespace. This commit removes the last remaining uses
of the experimental namespace in bfloat16 for ESIMD and matrix tests.

Signed-off-by: Larsen, Steffen <[email protected]>
@steffenlarsen (Contributor):

@steffenlarsen should we revert this change?

The test failures seem to be resolvable, but we need the next nightly to be sure. Once all the mentioned patches are merged, we should only have a check-sycl failure in no-assert mode left to address, which should only affect post-commit, so I suggest we keep this change in unless other issues pop up.

@pvchupin (Contributor):

@steffenlarsen, thanks a lot for handling a bunch of these! Ping @rdeodhar for the remaining one.

@JackAKirk (Contributor):

@steffenlarsen, thanks a lot for handling a bunch of these! Ping @rdeodhar for the remaining one.

@steffenlarsen has also dealt with this:

"5. PI_ERROR_INVALID_BINARY in test-suite bfloat16_type.cpp on CUDA - No current PRs addressing this."

in intel/llvm-test-suite#1423

pvchupin pushed a commit that referenced this pull request Nov 30, 2022
#6524 accidentally removed the
experimental bfloat16 math functions while moving bfloat16 out of the
experimental namespace. This commit reintroduces these in the
bfloat16_math.hpp header file.

Changes to sub_group.hpp are to resolve detail namespace ambiguities and
are NFC.

Signed-off-by: Larsen, Steffen <[email protected]>
pvchupin pushed a commit to pvchupin/llvm that referenced this pull request Dec 1, 2022
Test was modified at intel#6524

Change fixes post-commit issue in no-asserts mode
steffenlarsen pushed a commit that referenced this pull request Dec 1, 2022
Test was modified at #6524

Change fixes post-commit issue in no-asserts mode
yubingex007-a11y added a commit to yubingex007-a11y/llvm that referenced this pull request Jan 3, 2023
bader pushed a commit that referenced this pull request Jan 3, 2023
v-klochkov added a commit to v-klochkov/llvm that referenced this pull request Jan 11, 2023
…ntel#9143)

The error in LIT test esimd/intel_fp16_converts.cpp is caused
by intel#6524 which
- moved 'bfloat16' out of 'experimental' namespace
- created a wrapper __devicelib_ConvertBF16ToFINTEL() which simply calls __spirv_ConvertBF16ToFINTEL()

The fix
- creates a test for bfloat16 conversions.
- allows __devicelib_ConvertBF16ToFINTEL() and __devicelib_ConvertFToBF16INTEL() for ESIMD context.

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
v-klochkov added a commit that referenced this pull request Jan 13, 2023
#7981)

…9143)

The error in LIT test esimd/intel_fp16_converts.cpp is caused by
#6524 which
- moved 'bfloat16' out of 'experimental' namespace
- created a wrapper __devicelib_ConvertBF16ToFINTEL() which simply calls
__spirv_ConvertBF16ToFINTEL()

The fix
- creates a test for bfloat16 conversions.
- allows __devicelib_ConvertBF16ToFINTEL() and
__devicelib_ConvertFToBF16INTEL() for ESIMD context.

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
steffenlarsen pushed a commit that referenced this pull request Feb 21, 2023
…YCL (#8257)

This PR addresses an issue where using `__CUDA_ARCH__` causes
intrinsics not to be defined in the CUDA include files.
- Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for SYCL device
- Update the `sycl-macro.cpp` test to check the appropriate macro.

---

As far as I could find, the original issue was introduced by PR
[#6524](7b47ebb),
which enabled bfloat16 support by moving it out of the experimental
extension, and it breaks some codebases with CUDA interop calls.
Current reports include github issues
[#7722](#7722),
[#8133](#8133) and
[uxlfoundation/oneMath#257](uxlfoundation/oneMath#257).

For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it
instead for SYCL device targets and leave `__CUDA_ARCH__` as before for
CUDA targets.
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Feb 23, 2023
…ntal status. (intel#1129)

Tests changes for intel#6524

Signed-off-by: Rajiv Deodhar <[email protected]>
Co-authored-by: JackAKirk <[email protected]>
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…ntal status. (intel/llvm-test-suite#1129)

Tests changes for intel#6524

Signed-off-by: Rajiv Deodhar <[email protected]>
Co-authored-by: JackAKirk <[email protected]>
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…vm-test-suite#1422)

intel#6524 moved bfloat16 out of the
experimental namespace. This commit removes the last remaining uses
of the experimental namespace in bfloat16 for ESIMD and matrix tests.

Signed-off-by: Larsen, Steffen <[email protected]>