Types gt::bfloat16_t and gt::complex_bfloat16_t #283
Merged
The types `gt::bfloat16_t` and `gt::complex_bfloat16_t` are provided as 1:1 copies of `gt::float16_t` and `gt::complex_float16_t`, respectively. The 1:1 adaptations from `[complex_]float16_t.h` --> `[complex_]bfloat16_t.h` are:

- `cuda_fp16.h` --> `cuda_bf16.h`
- `__half` --> `__nv_bfloat16`
- `CUDA_ARCH >= 530` --> `CUDA_ARCH >= 800`
- `GTENSOR_ENABLE_FP16` --> `GTENSOR_ENABLE_BF16`
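For illustration only, a minimal sketch of the wrapper shape these substitutions produce (the actual `bfloat16_t.h` in this PR is the full 1:1 copy of `float16_t.h`; the member set and operator shown here are abbreviated assumptions, not the real file):

```cpp
#ifdef GTENSOR_ENABLE_BF16 // replaces GTENSOR_ENABLE_FP16

#include <cuda_bf16.h> // replaces cuda_fp16.h

namespace gt
{

// Abbreviated, hypothetical sketch of the wrapper around __nv_bfloat16
// (float16_t wraps __half in the same way).
class bfloat16_t
{
public:
  bfloat16_t() = default;
  __host__ __device__ bfloat16_t(float x) : value_(x) {}
  __host__ __device__ bfloat16_t(__nv_bfloat16 x) : value_(x) {}
  __host__ __device__ operator float() const { return static_cast<float>(value_); }

  friend __host__ __device__ bfloat16_t operator+(const bfloat16_t& a,
                                                  const bfloat16_t& b)
  {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    // native bf16 add needs compute capability 8.0 (fp16 needed 5.3)
    return bfloat16_t(__hadd(a.value_, b.value_));
#else
    // host code and older architectures fall back to float arithmetic
    return bfloat16_t(float(a) + float(b));
#endif
  }

private:
  __nv_bfloat16 value_; // replaces __half
};

} // namespace gt

#endif // GTENSOR_ENABLE_BF16
```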
Similarly, `test_[complex_]bfloat16_t.cxx` is a copy of `test_[complex_]float16_t.cxx` with every instance of `float16_t` replaced by `bfloat16_t`.
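Purely for illustration (the actual test file is the verbatim copy described above; the test name and the exact checks here are assumptions), a test in that file has roughly this shape:

```cpp
#include <gtest/gtest.h>

#include <gtensor/gtensor.h> // assumes bfloat16_t.h is pulled in when GTENSOR_ENABLE_BF16 is on

// Hypothetical example in the spirit of test_bfloat16_t.cxx; the real file is
// test_float16_t.cxx with every float16_t replaced by bfloat16_t.
TEST(bfloat16_t, add_scalars)
{
  gt::bfloat16_t a{1.0f};
  gt::bfloat16_t b{2.0f};
  gt::bfloat16_t c = a + b;

  // 3.0 is exactly representable in bfloat16, so an exact comparison is safe
  EXPECT_EQ(static_cast<float>(c), 3.0f);
}
```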
Tested on an A100, built with the modules `gcc/11` and `cuda/11.4` loaded and with `-DGTENSOR_DEVICE=cuda -DGTENSOR_ENABLE_BF16=ON -DCMAKE_CUDA_ARCHITECTURES=80` added to the cmake call.

Note: It would obviously be cleaner to have a single class templated on `storage_type` and `compute_type` that covers both `gt::float16_t` and `gt::bfloat16_t` at the same time (and possibly also a future `gt::float8_t`); a rough sketch of that idea follows below. However, as this is only a temporary solution until NVIDIA supports these types within thrust, the current duplication might be good enough. Any opinions on this?

Any feedback is highly appreciated!
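For concreteness, here is a rough sketch of what such a unified class could look like; `storage_type` and `compute_type` are the names from the note above, everything else (class name, fallback strategy) is only an assumption to illustrate the idea, not a proposed implementation:

```cpp
#include <cuda_bf16.h>
#include <cuda_fp16.h>

namespace gt
{

// Hypothetical unified wrapper: storage_type is the narrow in-memory
// representation, compute_type is what arithmetic falls back to when no
// native hardware instruction is available.
template <typename storage_type, typename compute_type = float>
class reduced_float
{
public:
  reduced_float() = default;
  __host__ __device__ reduced_float(compute_type x) : value_(x) {}
  __host__ __device__ operator compute_type() const
  {
    return static_cast<compute_type>(value_);
  }

  friend __host__ __device__ reduced_float operator+(const reduced_float& a,
                                                     const reduced_float& b)
  {
    // generic fallback: widen to compute_type, operate, narrow on store;
    // per-architecture specializations could dispatch to __hadd and friends
    return reduced_float(static_cast<compute_type>(a) +
                         static_cast<compute_type>(b));
  }

private:
  storage_type value_;
};

// gt::float16_t and gt::bfloat16_t would then just be aliases (a future
// gt::float8_t would only need a suitable storage type)
using float16_t = reduced_float<__half>;
using bfloat16_t = reduced_float<__nv_bfloat16>;

} // namespace gt
```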