Types gt::bfloat16_t and gt::complex_bfloat16_t #283
Merged
The types `gt::bfloat16_t` and `gt::complex_bfloat16_t` are provided as 1:1 copies of `gt::float16_t` and `gt::complex_float16_t`, respectively. The 1:1 adaptations from `[complex_]float16_t.h` --> `[complex_]bfloat16_t.h` are:

- `cuda_fp16.h` --> `cuda_bf16.h`
- `__half` --> `__nv_bfloat16`
- `CUDA_ARCH >= 530` --> `CUDA_ARCH >= 800`
- `GTENSOR_ENABLE_FP16` --> `GTENSOR_ENABLE_BF16`
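For illustration only, a minimal sketch of the wrapper shape these substitutions produce (the actual `bfloat16_t.h` in this PR is the full 1:1 copy of `float16_t.h`; the member set and operator shown here are abbreviated assumptions, not the real file):

```cpp
#ifdef GTENSOR_ENABLE_BF16 // replaces GTENSOR_ENABLE_FP16

#include <cuda_bf16.h> // replaces cuda_fp16.h

namespace gt
{

// Abbreviated, hypothetical sketch of the wrapper around __nv_bfloat16
// (float16_t wraps __half in the same way).
class bfloat16_t
{
public:
  bfloat16_t() = default;
  __host__ __device__ bfloat16_t(float x) : value_(x) {}
  __host__ __device__ bfloat16_t(__nv_bfloat16 x) : value_(x) {}
  __host__ __device__ operator float() const { return static_cast<float>(value_); }

  friend __host__ __device__ bfloat16_t operator+(const bfloat16_t& a,
                                                  const bfloat16_t& b)
  {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    // native bf16 add needs compute capability 8.0 (fp16 needed 5.3)
    return bfloat16_t(__hadd(a.value_, b.value_));
#else
    // host code and older architectures fall back to float arithmetic
    return bfloat16_t(float(a) + float(b));
#endif
  }

private:
  __nv_bfloat16 value_; // replaces __half
};

} // namespace gt

#endif // GTENSOR_ENABLE_BF16
```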
Similarly, `test_[complex_]bfloat16_t.cxx` is a copy of `test_[complex_]float16_t.cxx` with every instance of `float16_t` replaced by `bfloat16_t`.
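Purely for illustration (the actual test file is the verbatim copy described above; the test name and the exact checks here are assumptions), a test in that file has roughly this shape:

```cpp
#include <gtest/gtest.h>

#include <gtensor/gtensor.h> // assumes bfloat16_t.h is pulled in when GTENSOR_ENABLE_BF16 is on

// Hypothetical example in the spirit of test_bfloat16_t.cxx; the real file is
// test_float16_t.cxx with every float16_t replaced by bfloat16_t.
TEST(bfloat16_t, add_scalars)
{
  gt::bfloat16_t a{1.0f};
  gt::bfloat16_t b{2.0f};
  gt::bfloat16_t c = a + b;

  // 3.0 is exactly representable in bfloat16, so an exact comparison is safe
  EXPECT_EQ(static_cast<float>(c), 3.0f);
}
```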
Tested on an A100, built with the modules `gcc/11` and `cuda/11.4` loaded and with `-DGTENSOR_DEVICE=cuda -DGTENSOR_ENABLE_BF16=ON -DCMAKE_CUDA_ARCHITECTURES=80` added to the cmake call.

Note: It would obviously be cleaner to have a single class templated on `storage_type` and `compute_type` that covers both `gt::float16_t` and `gt::bfloat16_t` at the same time (and possibly also a future `gt::float8_t`); a rough sketch of that idea follows below. However, as this is only a temporary solution until NVIDIA supports these types within thrust, the current duplication might be good enough. Any opinions on this?

Any feedback is highly appreciated!
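For concreteness, here is a rough sketch of what such a unified class could look like; `storage_type` and `compute_type` are the names from the note above, everything else (class name, fallback strategy) is only an assumption to illustrate the idea, not a proposed implementation:

```cpp
#include <cuda_bf16.h>
#include <cuda_fp16.h>

namespace gt
{

// Hypothetical unified wrapper: storage_type is the narrow in-memory
// representation, compute_type is what arithmetic falls back to when no
// native hardware instruction is available.
template <typename storage_type, typename compute_type = float>
class reduced_float
{
public:
  reduced_float() = default;
  __host__ __device__ reduced_float(compute_type x) : value_(x) {}
  __host__ __device__ operator compute_type() const
  {
    return static_cast<compute_type>(value_);
  }

  friend __host__ __device__ reduced_float operator+(const reduced_float& a,
                                                     const reduced_float& b)
  {
    // generic fallback: widen to compute_type, operate, narrow on store;
    // per-architecture specializations could dispatch to __hadd and friends
    return reduced_float(static_cast<compute_type>(a) +
                         static_cast<compute_type>(b));
  }

private:
  storage_type value_;
};

// gt::float16_t and gt::bfloat16_t would then just be aliases (a future
// gt::float8_t would only need a suitable storage type)
using float16_t = reduced_float<__half>;
using bfloat16_t = reduced_float<__nv_bfloat16>;

} // namespace gt
```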