[ROCm] BFloat16 support #10416
Conversation
@iK1D - I found the fix for ReduceSum perf is not in ROCm (reduction_ops.cc):
@@ -823,6 +823,111 @@ SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(int64_t)
SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(int8_t)
SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(uint8_t)

template <>
Status ReduceKernel<true>::ComputeImpl<BFloat16, MIOPEN_REDUCE_TENSOR_NO_INDICES>(
Why do we need a new function for BFloat16? Can the default one do the job? The default one can already handle MLFloat16, and BFloat16 is similar to MLFloat16. #WontFix
I get a lib loading error if I remove the block; will leave it as is for now.
I read the code again. The current BFloat16 reduce casts the data to float for the calculation and casts back at the end. cuDNN should already support BFloat16 directly since CUDA 11, so for better perf we should go through the default ComputeImpl, but to make that work we need to fix a few more places. Let me open a new PR to do that.
oh, ok, thanks a lot
I'll check with Suffian.
@iK1D - let me speed up the BF16 support by applying a similar change from your PR. We can work on code cleanup + UTs as needed in the next phase.
Description: Enable BFloat16 for ROCm
So far, the following ops are enabled in this PR:
ReduceSum
Binary elementwise ops (Add/Sub/Mul/Div)
Cast
Softmax
Will continue to add support per code refactor from #10085
Motivation and Context