[ROCm] BFloat16 support #10416
Conversation
@iK1D - I found the fix for ReduceSum perf is not in ROCm (reduction_ops.cc):
@@ -823,6 +823,111 @@ SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(int64_t)
SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(int8_t)
SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(uint8_t)

template <>
Status ReduceKernel<true>::ComputeImpl<BFloat16, MIOPEN_REDUCE_TENSOR_NO_INDICES>(
Why do we need a new function for BFloat16? Can the default one do the job? The default one can already handle MLFloat16, and BFloat16 is similar to MLFloat16. #WontFix
I get a lib loading error if I remove the block; will leave it as is for now.
I read the code again. The current BFloat16 reduce casts the data to float for the calculation and casts back at the end. cuDNN should already support BFloat16 directly since CUDA 11, so for better perf we should go through the default ComputeImpl, but to make that work we need to fix a few more places. Let me open a new PR to do that.
oh, ok, thanks a lot
I'll check with Suffian.
@iK1D - let me speed up the BF16 support by applying a similar change from your PR. We can work on code cleanup + UTs as needed in the next phase.
Description: Enable BFloat16 for ROCm
So far, the following ops are enabled in this PR:
ReduceSum
Binary elementwise ops (Add/Sub/Mul/Div)
Cast
Softmax
Will continue to add support per code refactor from #10085
Motivation and Context