
[ROCm] BFloat16 support #10416

Merged
merged 6 commits into from
Jan 29, 2022

Conversation

Contributor

@ytaous ytaous commented Jan 27, 2022

Description: Enable BFloat16 for ROCm.
So far, this PR enables the following ops:

ReduceSum
Binary elementwise ops (Add/Sub/Mul/Div)
Cast
Softmax

Will continue to add support per code refactor from #10085
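Since bfloat16 is just the top half of an IEEE-754 float32 (same sign bit and 8-bit exponent, but only 7 mantissa bits), the conversions these ops rely on are cheap bit shifts. A minimal sketch with illustrative helper names (not onnxruntime's actual `BFloat16` class; truncation shown for brevity, where real implementations often round to nearest even):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative helpers, not onnxruntime's BFloat16 class. bfloat16 keeps the
// sign, the full 8-bit exponent, and the top 7 mantissa bits of a float32,
// so conversion is a 16-bit shift.
inline uint16_t FloatToBFloat16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<uint16_t>(bits >> 16);  // drop low mantissa bits (round-to-zero)
}

inline float BFloat16ToFloat(uint16_t b) {
  uint32_t bits = static_cast<uint32_t>(b) << 16;  // zero-fill the dropped bits
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

Because the exponent width matches float32, bfloat16 trades precision for the full float32 dynamic range, which is why it converts so cheaply compared to IEEE float16.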


@ytaous ytaous added the training issues related to ONNX Runtime training; typically submitted using template label Jan 27, 2022
Contributor Author

ytaous commented Jan 27, 2022

@iK1D - I found that the fix for ReduceSum perf (#9471) is not in the ROCm reduction_ops.cc. I wonder if it's applicable, and whether I could apply the code change manually, either in this PR or in a separate one.

@ytaous ytaous changed the title [ROCm] BFloat16 support for ReduceSum op [ROCm] BFloat16 support for ReduceSum, Add, Sub, Mul, Div op Jan 28, 2022
@@ -823,6 +823,111 @@ SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(int64_t)
SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(int8_t)
SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL(uint8_t)

template <>
template <>
Status ReduceKernel<true>::ComputeImpl<BFloat16, MIOPEN_REDUCE_TENSOR_NO_INDICES>(
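For readers unfamiliar with the double `template <>` above: the first empty parameter list refers to the enclosing class template specialization (`ReduceKernel<true>` here), and the second to the member function template being specialized (`ComputeImpl<...>`). A minimal sketch of the same shape, with made-up names rather than the real onnxruntime classes:

```cpp
#include <string>

// Illustrative only - mirrors the shape of
// ReduceKernel<true>::ComputeImpl<BFloat16, ...> from the diff.
template <bool UseReducedPrecision>
struct Kernel {
  template <typename T>
  std::string ComputeImpl() const { return "generic path"; }
};

// First template<> : names the class template specialization (Kernel<true>).
// Second template<>: names the member function template (ComputeImpl<float>).
template <>
template <>
std::string Kernel<true>::ComputeImpl<float>() const { return "float path"; }
```

Only the `<true, float>` combination takes the specialized path; every other instantiation falls through to the generic member template.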
Contributor

@centwang centwang commented Jan 28, 2022
Why do we need a new func for BFloat16? Can the default one take the job? Since the default one can handle MLFloat16, and BFloat16 is similar to MLFloat16. #WontFix

Contributor Author
Getting lib loading error if I remove the block, will leave it as is for now.

Contributor
I read the code again. The current BFloat16 reduce casts the data to float for the calculation and casts it back at the end. I think cuDNN should already support BFloat16 directly since CUDA 11; for better perf we should go through the default ComputeImpl, but to make it work we need to fix some more places. Let me open a new PR to do that.
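The cast-based path described here can be sketched as follows: widen each bfloat16 element to float, accumulate in float32, then narrow the result back. This is a simplified CPU illustration with made-up helper names, not the actual MIOpen/cuDNN-backed kernel:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Made-up helper names for this sketch; not the real onnxruntime conversions.
inline float BF16ToFloat(uint16_t b) {
  uint32_t bits = static_cast<uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

inline uint16_t FloatToBF16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<uint16_t>(bits >> 16);
}

// ReduceSum fallback: cast in, accumulate in float32, cast out. A native
// bfloat16 kernel would skip the per-element conversions entirely.
uint16_t ReduceSumBF16(const std::vector<uint16_t>& data) {
  float acc = 0.0f;
  for (uint16_t v : data) acc += BF16ToFloat(v);
  return FloatToBF16(acc);
}
```

The per-element widening and the extra pass over memory are the overhead the comment above wants to avoid by dispatching to the backend's native bfloat16 reduce.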

Contributor Author
oh, ok, thanks a lot

Contributor Author

ytaous commented Jan 28, 2022

I'll check with Suffian


In reply to: 1023710835

@ytaous ytaous changed the title [ROCm] BFloat16 support for ReduceSum, Add, Sub, Mul, Div op [ROCm] BFloat16 support for ReduceSum, Add, Sub, Mul, Div, Cast op Jan 28, 2022
@ytaous ytaous changed the title [ROCm] BFloat16 support for ReduceSum, Add, Sub, Mul, Div, Cast op [ROCm] BFloat16 support Jan 29, 2022
@ytaous ytaous changed the title [ROCm] BFloat16 support [WIP] [ROCm] BFloat16 support Jan 29, 2022
Contributor Author

ytaous commented Jan 29, 2022

@iK1D - let me speed up the BF16 support by applying similar changes from your PR:
#10085

We can work on code cleanup + UTs as needed in the next phase.

@ytaous ytaous changed the title [WIP] [ROCm] BFloat16 support [ROCm] BFloat16 support Jan 29, 2022
@ytaous ytaous merged commit 85cbe83 into master Jan 29, 2022
@ytaous ytaous deleted the ettao/bf16-2 branch January 29, 2022 06:43
@ytaous ytaous mentioned this pull request Feb 1, 2022