Fix bf16 support issues #2238
Conversation
This pull request was exported from Phabricator. Differential Revision: D52438898
Re-export?
Summary:
- Switch to the HIP-related TARGETS (with the `_hip` suffix) when the AMD GPU build is used.
- Add `supports_python_dlopen = True,` to support dlopen on the related deps.
- Add missing deps such as `"//deeplearning/fbgemm/fbgemm_gpu:split_table_batched_embeddings_hip",`.

Reviewed By: q10, zoranzhao

Differential Revision: D52435932
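As a rough illustration of the TARGETS change described above: the rule name and target name below are hypothetical, and only `supports_python_dlopen` and the `_hip` dep come from the summary.

```python
# Hypothetical BUCK sketch (not the actual FBGEMM build file).
# Only `supports_python_dlopen` and the `_hip` dep are from the summary;
# the rule name and target name are placeholders.
cpp_library(
    name = "embedding_ops",  # hypothetical target
    supports_python_dlopen = True,  # allow dlopen on the related deps
    deps = [
        # Under an AMD GPU build, depend on the HIP variants (_hip suffix):
        "//deeplearning/fbgemm/fbgemm_gpu:split_table_batched_embeddings_hip",
    ],
)
```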
Summary: For BF16-related CUDA code, we use the following macro to distinguish between V100 and A100 (pre-A100 NVIDIA GPUs don't support BF16):

```cpp
#if !( \
    ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
     (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
```

For AMD GPUs (ROCm), this condition always evaluates to false, disabling BF16. However, the MI250 / MI300 GPUs we have in house do support BF16, so we re-enable BF16 for the ROCm-related usages.

Reviewed By: houseroad, jiawenliu64

Differential Revision: D52438898
Force-pushed cb7153d to c689c7b (Compare)
Force-pushed c689c7b to 90ecc97 (Compare)
Force-pushed 90ecc97 to 0f82766 (Compare)
This pull request has been merged in 9cd944a.