
Fix bf16 support issues #2238

Closed
wants to merge 2 commits

Commits on Dec 28, 2023

  1. switch between hip and cuda c++ lib so load (pytorch#2236)

    Summary:
    
    - Switch to hip related TARGETS (w/ _hip suffix) when AMD GPU build is used.
    - Add "supports_python_dlopen = True," to support dlopen on related deps.
    - Add missing deps like `"//deeplearning/fbgemm/fbgemm_gpu:split_table_batched_embeddings_hip",`
    
    Reviewed By: q10, zoranzhao
    
    Differential Revision: D52435932
    jianyuh authored and facebook-github-bot committed Dec 28, 2023
    Commit: 6507269
  2. Fix bf16 support issues (pytorch#2238)

    Summary:
    
    For bf16-related CUDA code, we use the following macro to distinguish V100 vs. A100 (pre-A100 CUDA/NVIDIA GPUs don't support BF16):
    ```
    #if !(                                                  \
        ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
         (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
    ```
    For AMD GPUs (ROCm), this macro always evaluates to false, so BF16 is disabled. However, the MI250 / MI300 GPUs we have in house do support BF16, so we re-enable BF16 for ROCm builds.
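    The guard logic can be sketched as a small host-side simulation. This is a minimal sketch, not the actual FBGEMM code: `USE_ROCM` is an assumed stand-in for the real ROCm build flag, and `CUDA_VERSION` / `__CUDA_ARCH__` are defined by hand here to simulate what the CUDA toolchain would provide.

    ```cpp
    #include <cstdio>

    // Simulated build environment (the real values come from nvcc/hipcc).
    // Uncomment USE_ROCM to simulate an AMD (ROCm) build.
    #define CUDA_VERSION 11040
    #define __CUDA_ARCH__ 800
    // #define USE_ROCM

    // BF16 is enabled when either (a) building for ROCm, or
    // (b) CUDA >= 11.0 and the target arch is sm_80 (A100) or newer.
    #if defined(USE_ROCM) ||                                  \
        !(((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
           (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
    #define BF16_ENABLED 1
    #else
    #define BF16_ENABLED 0
    #endif

    int main() {
      std::printf("BF16 enabled: %d\n", BF16_ENABLED);
      return 0;
    }
    ```

    With the simulated values above (CUDA 11.4, sm_80), the guard passes; on a pre-A100 arch (e.g. `__CUDA_ARCH__` 700 for V100) it would not, while defining `USE_ROCM` re-enables BF16 regardless of the CUDA macros.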
    
    Reviewed By: houseroad, jiawenliu64
    
    Differential Revision: D52438898
    jianyuh authored and facebook-github-bot committed Dec 28, 2023
    Commit: 0f82766