Add ROCm support #993

CSY-ModelCloud · 2025-01-02T12:29:17Z

No description provided.

Qubitium · 2025-01-02T12:50:37Z

Checklist

Validate all kernels. Find out which works and which does not.

Marlin is not supported, exllama v1/v2 are supported

Detect fake cuda (rocm) from real cuda:0 device.

check torch.version.hip is not None

What happens if both real and fake cuda exists? Both 4090 and 7900 are active at the same time? Is this possible?

torch rocm can't visit nvidia devices and torch cuda version can't visit amd

Modify SUPPORTS_DEVICE property for kernels to handle ROCm compat
Setup should skip download for nvidia/cuda precompiled whl

CSY-ModelCloud · 2025-01-03T01:57:01Z

https://rocm.docs.amd.com/projects/HIPIFY/en/docs-6.1.0/tables/CUDA_Device_API_supported_by_HIP.html

nv_bfloat16 has no implementation in rocm

marlin kernel

      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:14:
      gptqmodel_ext/marlin/marlin_hip.cuh:60:41: error: use of undeclared identifier '__cvta_generic_to_shared'
         60 |   uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
            |                                         ^
      gptqmodel_ext/marlin/marlin_hip.cuh:67:18: error: invalid input constraint 'l' in asm
         67 |       "r"(smem), "l"(glob_ptr), "n"(BYTES));
            |                  ^
      gptqmodel_ext/marlin/marlin_hip.cuh:72:41: error: use of undeclared identifier '__cvta_generic_to_shared'
         72 |   uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
            |                                         ^
      gptqmodel_ext/marlin/marlin_hip.cuh:77:7: error: invalid input constraint 'l' in asm
         77 |       "l"(glob_ptr), "n"(BYTES));
            |       ^
      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:15:
      gptqmodel_ext/marlin/marlin_dtypes_hip.cuh:49:18: error: unknown type name 'nv_bfloat16'; did you mean 'hip_bfloat16'?
         49 | class ScalarType<nv_bfloat16> {
            |                  ^~~~~~~~~~~
            |                  hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:15:
      gptqmodel_ext/marlin/marlin_dtypes_hip.cuh:51:20: error: unknown type name 'nv_bfloat16'; did you mean 'hip_bfloat16'?
         51 |   using scalar_t = nv_bfloat16;
            |                    ^~~~~~~~~~~
            |                    hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:15:
      gptqmodel_ext/marlin/marlin_dtypes_hip.cuh:52:21: error: unknown type name 'nv_bfloat162'; did you mean 'hip_bfloat16'?
         52 |   using scalar_t2 = nv_bfloat162;
            |                     ^~~~~~~~~~~~
            |                     hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:15:
      gptqmodel_ext/marlin/marlin_dtypes_hip.cuh:54:21: error: unknown type name 'nv_bfloat162'; did you mean 'hip_bfloat16'?
         54 |   using FragA = Vec<nv_bfloat162, 4>;
            |                     ^~~~~~~~~~~~
            |                     hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:15:
      gptqmodel_ext/marlin/marlin_dtypes_hip.cuh:55:21: error: unknown type name 'nv_bfloat162'; did you mean 'hip_bfloat16'?
         55 |   using FragB = Vec<nv_bfloat162, 2>;
            |                     ^~~~~~~~~~~~
            |                     hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:15:
      gptqmodel_ext/marlin/marlin_dtypes_hip.cuh:57:21: error: unknown type name 'nv_bfloat162'; did you mean 'hip_bfloat16'?
         57 |   using FragS = Vec<nv_bfloat162, 1>;
            |                     ^~~~~~~~~~~~
            |                     hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      In file included from gptqmodel_ext/marlin/marlin_hip_kernel.hip:15:
      gptqmodel_ext/marlin/marlin_dtypes_hip.cuh:58:22: error: unknown type name 'nv_bfloat162'; did you mean 'hip_bfloat16'?
         58 |   using FragZP = Vec<nv_bfloat162, 4>;
            |                      ^~~~~~~~~~~~
            |                      hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:92:11: error: invalid output constraint '=f' in asm
         92 |         : "=f"(c[0]), "=f"(c[1]), "=f"(c[2]), "=f"(c[3])
            |           ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:95:47: error: unknown type name 'nv_bfloat16'; did you mean 'hip_bfloat16'?
         95 |   } else if constexpr (std::is_same<scalar_t, nv_bfloat16>::value) {
            |                                               ^~~~~~~~~~~
            |                                               hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:99:11: error: invalid output constraint '=f' in asm
         99 |         : "=f"(c[0]), "=f"(c[1]), "=f"(c[2]), "=f"(c[3])
            |           ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:103:5: error: unknown type name 'nv_bfloat16'; did you mean 'hip_bfloat16'?
        103 |     STATIC_ASSERT_SCALAR_TYPE_VALID(scalar_t);
            |     ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:19:44: note: expanded from macro 'STATIC_ASSERT_SCALAR_TYPE_VALID'
         19 |                     std::is_same<scalar_t, nv_bfloat16>::value, \
            |                                            ^
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:113:41: error: use of undeclared identifier '__cvta_generic_to_shared'
        113 |   uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
            |                                         ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:151:3: error: unknown type name 'nv_bfloat16'; did you mean 'hip_bfloat16'?
        151 |   STATIC_ASSERT_SCALAR_TYPE_VALID(scalar_t);
            |   ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:19:44: note: expanded from macro 'STATIC_ASSERT_SCALAR_TYPE_VALID'
         19 |                     std::is_same<scalar_t, nv_bfloat16>::value, \
            |                                            ^
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:177:39: error: unknown type name 'nv_bfloat16'; did you mean 'hip_bfloat16'?
        177 | __device__ inline typename ScalarType<nv_bfloat16>::FragB
            |                                       ^~~~~~~~~~~
            |                                       hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      gptqmodel_ext/marlin/marlin_hip_kernel.hip:178:14: error: unknown type name 'nv_bfloat16'; did you mean 'hip_bfloat16'?
        178 | dequant_4bit<nv_bfloat16>(int q) {
            |              ^~~~~~~~~~~
            |              hip_bfloat16
      /opt/rocm-6.3.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bfloat16.h:57:8: note: 'hip_bfloat16' declared here
         57 | struct hip_bfloat16
            |        ^
      fatal error: too many errors emitted, stopping now [-ferror-limit=]
      20 errors generated when compiling for gfx1030.
      failed to execute:/opt/rocm-6.3.1/lib/llvm/bin/clang++  --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx942  -I/home/work/GPTQModel/venv/lib/python3.12/site-packages/torch/include -I/home/work/GPTQModel/venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/home/work/GPTQModel/venv/lib/python3.12/site-packages/torch/include/TH -I/home/work/GPTQModel/venv/lib/python3.12/site-packages/torch/include/THC -I/home/work/GPTQModel/venv/lib/python3.12/site-packages/torch/include/THH -I/opt/rocm-6.3.1/include -Igptqmodel_cuda -I/home/work/GPTQModel/venv/lib/python3.12/site-packages/nvidia/cuda_runtime/include -I/home/work/GPTQModel/venv/include -I/usr/include/python3.12 -c -x hip gptqmodel_ext/marlin/marlin_hip_kernel.hip -o "build/temp.linux-x86_64-cpython-312/gptqmodel_ext/marlin/marlin_hip_kernel.o" -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -diag-suppress=179,39,186 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=gptqmodel_marlin_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
      error: command '/opt/rocm-6.3.1/bin/hipcc' failed with exit code 1

Qubitium · 2025-01-03T05:08:27Z

AMD is genius, NOT!, for not having a rocm device.

CSY-ModelCloud added 2 commits January 2, 2025 12:27

add rocm support

380e694

skil download whl if ROCM_VERSION is defined

fcb3eb4

Qubitium changed the title ~~add rocm support~~ Add ROCm support Jan 2, 2025

CSY-ModelCloud and others added 12 commits January 3, 2025 02:03

exclude marlin for rocm

88ae81e

add comments

a3f7f2c

fix rocm version tag

d960df7

fix rocm version tag

8ea6a44

add rocm device

46a27fd

add support device

46a59a5

set ROCM device value to cuda

3a0c054

Update importer.py

d4a277d

Update setup.py

bdd42ac

Update _const.py

7841180

Update _const.py

6894882

Update setup.py

d7239e8

Qubitium marked this pull request as ready for review January 3, 2025 05:08

Qubitium and others added 5 commits January 3, 2025 13:12

Update _const.py

559d7ba

add to device map for DEVICE

3c39e5c

add warning for rocm < 6.2

248a5a1

add thread support for rocm

6aee251

remove pthread for rocm

4126bc4

Qubitium merged commit 0b6924e into main Jan 3, 2025
4 checks passed

CSY-ModelCloud deleted the CSY/rocm branch January 4, 2025 02:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ROCm support #993

Add ROCm support #993

CSY-ModelCloud commented Jan 2, 2025

Qubitium commented Jan 2, 2025 •

edited by CSY-ModelCloud

Loading

CSY-ModelCloud commented Jan 3, 2025 •

edited

Loading

Qubitium commented Jan 3, 2025

Add ROCm support #993

Add ROCm support #993

Conversation

CSY-ModelCloud commented Jan 2, 2025

Qubitium commented Jan 2, 2025 • edited by CSY-ModelCloud Loading

CSY-ModelCloud commented Jan 3, 2025 • edited Loading

https://rocm.docs.amd.com/projects/HIPIFY/en/docs-6.1.0/tables/CUDA_Device_API_supported_by_HIP.html

nv_bfloat16 has no implementation in rocm

Qubitium commented Jan 3, 2025

Qubitium commented Jan 2, 2025 •

edited by CSY-ModelCloud

Loading

CSY-ModelCloud commented Jan 3, 2025 •

edited

Loading