
AMD GPU support implemented with HIP - Seeking to merge for dual NVIDIA/AMD compatibility #111

Open
kerwenwwer opened this issue Sep 1, 2024 · 2 comments


@kerwenwwer

Hello,

I've created an AMD-compatible fork of gpu-burn using HIP (Heterogeneous-Compute Interface for Portability). This version supports both NVIDIA and AMD GPUs, expanding the tool's utility across different hardware platforms.

Key features of the AMD-compatible version:

  • Supports both NVIDIA and AMD GPUs
  • Uses HIP for cross-platform compatibility
  • Maintains the core functionality of the original gpu-burn
  • Tested on [list specific AMD and NVIDIA GPUs you've tested on]

You can find the AMD-compatible version here: https://github.com/kerwenwwer/amd-gpu-burn

I'm interested in discussing the possibility of merging these changes back into the main repository to provide official support for both NVIDIA and AMD GPUs. I think it would be best for the community 😁

@wilicc
Owner

wilicc commented Sep 6, 2024

Hi,

This would be very welcome indeed! The reason to use cuBLAS is two-fold:

  1. At the time I originally wrote this, there weren't any optimized/tuned BLAS routines that worked on both AMD and NVIDIA. As you showed, this has now changed.
  2. It is not easy to stress a GPU "to the max". cuBLAS is known to be very efficient (and stressful) on NVIDIA cards; not many compute loads manage to stress the hardware as efficiently.

Now my question is: do you happen to know whether hipBLAS is as efficient on NVIDIA as cuBLAS? One way to find out would be to simply benchmark. If it is not, we might have to maintain two codepaths: hipBLAS for AMD and cuBLAS for NVIDIA. It would of course be much cleaner if the same implementation were optimal for both vendors.
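
For reference, such a benchmark could be as small as the following sketch (the matrix size, iteration count, and layout are arbitrary illustrative choices, not anything from gpu-burn; this is the cuBLAS side, and a hipBLAS build would swap in the `hip*`/`hipblas*` equivalents):

```cpp
// Minimal DGEMM throughput sketch (assumptions: CUDA runtime + cuBLAS;
// n and iters are arbitrary illustrative choices).
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 8192;                         // square matrix dimension
    const size_t bytes = sizeof(double) * n * n;
    double *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemset(dA, 0, bytes);                   // contents don't affect speed
    cudaMemset(dB, 0, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;

    // Warm-up call so clocks settle before timing.
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int iters = 20;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, dA, n, dB, n, &beta, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Each DGEMM performs 2*n^3 floating-point operations.
    printf("DGEMM: %.1f GFLOP/s\n", 2.0 * n * n * n * iters / (ms * 1e6));

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```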

@kerwenwwer
Author

Thank you for your reply.
I believe hipBLAS is quite similar to cuBLAS in terms of the stress it puts on GPU cards. What I've done so far is simply port your benchmark flow from the CUDA API to the HIP API. The main changes I made were:

  1. Switching the temperature readout from nvidia-smi to rocm-smi
  2. Updating deprecated APIs. For example, we no longer need to call cuParamSetSize before launching a kernel; instead, we use cuLaunchKernel (see the sketch below).
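
To make that second change concrete, here is a minimal sketch of the modern launch style (the wrapper function and its argument list are hypothetical, not gpu-burn's actual code):

```cpp
#include <cuda.h>  // CUDA driver API

// Hypothetical wrapper illustrating the modern launch style; the old flow
// called cuParamSetv/cuParamSetSize/cuLaunchGrid, which are long deprecated.
CUresult launchCompare(CUfunction func, CUdeviceptr dA, CUdeviceptr dB,
                       CUdeviceptr dC, int n) {
    // Kernel arguments are now passed as an array of pointers in one call.
    void *args[] = { &dA, &dB, &dC, &n };
    unsigned blocks = (n + 127) / 128;   // 1D grid of 128-thread blocks
    return cuLaunchKernel(func,
                          blocks, 1, 1,  // grid dimensions
                          128, 1, 1,     // block dimensions
                          0,             // dynamic shared memory bytes
                          nullptr,       // default stream
                          args, nullptr);
}
// The HIP equivalent, hipModuleLaunchKernel, takes the same parameters.
```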

My current tests using hipBLAS with the same algorithm show that it can easily saturate the entire GPU on an AMD MI210. So I think using hipBLAS and cuBLAS side by side would not be a problem: the code is written in basically the same way, and the only real difference is the API names, as the sketch below illustrates.
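
For illustration, a thin compile-time shim could hide the naming difference (the USE_HIP macro and wrapper names here are hypothetical, not from either repo):

```cpp
// Hypothetical compile-time shim over the two BLAS libraries; only the
// identifier prefixes differ, the call signatures match one-to-one.
#ifdef USE_HIP
  #include <hipblas/hipblas.h>           // header path as of recent ROCm
  #define blasHandle_t hipblasHandle_t
  #define blasDgemm    hipblasDgemm
  #define BLAS_OP_N    HIPBLAS_OP_N
#else
  #include <cublas_v2.h>
  #define blasHandle_t cublasHandle_t
  #define blasDgemm    cublasDgemm
  #define BLAS_OP_N    CUBLAS_OP_N
#endif

// One burn iteration: C = A * B on n x n column-major matrices.
static void burnIteration(blasHandle_t handle, int n,
                          const double *dA, const double *dB, double *dC) {
    const double alpha = 1.0, beta = 0.0;
    blasDgemm(handle, BLAS_OP_N, BLAS_OP_N, n, n, n,
              &alpha, dA, n, dB, n, &beta, dC, n);
}
```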

However, there are some considerations for merging the code base:

  1. The current code base integrates control and compute functions in the same .c file.
  2. The compiler used on the AMD ROCm platform is based on Clang.

Given these factors, merging the code bases may require refactoring gpu-burn. A better approach might be to separate the control and compute functions and to use a build system that can handle both CUDA and HIP compilation.
