AMD / ROCM support #27
It should be possible to use AMD to a degree (there is some AMD support in the upstream software), but I do not currently have any suitable AMD hardware to develop on for this project. I plan to offer support at some point in the future, though! Bear in mind, ROCM will lag CUDA because Nvidia spent a lot of years and effort capturing the ML market...
In a fit of enthusiasm, I procured a Radeon W5500X to build and test on. Unfortunately, I did not appreciate that ROCM != CUDA. AMD supports almost no cards for compute, especially on the consumer side. They appear to only really be interested in supercomputer customers. I've burnt over 12 hours trying to hack ROCM into compiling and running ML workloads properly - despite some success, I am not happy with the stability, performance, or compatibility. I have decided that there will be no support outside of ROCM's officially supported cards - I will not make up for AMD's woeful support of their own hardware. I am considering purchasing a ROCM-supported card, but it is now de-prioritised due to the cost - and the additional PSU watts required! Please recognise the excellent work of:
The following resources are useful for ROCM support information:
Issue was closed accidentally! Please post any AMD thoughts here. My current advice to AMD users is to try out LLMs on CPU (with patience) and, if you like it, then buy an Nvidia card with as much VRAM as you can afford! The exception is if your card is actually supported by ROCM - then there's a bit of a chance to run models without too much hassle...
@Atinoda Hello. Thank you for mentioning me. Here is the list of all the cards (GFX codes) that they support (it can be a regular RX or a professional WX card, since they use the same or very similar chips - for example, they support the RX7900 and WX7900). I remember that some RX/WX6000 cards should work too.
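As a rough guide, each marketing name maps to a ROCM GFX target. A minimal lookup sketch - the `gfx_target` helper is made up for illustration, and the mapping only covers the cards mentioned in this thread (the 7900 XTX target is stated later in the thread; the RX 6800 is an RDNA2 Navi 21 part):

```shell
# Illustrative lookup from marketing name to ROCM GFX target.
# Only covers cards discussed in this thread; extend as needed,
# or run `rocminfo` on your own system to check the real target.
gfx_target() {
  case "$1" in
    "RX 7900 XTX"|"RX 7900 XT") echo "gfx1100" ;;   # RDNA3, Navi 31
    "RX 6800"|"RX 6800 XT"|"RX 6900 XT") echo "gfx1030" ;;  # RDNA2, Navi 21
    *) echo "unknown" ;;
  esac
}

gfx_target "RX 7900 XTX"   # gfx1100
gfx_target "RX 6800"       # gfx1030
```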
I have implemented ROCM support in line with the upstream project. At this stage, it is untested because I do not have hardware to run it on. Please give it a go and see if it works! Reports are welcomed.
So I am actually between two of your open issues: RoCM support for an AMD GPU, and the unRAID container that another person was working on. The good news is that the Docker container works! No issues - the only error I saw was a complaint about numpy being 1.22 instead of 1.24, but it didn't slow me down.

As for the testing, it seems I cannot get the docker container to see my GPU. In the logs it loads as CPU Extended, which I think might be a fallback. To prove or disprove that, here are my edits: in docker-compose.yml I modified line 4 to align with default-rocm and commented out the Nvidia deployment steps at the bottom. I also modified the "target" in docker-compose.build.yml to 'rocm' on line 6. Outside of that, I followed an additional guide here to fill out my settings for the unRAID template for launching it with other containers: oobabooga/text-generation-webui#4850

So with all of that taken care of, and the chat both opening and loading, I noticed per the logs that I was starting in CPU Extended mode despite calling for RoCM in the docker-compose. I validated that the GPU was not in use by unRAID in the PCI settings, and that the kernel-level driver being loaded is amdgpu, the same as on my working Arch system. Did I miss something, or edit the compose files incorrectly? Unraid: 6.12.18

Edit: One additional thing I wanted to mention - in oobabooga's git for text-generation-webui, I additionally had to edit a file for it to recognize my GPU in the system. I had to modify "one_click.py" (which I assume handles initial installation of the system), filling out my specific GPU information on lines 16-18 - in this case, my 7900 XTX is gfx1100. I would need to look up what the RX6800 in my system is equal to.
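For reference, the edits described above would look roughly like this. This is a sketch reconstructed from the description only - the service name, image name, and surrounding keys are assumptions, not copied from the actual repository files:

```yaml
# docker-compose.yml (excerpt, reconstructed): select the ROCM image
# variant and disable the Nvidia-specific deploy section.
services:
  text-generation-webui:
    image: atinoda/text-generation-webui:default-rocm   # was the default variant
    # deploy:                                  # Nvidia GPU reservation - commented out
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
---
# docker-compose.build.yml (excerpt, reconstructed): build the ROCM target.
services:
  text-generation-webui:
    build:
      target: rocm   # was the default target
```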
Hi @Alkali-V2 - thanks very much for testing this with your AMD GPU, and for your detailed post! I would be happy to work with you and try to get it up and running on your hardware. I've read through your post - can I please confirm a couple of things with you?
To get to accelerated inference we need two things: 1) the container must use supported libraries, and 2) the container must be granted access to the GPU hardware.

Regarding Step 1, the CPU Extended message that you are seeing suggests that the running image is the CPU variant rather than the ROCM one.

For Step 2, unfortunately ROCM is a bit more awkward to pass through to docker... I did hack my unsupported GPU to run ROCM workloads (then immediately segfault!) but I think it was correctly available to the container. Please try adding the following to your service definition - which is the compose equivalent of the docker run flags that AMD recommend:

```yaml
group_add:
  - video
ipc: host
devices:
  - /dev/kfd
  - /dev/dri
cap_add:
  - SYS_PTRACE
security_opt:
  - seccomp=unconfined
```

Note that there are some heavy-duty permissions granted there... so just bear that in mind for the purposes of security. Good luck and please let me know how you get on!
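Before wiring up the passthrough, it can help to confirm the device nodes even exist on the host - if they are missing, the amdgpu/ROCM stack is not loaded and no compose change will help. A quick check, using the paths from the snippet above:

```shell
# Check that the device nodes ROCM containers need are present on the host.
# /dev/kfd is the compute interface; /dev/dri holds the render nodes.
for dev in /dev/kfd /dev/dri; do
  if [ -e "$dev" ]; then
    echo "$dev: present"
  else
    echo "$dev: missing"
  fi
done
```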
@Atinoda Thanks for getting back to me, and for your work on this project! It greatly simplified this process, and I know several Unraid forum users who agree.

Edit: I am fairly new to docker and didn't realize that compose.yml and compose.build.yml were both dependencies for building. I was following an unraid guide on how to add your own docker images: https://www.reddit.com/r/unRAID/comments/tm2hzn/how_to_install_a_dockerfile_not_using_docker_hub/ - the top-voted comment mentioned building, so I followed that. Which likely explains exactly why RoCM didn't compile correctly: my compose.build.yaml didn't say 'default-rocm' as you suggested.

However, one note for anyone who may also be trying this at home: the command 'docker compose up' will fail with compose being an invalid command, because Unraid does not have the compose plugin installed by default. There is a Docker Compose plugin, located here: https://forums.unraid.net/topic/114415-plugin-docker-compose-manager/page/19/ - there were some concerns in the latest pages about its recent update, so I am using build again for this test.

To answer your questions:
I rebuilt just now with the changes to my docker-compose.build.yaml and I am still seeing CPU Extended in the logs. Did I miss anything additional?
Thank you for answering my questions - it's especially good news that you've had acceleration running before. I am not familiar with Unraid, but a conversation with an LLM tells me that it should be possible to pull rather than build images. I believe that the behaviour you are seeing with docker compose up is because it uses the default image variant rather than the ROCM one. However, if you want to build it, I think that you need to modify the target in docker-compose.build.yml to the ROCM variant as well. As for the ROCM libraries - we'll need to see what is required to actually accelerate; the installed pytorch might be enough. Regarding the contents of...
That's really great news - thank you for working to get it running and for sharing your results! Those are good speeds, and it's cool that it's via Unraid - two questions answered in one deployment. Last suggestion for day-to-day operations: if you are using the software server-style, you might want to consider adding a version tag to the image rather than relying on the floating default tag.
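A pinned tag in compose would look something like the sketch below. Both the image name and the tag here are illustrative assumptions, not taken from the project's registry:

```yaml
# Illustrative excerpt: pin a specific release tag so the server deployment
# does not silently change when the floating variant tag is updated.
services:
  text-generation-webui:
    image: atinoda/text-generation-webui:default-rocm-snapshot   # hypothetical pinned tag
    # image: atinoda/text-generation-webui:default-rocm          # floating tag, updates in place
```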
Is there a way to run with an AMD GPU?