Runtime error: element 0 of tensors does not require grad and does not have a grad_fn #39
Comments
Build time output, if it helps:
OK, so with HIP_VISIBLE_DEVICES=0 OpenSplat says it's trying CUDA... with HIP_VISIBLE_DEVICES=1 or 2 it shows "CPU". So this is seemingly never picking the GPU.
This GPU may be new enough that libtorch/ROCm 5.7 doesn't know how to talk to it properly. I'll start over with everything on 6.x and see if it starts working.
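(Side note for anyone hitting the same confusion: rocminfo numbers every agent, CPU included, while HIP_VISIBLE_DEVICES counts GPUs only, starting at 0, so the two numberings don't line up. A minimal sketch of checking and pinning the device; the opensplat invocation and dataset path are illustrative, not the exact command used above.)

```bash
# List ROCm agents; both the CPU and the GPU show up here with their own names
rocminfo | grep -E "Agent|Name"

# HIP_VISIBLE_DEVICES indexes GPUs only, so the first (and here only) GPU is 0,
# regardless of where rocminfo lists it
export HIP_VISIBLE_DEVICES=0
./opensplat /path/to/banana   # illustrative invocation
```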
Trying 6.0.2, I get a ton of CMake conflicts.
So I got it built within the included Docker image and have come to the conclusion this has never been close to working for me. Whether ROCm is actually set up or not, I get the same error. The Docker container doesn't have access to /dev/kfd and still gave me the exact same error. At this point I'm just at a loss getting this to run.
ROCm support is definitely still in need of testing, and issues may be present (such as the one you've found). We definitely don't want the tensors to end up allocated on the CPU, though; the device should match the graphics card.
@parkerlreed Before launching Docker, you may need to expose the host GPUs to the Docker engine first, e.g. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#accessing-gpus-in-containers.
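For reference, the linked AMD doc boils down to passing the KFD and DRI device nodes into the container. A minimal sketch, with a placeholder image name; the SELinux flag for a Podman/Fedora host like the one in this issue is my assumption, not something taken from the docs:

```bash
# Expose the ROCm device nodes to the container (per the AMD docs above)
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --security-opt seccomp=unconfined \
  opensplat-rocm-image   # placeholder image name

# Podman on an SELinux host (e.g. Fedora) may additionally need:
#   --security-opt label=disable
```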
To support ROCm 6.0.2, we have to change Dockerfile.rocm a little bit, since the latest stable PyTorch version doesn't support it. We have to either wait for its next stable release (2.3.0) or use AMD's version of the build. I can probably add an updated version for you to test further.
Thanks! My initial testing was with the GPU passed through properly, as far as I can tell from the correct rocminfo output. The run without /dev/kfd was just a sanity check after realizing I was getting the same result either way. Happy to test whatever is needed.
I just realized: is there any point in chasing this? That list is a little bit older, but nothing seems to have changed recently. I've been trying this on a ROG Ally with the Z1 Extreme and its Phoenix GPU. If it can't run on it anyway, then I guess there's no point.
It seems like you are using an iGPU, which is not supported very well by ROCm. Maybe you can try:
If you have a dedicated GPU, maybe you can disable the iGPU via:
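Not necessarily what was being suggested above, but a commonly reported unofficial workaround for RDNA3 iGPUs like the Phoenix part in the Z1 Extreme is to override the reported gfx target so ROCm falls back to a supported kernel path. A sketch, under the assumption that this applies here:

```bash
# Assumption: the Phoenix iGPU reports as gfx1103, which many ROCm/libtorch
# builds don't ship kernels for; forcing the gfx1100 path is a widely used
# unofficial workaround and may or may not work on this setup.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HIP_VISIBLE_DEVICES=0
./opensplat /path/to/banana   # illustrative invocation
```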
@parkerlreed I created a ROCm 6.x-based Docker build. Feel free to give it a try. Thank you!
Ubuntu 22.04 container in podman (Fedora host)
ROCm version: amdgpu-install_5.7.50703-1_all.deb
libtorch version: libtorch-cxx11-abi-shared-with-deps-2.2.1+rocm5.7.zip
rocminfo reports CPU at 0 and GPU at 1
Set these variables accordingly
Trying to run the example banana set, I get this (it claims to be using the CPU??):
If I run with HIP_VISIBLE_DEVICES=0 to go to the CPU, I get a completely different error:
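Since the actual outputs didn't survive the copy above, here are a few generic sanity checks (my sketch, not part of the original report) that would confirm whether the GPU is reachable at all from inside the container:

```bash
# Device nodes must be visible inside the container
ls -l /dev/kfd /dev/dri

# The GPU should show up as a gfx* agent, not just the CPU
rocminfo | grep -i gfx

# rocm-smi should list the GPU without erroring out
rocm-smi
```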