InvokeAI AMD fails to build #82
Comments
No, your computer is just too slow to run the tests; you've hit a race condition in the PyTorch test suite.
That's a bit sad. I guess I just have to retry it a few times then. For now it seems to be building; I've got a single
OK, after a few tries it built and started InvokeAI. Then it failed to download the models the first time, but after starting fresh, it seems to have worked the second time. I don't mean to be disrespectful here, but for something that is supposed to be reproducible, this sure took a bunch of trial and error. But in the end it worked, so, yay.
You may need to understand and learn the difference between upstream issues and issues with this project. The race condition is reproducible on a slow machine (it is an issue with the PyTorch test suite). All this repo does is give you somebody else's code and allow you to run it the same way twice. It doesn't mean their code is good or bug-free. You do have a misunderstanding.
Yeah, I understand that the race conditions in the test suite have nothing to do with how this project creates a reproducible environment for third-party code. I've dealt with third-party packages myself enough times, and honestly I usually just go the lazy route and disable the tests altogether. Again, I'm super happy about all the Nix code here, but I do see some irony in that I came here for a reproducible environment (after having some issues with ROCm packages elsewhere), and then on the first invocation I ran into flaky tests. I didn't mean to say anything bad about your code here. As for downloading the models, that's also InvokeAI's code, but again, somehow it bailed out the first time and then succeeded the second time? I suspect it stored some hidden config in my home dir somewhere, so the second time around it just did something differently. Or maybe that also races. Oh well.
In the end, the web UI starts, but trying to generate any output crashes the server. I'm hitting this issue: AUTOMATIC1111/stable-diffusion-webui#11939. I'm not sure if this is due to my GPU (
You are not going to like the number of issues you will encounter with AMD GPUs.
You're using Arch, so I can't help you. Whereas if you use NixOS, I can give you a few lines of code to correctly define the AMD GPU drivers.
OK thanks, I'll eventually set up NixOS on this machine. I'll probably just give up on the current system then, until I migrate to NixOS (for that I need to figure out how to port my current cryptsetup+LVM configs). The annoying thing is that I do have two NVIDIA GPUs as well, but on this machine I opted for an AMD one so that window managers like Sway & Hyprland would load properly, since they don't seem to support the proprietary drivers. Since then I've seen that Nix has patches for Hyprland, so maybe back to NVIDIA I should go, but then what do I do with this otherwise perfectly good GPU?
I have finally converted my workstation fully to NixOS (with flakes + home manager). I'm going to go ahead and try this once more one of these days and report back.
Hi @attilaolah, did you get it working? I came here because I got the same error. I know the CPU is not powerful; it's a rig previously used for mining. The GPUs should be good: there is an AMD Vega 64 and an Nvidia 3090. I installed NixOS and tried to use the AMD GPU, then got the error below. My next attempt will be with Nvidia, after I figure out how to get the drivers installed in NixOS.
No, I didn't. But I believe the error I got the last time I tried (on NixOS) was different from yours. I still have the build in cache, so if I re-run now I get the error immediately:
$ nix run github:nixified-ai/flake#invokeai-amd
warning: ignoring untrusted substituter 'https://ai.cachix.org', you are not a trusted user.
Run `man nix.conf` for more information on the `substituters` configuration option.
2024-03-16 10:38:36.198394561 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1827 CreateInferencePybindStateModule] Init provider bridge failed.
[2024-03-16 10:38:40,216]::[InvokeAI]::INFO --> Patchmatch initialized
/nix/store/knqd0zgkmj3pajqcmh785qc6m8hjf0hc-python3.11-torchvision-0.15.2/lib/python3.11/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
An exception has occurred: /home/ao/invokeai/models/core/convert/CLIP-ViT-bigG-14-laion2B-39B-b160k is missing
== STARTUP ABORTED ==
** One or more necessary files is missing from your InvokeAI root directory **
** Please rerun the configuration script to fix this problem. **
** From the launcher, selection option [7]. **
** From the command line, activate the virtual environment and run "invokeai-configure --yes --skip-sd-weights" **
** (To skip this check completely, add "--ignore_missing_core_models" to your CLI args. Not installing these core models will prevent the loading of some or all .safetensors and .ckpt files. However, you can always come back and install these core models in the future.)
Press any key to continue...
Looks like it is complaining about a missing model, although I believe the flake should download all the models? But at least the build itself completes for me. You may want to try running the build several times, since there were race conditions in the PyTorch tests or somewhere the last time I tried. EDIT: For now, I'll just try to manually clone the models from HuggingFace into the expected directory to see if that works.
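For what it's worth, "run the build several times" can be scripted rather than done by hand. Below is a minimal sketch in Python, not something this project ships: the helper name run_with_retries and its parameters are made up for illustration, and it simply re-runs a command until it exits with status 0. It does nothing about the underlying race condition; it only saves retyping.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay_seconds=0.0):
    """Run cmd (a list of arguments) until it exits with status 0.

    Hypothetical helper for illustration only.
    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        print(f"attempt {attempt}/{attempts}: {' '.join(cmd)}", file=sys.stderr)
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # Pause before the next try, but not after the final one.
        if attempt < attempts:
            time.sleep(delay_seconds)
    return None
```

Assuming nix is on PATH, a call such as run_with_retries(["nix", "build", "github:nixified-ai/flake#invokeai-amd"], attempts=5) would retry the build up to five times. Using "nix build" here instead of "nix run" is my assumption, so that only the build step is repeated and InvokeAI is not started on success.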
Even after fetching the required models from HuggingFace, I still get the error (except the line about the missing models). The message suggests re-running invokeai-configure, which I'm not sure how to do. Maybe running
I'm running into some trouble trying to build the AMD version with Nix 2.18.1 on my Arch Linux host:
Interestingly, trying it a few times gives me different errors, although I suppose that's just due to parallel builds racing each other. In fact, after two failures, the third attempt has been compiling for hours. Maybe I somehow ended up with a cache miss there, and it will just take more time to get to the part where it fails?
My /etc/nix/nix.conf looks like this (comments stripped):
Even though I have sandboxing enabled, I'm not sure whether I should really trust it, so my plan is to go ahead and retry the whole thing inside a Docker container:
So far I haven't gotten there. But it would also be nice to have a basic idea of how long the build should take. I'm running it on a 20-core host with 128 GB of RAM and I'm not doing much else. So far I'm not even getting a decent progress report, and I'm hesitant to restart it since I don't know whether there is a local cache it can pick up from and continue.