ninja: build stopped: subcommand failed. #2
Comments
I had the same error and fixed it with pip install ninja.
It looks like OP's build goes further than that since it's already running nvcc. Most likely they have Ninja installed. I don't know if I've hit this exact same problem myself, but my guess is that it's failing to compile because the GCC version is too old (maybe it lacks adequate support for constexpr?). This SO post suggests GCC 6.x might have some trouble with constexpr-if expressions. See if it works on GCC 7.x? This comment from the stylegan2-ada-pytorch issues may be helpful. (Although the comment says the minimum GCC version is 6, which you seem to have.) Cross-referencing here anyway in case it helps. BTW: in this release we created a troubleshooting doc that we hope to expand as we come across different types of problems with stylegan3. I'm hoping that this will grow into a useful resource for diagnosing and fixing problems with custom op compiles.
I can confirm that using GCC 8.2 solves the problem. Maybe you should add a minimum required GCC version to the README as well?
Yes, I will. I just need to first figure out what the minimum required version is. Apparently 6.x is too old. Did you by any chance try any GCC 7.x versions?
It works with GCC 7.3.0.
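For anyone who wants to check their compiler before attempting the build, here is a minimal sketch. It assumes that major version 7 is a usable floor, which is based only on the reports in this thread (6.x fails, 7.3.0 and 8.2 work) and is not an official requirement. The names `MIN_GCC_MAJOR`, `parse_gcc_version`, `gcc_version`, and `is_new_enough` are made up for illustration.

```python
import re
import shutil
import subprocess

# Assumption: floor taken from reports in this thread, not from official docs.
MIN_GCC_MAJOR = 7


def parse_gcc_version(text):
    """Parse output of `gcc -dumpversion` into a (major, minor, patch) tuple.

    Some distributions print only the major version (e.g. "9"), so missing
    components are filled with 0.
    """
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) if part else 0 for part in match.groups())


def gcc_version(compiler="gcc"):
    """Return the version tuple of the given compiler, or None if not found."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpversion"], capture_output=True, text=True, check=False
    )
    return parse_gcc_version(result.stdout)


def is_new_enough(version, min_major=MIN_GCC_MAJOR):
    """True if a version tuple meets the assumed minimum major version."""
    return version is not None and version[0] >= min_major


if __name__ == "__main__":
    version = gcc_version()
    if version is None:
        print("gcc not found on PATH")
    elif is_new_enough(version):
        print("gcc", ".".join(map(str, version)), "meets the assumed minimum")
    else:
        print("gcc", ".".join(map(str, version)), "is probably too old")
```

Note that this only inspects whichever gcc is first on PATH; the compiler that the extension build actually invokes may differ if CC or CXX is set.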
These "on the fly compiled modules" are nothing but trouble and should be replaced or removed, as they prevent easy replication of results. Getting this working requires a (potentially difficult or time-consuming) setup of C compilers that should not be necessary. Not everyone is lucky enough to have sysadmin control of clusters of V100 GPUs. The code should be architecture-independent.
@ckyleda - feel free to contribute a Dockerfile.
Same.
We are aware of the difficulties that arise from the use of PyTorch custom extensions. Just like everyone else, we don't like the problems they bring, but the performance benefits are too great for us to forgo these optimizations. In the case of StyleGAN3, our CUDA kernels improve end-to-end training speed by roughly 10x and also reduce the memory footprint considerably. See Appendix D in our paper for additional details. In our past projects, the custom extension improvements have been less pronounced, but with StyleGAN3 the difference in speed and memory footprint is so large that it's quite impractical to train this model without them. (FWIW: We have also explored creating prebuilt binary wheels for these extensions, but AFAICT the extension API is not stable enough between PyTorch releases to make this work, leading to even harder-to-diagnose problems.)
You can try changing ['ninja', '-v'] to ['ninja', '-V'] or ['ninja', '--version'].
Thanks for the PR! Looks good to me, thank you for the corrections! Let me know if there's anything else you find!
On Windows, I solved this problem by upgrading from Visual Studio 2017 to 2019.
I am trying to deploy the NVLabs Superpixel Sampling Network (https://github.com/NVlabs/ssn_superpixels) as a Nuclio (serverless) function to the image annotation tool CVAT. Nuclio creates a Docker container with my code inside. Building the container works perfectly fine, but when initializing my model handler class inside the container, the PyTorch "pair_wise_distance" C++ extension leads to the "ninja: build stopped: subcommand failed." error. Do you have any further ideas? I am kind of stuck and have already tried many different suggestions. Thanks in advance! My container is based on a "pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime" image. I am installing the following gcc components into my container: Edit:
I'm getting this error, and my GCC version is 9.4.0. Is there any solution?
Do you have ninja installed?
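A quick way to answer that question from Python is sketched below. It only checks that a ninja executable is on PATH and responds to `--version` (a real ninja flag); the helper name `ninja_version` is made up for illustration. If ninja is present, the failure most likely comes from a compile step that ninja ran, so the compiler output above the "build stopped" line is the place to look.

```python
import shutil
import subprocess


def ninja_version(executable="ninja"):
    """Return the string printed by `ninja --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = ninja_version()
    if version is None:
        print("ninja not found on PATH; pip install ninja was reported to help")
    else:
        print(f"ninja {version} found; check the compiler errors instead")
```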
Yes, I have ninja installed. I think I know what's wrong with mine: it looks like my CUDA installation doesn't have runtime_api, which is why the ninja build stopped. I tried reinstalling torch+cuda but it didn't work.
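To check for the missing header, something like the sketch below may help. It assumes that "runtime_api" refers to the CUDA toolkit header cuda_runtime_api.h, and that the toolkit lives under CUDA_HOME, CUDA_PATH, or /usr/local/cuda; those search locations are assumptions and may not match a given setup. The helper names `candidate_cuda_homes` and `find_runtime_header` are made up for illustration.

```python
import os
from pathlib import Path

# Assumption: this is the header the comment above calls "runtime_api".
RUNTIME_HEADER = "cuda_runtime_api.h"


def candidate_cuda_homes(env=None):
    """List directories where a CUDA toolkit might be installed (assumed)."""
    env = os.environ if env is None else env
    homes = [env.get("CUDA_HOME"), env.get("CUDA_PATH"), "/usr/local/cuda"]
    return [Path(home) for home in homes if home]


def find_runtime_header(homes):
    """Return the first <home>/include/cuda_runtime_api.h that exists."""
    for home in homes:
        header = Path(home) / "include" / RUNTIME_HEADER
        if header.is_file():
            return header
    return None


if __name__ == "__main__":
    header = find_runtime_header(candidate_cuda_homes())
    if header is None:
        print("cuda_runtime_api.h not found; the CUDA toolkit headers may be missing")
    else:
        print(f"found {header}")
```

If the header is not found, the likely cause is that only the CUDA runtime libraries are installed rather than the full toolkit with headers and nvcc; that is a guess, and it would also be worth checking for images tagged "runtime" like the one mentioned earlier in this thread.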
I have been facing this issue for a long time (2 days, tbh). I tried this but it doesn't work, and even if it did, this is not the right answer.
Same with gcc-9.5; gcc-11.4 passes.
Dear Authors,
I get the following errors when running the code using the stylegan3-t config:
The exact command that I'm using is:
System Details:
Thank you for your help in advance.