No training possible on RTX 4090: CUFFT_INTERNAL_ERROR with torch < 2 (WSL2 & native Ubuntu Linux) #295
Comments
I too would like to train with an RTX 4090. I'd be interested in whether or not you were able to figure out a workaround. I'd be buying the 4090 specifically for this purpose, quite an investment if it doesn't work. If the RTX 4090 can't work, what's the best GPU to get for training with Piper?
Hi, thank you very much for your great work! Here is how I got it to work with an RTX 4090 and WSL2 (I use Win 10). Install developer Python.
Then create a Python virtual environment and activate it.
Update pip, wheel, and setuptools.
Install PyTorch with this version change in requirements.txt.
Run the install.
Build build_monotonic_align.
I hope this helps!
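Pieced together from the steps above and the full command list quoted later in this thread, a hedged sketch of the whole workaround. The exact torch and pytorch-lightning pins are assumptions collected from other comments here, not an official recipe:

```
# Sketch of the RTX 4090 workaround (WSL2 or native Ubuntu).
# Version pins are assumptions from this thread, not official guidance.
sudo apt-get install python3-dev           # developer Python headers

cd piper/src/python
python3 -m venv .venv && . .venv/bin/activate

pip3 install --upgrade pip wheel setuptools

# Edit requirements.txt: use a torch >= 2 build, pytorch-lightning~=1.9.0,
# and cython>=0.29.0,<1 -- then install the package itself:
pip3 install -e .

# Build the monotonic alignment extension:
chmod +x build_monotonic_align.sh
./build_monotonic_align.sh
```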
This is great, lpscr! I don't see anything there that's specific to running in a WSL environment, so it should work on an Ubuntu system. I'll go ahead and get an RTX 4090 and see if I can replicate what you've detailed above. Wish me luck!
@aaronnewsome I can confirm @lpscr's workaround is successfully working on a 4090! The missing part for me was pytorch-lightning~=1.9.0. Nevertheless, preprocessing has some problems with that, but you can use the official installation method in another venv for that and be fine.
60 epochs per minute!! I'm getting about 20-30 epochs per HOUR with quality high and 1150 voice samples, using an RTX 3060 6GB. CPU-only performance is not even worth mentioning, a waste of electricity if you ask me. I'm placing my order for a 4090 today!
Writing every checkpoint epoch to disk is a bottleneck, I think.
Just for fun, I tried the thorsten-voice dataset with 22672 voice samples:
I really appreciate you adding more context around the performance of the 4090, ei23fxg. Many, many thanks. It makes me think there should be some kind of effort started to benchmark and catalog performance so that new users like me can understand what we're getting into with all this. It could also be a great place for curious users to see which setups work, what kind of tweaks need to be done, and so on.

I'm really appreciative of this project and find it simply amazing. I'm rather impressed at myself for having the patience to actually get a training done, since I'm not an expert in any of these concepts. I feel like I've stumbled upon it way too early, since it hasn't quite progressed to the "anyone can do it" stage.

I'd be willing to help organize some kind of standard benchmarking test. If everyone benchmarks the same samples, with the same software versions and settings, it could be very useful to collect those stats and make them browsable.
Thanks to @lpscr.
Hi everyone @qt06 @aaronnewsome @ei23fxg, happy I could help with this :) @ei23fxg, thank you for the speed-up tip. I think it would be a good idea to put this somewhere in https://github.com/rhasspy/piper/blob/master/TRAINING.md in the training section. Happy training to all ;)
If you don't mind editing the code and don't want to change versions for whatever reason, you can simply modify the device for that portion of the model to run on the CPU, then push the tensors back to the GPU. There's probably a bit of overhead, but it's likely still much faster. In particular, in
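The file reference above is cut off, but the CPU-fallback idea itself can be sketched. A minimal, hypothetical illustration (the function name and parameters are mine, not Piper's; the real change would go wherever the model calls torch.stft):

```python
import torch

def stft_on_cpu(y, n_fft, hop_length, win_length, window):
    """Run the cuFFT-backed STFT on the CPU to sidestep
    CUFFT_INTERNAL_ERROR on Ada GPUs with torch < 2, then
    push the result back to the tensor's original device."""
    device = y.device
    spec = torch.stft(
        y.cpu(),                 # move the waveform off the GPU
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window.cpu(),     # the window must live on the same device
        return_complex=True,
    )
    return spec.to(device)       # hand the spectrogram back to the GPU
```

The round trip over PCIe adds some overhead per batch, but it avoids the crashing cuFFT code path entirely.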
Is there a way to fix training not being possible on a Win11 RTX 4050 laptop GPU? I have been trying to train locally for a week now, and it never worked. I get an nvrtc error, or this (truncated):
Traceback (most recent call last):
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
template template extern "C" global
@FemBoxbrawl on laptops there is often a dual-GPU setup active (Intel + NVIDIA)
@ei23fxg I do, but I don't know how to do that (select the GPU)
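A common, general-purpose way to pin a CUDA program to one card is the CUDA_VISIBLE_DEVICES environment variable. This is a general CUDA mechanism, not something specific to this thread, and the index of the discrete NVIDIA GPU is an assumption; check yours first:

```shell
# Check GPU indices first with `nvidia-smi -L`, then expose only the
# discrete card (index 0 assumed here) to CUDA before launching training:
export CUDA_VISIBLE_DEVICES=0
```

Any PyTorch process started from that shell will then see only the selected GPU as device 0.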
I finally got it to work after help from @Graylington, the legend.
Could you please tell me how you solved the problem? I have hit the same issue with a Win11 RTX 4050 laptop GPU.
This is from Graylington. I cannot remember exactly how to do it in detail, but I troubleshooted and it worked. Just follow this (Graylington's original message):

"Here is how I got it to work with an RTX 4090 and WSL2 (I use Win 10). Install developer Python:
sudo apt-get install python3-dev
Then create a Python virtual environment and activate it:
cd piper/src/python
pip3 install --upgrade pip
Change in requirements.txt:
cython>=0.29.0,<1
pip3 install -e .
Build build_monotonic_align:
chmod +x build_monotonic_align.sh
This works great on my 4090! Problem is, I can no longer run inference."
Changing the requirements file was probably the most crucial step, I think, but I don't remember exactly.
Thanks, I have resolved the issue!
How did you do it?
Thanks to you guys, I think I may be almost there. My new issue (traceback truncated): lightning_fabric/utilities/types.py", line 36, in
Any help?
[UPDATE] The issue was related to the torch version I installed. I reinstalled everything and it looks like it's working. Thank you guys!
Thank you lpscr! |
I ran into this with my RTX 4060 Ti as well (on Ubuntu 24.04). I struggled getting @lpscr's solution to work; my mistake was that I wasn't following it properly. I had only updated the pytorch-lightning version in requirements.txt (as it is different). What I should have done was replace the entire file with the contents exactly as pasted. Finally, I am no longer getting the "RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR"! Thank you so much @lpscr!
I encountered some problems with training, most of which I could resolve, as I will describe here.
I tried it on WSL2 (Ubuntu-20.04) and a 'real' Linux Ubuntu-22.04LTS.
The WSL2 guide works well on Linux, also on WSL2, of course, with these additions:
You have to change torchmetrics like this:
pip install torchmetrics==0.11.4
as Thorsten already mentioned in his video guide - Thanks Thorsten!
On WSL2, you may also encounter this error:
"Error: WSL2 Could not load the library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory,"
which can be solved like this:
This is also mentioned here: github.com/microsoft/WSL/issues/5663
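The snippet with the fix did not survive here, but the workaround usually pointed to in that WSL issue is to put the WSL driver libraries on the loader path. A sketch, assuming the standard WSL2 layout:

```shell
# Make the WSL2 GPU driver stubs visible to the CUDA libraries.
# /usr/lib/wsl/lib is where WSL2 normally mounts libcuda.so.
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
```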
On my old system with a GTX1060 this is already working on GPU (on WSL2 and also native Ubuntu-22.04LTS)
On the new system, I only get the CPU to work. And of course the GTX 1060 still beats an i9-14900K...
With the RTX 4090 it is like this (same on WSL2 and Ubuntu-22.04LTS):
I did some research, and it seems this issue is caused by a bug in CUDA 11.7, as mentioned here: github.com/pytorch/pytorch/issues/88038. I also tried the nvidia/pytorch:22.03-py3 Docker image, but that also has some support issues with the 4090?!
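To confirm whether a given environment is on the affected combination, a quick hedged check using nothing but PyTorch's own version attributes (the sm_89 detail is an assumption from the linked upstream issue):

```python
import torch

# torch < 2 built against CUDA 11.7 is the combination reported to hit
# CUFFT_INTERNAL_ERROR on Ada (sm_89) cards like the RTX 4090.
print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
if torch.cuda.is_available():
    print("compute capability:", torch.cuda.get_device_capability(0))
else:
    print("no CUDA device visible")
```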
My question:
Are there any workarounds to get an RTX 4090 running or any plans to upgrade to Torch >=2?
It's a pity that I can't use it for training...
And also thanks for the great work!