No training possible on RTX 4090: CUFFT_INTERNAL_ERROR with torch < 2 (WSL2 & native Ubuntu Linux) #295
Comments
I too would like to train with an RTX 4090. I'd be interested in whether or not you were able to figure out a workaround. I'd be buying the 4090 specifically for this purpose, quite an investment if it doesn't work. If the RTX 4090 can't work, what's the best GPU to get for training with Piper?
Hi, thank you very much for your great work! Here is how I got it to work with an RTX 4090 and WSL2 (I use Win 10). Install developer Python.
Then create a Python virtual environment and activate it.
Update pip, wheel, and setuptools.
Install PyTorch with this version change in requirements.txt.
Run the install.
Build build_monotonic_align.
I hope this helps!
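Pieced together from the steps above and the full command list quoted later in this thread, a hedged sketch of the whole workaround. The exact torch and pytorch-lightning pins are assumptions collected from other comments here, not an official recipe:

```
# Sketch of the RTX 4090 workaround (WSL2 or native Ubuntu).
# Version pins are assumptions from this thread, not official guidance.
sudo apt-get install python3-dev           # developer Python headers

cd piper/src/python
python3 -m venv .venv && . .venv/bin/activate

pip3 install --upgrade pip wheel setuptools

# Edit requirements.txt: use a torch >= 2 build, pytorch-lightning~=1.9.0,
# and cython>=0.29.0,<1 -- then install the package itself:
pip3 install -e .

# Build the monotonic alignment extension:
chmod +x build_monotonic_align.sh
./build_monotonic_align.sh
```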
This is great, lpscr! I don't see anything there that's specific to running in a WSL environment, so it should work on an Ubuntu system. I'll go ahead and get an RTX 4090 and see if I can replicate what you've detailed above. Wish me luck!
@aaronnewsome I can confirm @lpscr's workaround is successfully working on a 4090! The missing part for me was pytorch-lightning~=1.9.0. Nevertheless, preprocessing has some problems with that, but you can use the official installation method in another venv for that and be fine.
60 epochs per minute!! I'm getting about 20-30 epochs per HOUR with quality high and 1150 voice samples, using an RTX 3060 6GB. CPU-only performance is not even worth mentioning, a waste of electricity if you ask me. I'm placing my order for a 4090 today!
Writing every checkpoint epoch to disk is a bottleneck, I think.
Just for fun, I tried the thorsten-voice dataset with 22672 voice samples:
I really appreciate you adding more context around the performance of the 4090, ei23fxg. Many, many thanks. It makes me think there should be some kind of effort started to benchmark and catalog performance so that new users like me can understand what we're getting into with all this. It could also be a great place for curious users to see which setups work, what kind of tweaks need to be done, and so on.

I'm really appreciative of this project and find it simply amazing. I'm rather impressed at myself for having the patience to actually get a training done, since I'm not an expert in any of these concepts. I feel like I've stumbled upon it way too early, since it hasn't quite progressed to the "anyone can do it" stage.

I'd be willing to help organize some kind of standard benchmarking test. If everyone benchmarks the same samples, with the same software versions and settings, it could be very useful to collect those stats and make them browsable.
Thanks to @lpscr.
Hi everyone @qt06 @aaronnewsome @ei23fxg, happy I could help with this :) @ei23fxg, thank you for the speed-up tip. I think it would be a good idea to put this somewhere in https://github.com/rhasspy/piper/blob/master/TRAINING.md in the training section. Happy training to all ;)
If you don't mind editing the code and don't want to change versions for whatever reason, you can simply modify the device for that portion of the model to run on the CPU, then push the tensors back to the GPU. There's probably a bit of overhead, but it's likely still much faster. In particular, in
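The file reference above is cut off, but the CPU-fallback idea itself can be sketched. A minimal, hypothetical illustration (the function name and parameters are mine, not Piper's; the real change would go wherever the model calls torch.stft):

```python
import torch

def stft_on_cpu(y, n_fft, hop_length, win_length, window):
    """Run the cuFFT-backed STFT on the CPU to sidestep
    CUFFT_INTERNAL_ERROR on Ada GPUs with torch < 2, then
    push the result back to the tensor's original device."""
    device = y.device
    spec = torch.stft(
        y.cpu(),                 # move the waveform off the GPU
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window.cpu(),     # the window must live on the same device
        return_complex=True,
    )
    return spec.to(device)       # hand the spectrogram back to the GPU
```

The round trip over PCIe adds some overhead per batch, but it avoids the crashing cuFFT code path entirely.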
Is there a way to fix training not being possible on a Win11 RTX 4050 laptop GPU? I have been trying to train locally for a week now, and it never worked. I get an nvrtc error, or this (truncated):
Traceback (most recent call last):
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
template template extern "C" global
@FemBoxbrawl on laptops there is often a dual-GPU setup active (Intel + NVIDIA)
@ei23fxg I do, but I don't know how to do that (select the GPU)
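A common, general-purpose way to pin a CUDA program to one card is the CUDA_VISIBLE_DEVICES environment variable. This is a general CUDA mechanism, not something specific to this thread, and the index of the discrete NVIDIA GPU is an assumption; check yours first:

```shell
# Check GPU indices first with `nvidia-smi -L`, then expose only the
# discrete card (index 0 assumed here) to CUDA before launching training:
export CUDA_VISIBLE_DEVICES=0
```

Any PyTorch process started from that shell will then see only the selected GPU as device 0.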
I finally got it to work after help from @Graylington, the legend.
Could you please tell me how you solved the problem? I have hit the same issue with a Win11 RTX 4050 laptop GPU.
This is from Graylington. I cannot remember exactly how to do it in detail, but I troubleshooted and it worked. Just follow this (Graylington's original message):

"Here is how I got it to work with an RTX 4090 and WSL2 (I use Win 10). Install developer Python:
sudo apt-get install python3-dev
Then create a Python virtual environment and activate it:
cd piper/src/python
pip3 install --upgrade pip
Change in requirements.txt:
cython>=0.29.0,<1
pip3 install -e .
Build build_monotonic_align:
chmod +x build_monotonic_align.sh
This works great on my 4090! Problem is, I can no longer run inference."
Changing the requirements file was probably the most crucial step, I think, but I don't remember exactly.
Thanks, I have resolved the issue!
How did you do it?
Thanks to you guys, I think I may be almost there. My new issue (traceback truncated): lightning_fabric/utilities/types.py", line 36, in
Any help?
[UPDATE] The issue was related to the torch version I installed. I reinstalled everything and it looks like it's working. Thank you guys!
Thank you lpscr! |
I ran into this with my RTX 4060 Ti as well (on Ubuntu 24.04). I struggled getting @lpscr's solution to work; my mistake was that I wasn't following it properly. I had only updated the pytorch-lightning version in requirements.txt (as it is different). What I should have done was replace the entire file with the contents exactly as pasted. Finally, I am no longer getting the "RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR"! Thank you so much @lpscr!
I encountered some problems with training, most of which I could resolve, as I will describe here.
I tried it on WSL2 (Ubuntu-20.04) and a 'real' Linux Ubuntu-22.04LTS.
The WSL2 guide works well on Linux, also on WSL2, of course, with these additions:
You have to change torchmetrics like this:
pip install torchmetrics==0.11.4
as Thorsten already mentioned in his video guide - Thanks Thorsten!
On WSL2, you may also encounter this error:
"Error: WSL2 Could not load the library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory,"
which can be solved like this:
This is also mentioned here: github.com/microsoft/WSL/issues/5663
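The snippet with the fix did not survive here, but the workaround usually pointed to in that WSL issue is to put the WSL driver libraries on the loader path. A sketch, assuming the standard WSL2 layout:

```shell
# Make the WSL2 GPU driver stubs visible to the CUDA libraries.
# /usr/lib/wsl/lib is where WSL2 normally mounts libcuda.so.
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
```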
On my old system with a GTX1060 this is already working on GPU (on WSL2 and also native Ubuntu-22.04LTS)
On the new system, I only get the CPU to work. And of course the GTX 1060 still beats an i9-14900K...
With the RTX 4090 it is like this (same on WSL2 and Ubuntu-22.04LTS):
I did some research, and it seems this issue is caused by a bug in CUDA 11.7, as mentioned here: github.com/pytorch/pytorch/issues/88038. I also tried the nvidia/pytorch:22.03-py3 Docker image, but that also has some support issues with the 4090?!
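To confirm whether a given environment is on the affected combination, a quick hedged check using nothing but PyTorch's own version attributes (the sm_89 detail is an assumption from the linked upstream issue):

```python
import torch

# torch < 2 built against CUDA 11.7 is the combination reported to hit
# CUFFT_INTERNAL_ERROR on Ada (sm_89) cards like the RTX 4090.
print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
if torch.cuda.is_available():
    print("compute capability:", torch.cuda.get_device_capability(0))
else:
    print("no CUDA device visible")
```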
My question:
Are there any workarounds to get an RTX 4090 running or any plans to upgrade to Torch >=2?
It's a pity that I can't use it for training...
And also thanks for the great work!