Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvidia 4060TI issue #518

Open
ConanWolf46 opened this issue Jun 5, 2024 · 4 comments
Open

nvidia 4060TI issue #518

ConanWolf46 opened this issue Jun 5, 2024 · 4 comments

Comments

@ConanWolf46
Copy link

python3 -m piper_train --dataset-dir ~/newvoice/Me/output --accelerator 'gpu' --devices 1 --batch-size 32 --validation-split 0.0 --num-test-examples 0 --max_epochs 10000 --resume_from_checkpoint ~/piper/epoch\=2164-step\=1355540.ckpt --checkpoint-epochs 1 --precision 32
DEBUG:piper_train:Namespace(dataset_dir='/home/NAME/newvoice/Me/output', checkpoint_epochs=1, quality='medium', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=10000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/home/NAME/piper/epoch=2164-step=1355540.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 1 epoch(s)
DEBUG:vits.dataset:Loading dataset: /home/NAME/newvoice/Me/output/dataset.jsonl
/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  ckpt_path = ckpt_path or self.resume_from_checkpoint
Restoring states from the checkpoint path at /home/NAME/piper/epoch=2164-step=1355540.ckpt
DEBUG:fsspec.local:open file: /home/NAME/piper/epoch=2164-step=1355540.ckpt
/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1659: UserWarning: Be aware that when using `ckpt_path`, callbacks used to create the checkpoint need to be provided during `Trainer` instantiation. Please add the following callbacks: ["ModelCheckpoint{'monitor': None, 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}"].
  rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /home/NAME/newvoice/Me/output/lightning_logs/version_4/hparams.yaml
Restored all states from the checkpoint file at /home/NAME/piper/epoch=2164-step=1355540.ckpt
/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/NAME/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/home/NAME/piper/src/python/piper_train/__main__.py", line 124, in main
    trainer.fit(model)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/adamw.py", line 120, in step
    loss = closure()
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
    step_output = self._step_fn()
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/NAME/piper/src/python/piper_train/vits/lightning.py", line 191, in training_step
    return self.training_step_g(batch)
  File "/home/NAME/piper/src/python/piper_train/vits/lightning.py", line 230, in training_step_g
    y_hat_mel = mel_spectrogram_torch(
  File "/home/NAME/piper/src/python/piper_train/vits/mel_processing.py", line 120, in mel_spectrogram_torch
    torch.stft(
  File "/home/NAME/piper/src/python/.venv/lib/python3.10/site-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

im trying to train with a 4060Ti but i am having issues, any suggestions?

@ConanWolf46
Copy link
Author

still the same issues, i have tried installing a later version of really everything trying to match everything up, the GPU driver and matching the CUDA version with the DRIVER then the cuDDN or whatever its called to match with the CUDA as well then installed the proper pytorch to match that , which still failed with the same error, im tired of this....

@Nocturna22
Copy link

I have the same error:

RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

@Nocturna22
Copy link

I'm on Debian.


(.venv) (base) humanity@0-Milky-Way:~/piper/src/python$ python3 -m piper_train --dataset-dir ~/piper/my-training --accelerator 'gpu' --devices 1 --batch-size 32 --validation-split 0.0 --num-test-examples 0 --max_epochs 6000 --resume_from_checkpoint ~/piper/epoch\=2665-step\=1182078.ckpt --checkpoint-epochs 1 --precision 32 --quality high
DEBUG:piper_train:Namespace(dataset_dir='/home/humanity/piper/my-training', checkpoint_epochs=1, quality='high', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=6000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/home/humanity/piper/epoch=2665-step=1182078.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 1 epoch(s)
DEBUG:vits.dataset:Loading dataset: /home/humanity/piper/my-training/dataset.jsonl
/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  ckpt_path = ckpt_path or self.resume_from_checkpoint
Missing logger folder: /home/humanity/piper/my-training/lightning_logs
Restoring states from the checkpoint path at /home/humanity/piper/epoch=2665-step=1182078.ckpt
DEBUG:fsspec.local:open file: /home/humanity/piper/epoch=2665-step=1182078.ckpt
/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:345: UserWarning: The dirpath has changed from '/ssd/piper/out-train/lightning_logs/version_1/checkpoints' to '/home/humanity/piper/my-training/lightning_logs/version_0/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
  warnings.warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /home/humanity/piper/my-training/lightning_logs/version_0/hparams.yaml
Restored all states from the checkpoint file at /home/humanity/piper/epoch=2665-step=1182078.ckpt
/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 24 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (33) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Traceback (most recent call last):
  File "/home/humanity/pinokio/bin/miniconda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/humanity/pinokio/bin/miniconda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/humanity/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/home/humanity/piper/src/python/piper_train/__main__.py", line 124, in main
    trainer.fit(model)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/adamw.py", line 120, in step
    loss = closure()
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
    step_output = self._step_fn()
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/humanity/piper/src/python/piper_train/vits/lightning.py", line 191, in training_step
    return self.training_step_g(batch)
  File "/home/humanity/piper/src/python/piper_train/vits/lightning.py", line 230, in training_step_g
    y_hat_mel = mel_spectrogram_torch(
  File "/home/humanity/piper/src/python/piper_train/vits/mel_processing.py", line 120, in mel_spectrogram_torch
    torch.stft(
  File "/home/humanity/piper/src/python/.venv/lib/python3.10/site-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

@Nocturna22
Copy link

still the same issues, i have tried installing a later version of really everything trying to match everything up, the GPU driver and matching the CUDA version with the DRIVER then the cuDDN or whatever its called to match with the CUDA as well then installed the proper pytorch to match that , which still failed with the same error, im tired of this....

Hey, i found someone with the solution:

#295 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants