If you get a warning reporting "Skipped X utterance(s)" when loading the dataset, like these (see the last line of each snippet):
```
DEBUG:piper_train:Checkpoints will be saved every 5 epoch(s)
DEBUG:piper_train:0 Checkpoints will be saved
DEBUG:vits.dataset:Loading dataset: /content/drive/MyDrive/colab/piper/Jarvis/dataset.jsonl
WARNING:vits.dataset:Skipped 5 utterance(s)
```

```
DEBUG:piper_train:Checkpoints will be saved every 100 epoch(s)
DEBUG:piper_train:0 Checkpoints will be saved
DEBUG:vits.dataset:Loading dataset: /testing/piper-training/dataset.jsonl
WARNING:vits.dataset:Skipped 31 utterance(s)
```
then you have a formatting problem with your dataset.
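As a first check, a minimal sketch like the following can catch structural problems, assuming the LJSpeech-style metadata.csv layout that Piper's preprocess expects (one `id|text`, or `id|speaker|text`, record per line). This is just an illustration, not Piper's own validator:

```python
# Hypothetical sanity check for an LJSpeech-style metadata.csv.
# Flags rows that don't have 2-3 pipe-separated fields, or have empty text.
from pathlib import Path

for lineno, row in enumerate(
    Path("metadata.csv").read_text(encoding="utf-8").splitlines(), start=1
):
    fields = row.split("|")
    if len(fields) not in (2, 3):
        print(f"line {lineno}: expected 2-3 pipe-separated fields, got {len(fields)}")
    elif not fields[-1].strip():
        print(f"line {lineno}: empty transcription")
```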
NOTE:
If ALL utterances in your dataset are skipped, you get this error instead, because nothing loaded correctly for it to train on:

```
Trainer.fit stopped: No training batches.
```
You might find that only some of your utterances are skipped, in which case training carries on. However, it's easy to miss the "Skipped X utterance(s)" warning and to wonder later why your resulting voice model is poor quality.
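To quantify the damage, one rough approach is to compare the record counts in metadata.csv and the dataset.jsonl that preprocess produced (the paths here are placeholders; adjust to your own training directory):

```python
# Rough sketch: count records in metadata.csv vs. dataset.jsonl.
# A large gap suggests many utterances were dropped along the way.
import json

with open("metadata.csv", encoding="utf-8") as f:
    n_meta = sum(1 for row in f if row.strip())

n_jsonl = 0
with open("dataset.jsonl", encoding="utf-8") as f:
    for row in f:
        if row.strip():
            json.loads(row)  # raises if a record is malformed JSON
            n_jsonl += 1

print(f"{n_meta} in metadata.csv, {n_jsonl} in dataset.jsonl, "
      f"{n_meta - n_jsonl} lost")
```

Note the "Skipped X utterance(s)" warning itself is raised while loading dataset.jsonl, so this count is only a rough signal, not an exact match for the warning.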
Now, this might not fix everyone's issue, but it fixed mine: my problem was that each transcription/wav line was too long.
When I viewed my metadata.csv in Notepad, each transcription wrapped around and took up 4-5 display lines, while still technically being a single huge line per utterance.
I chopped my wavs and dataset into smaller pieces, breaking them up by natural pauses or single sentences, whichever was shorter. (Beforehand I had been keeping them between 10-15 seconds long, but it was a fast speaker, so even 10 seconds contained a lot of words, and I think that was the issue.)
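If you have a lot of audio, splitting on pauses can be automated. Here is a rough sketch using pydub's split_on_silence; the folder names and silence thresholds are placeholders to tune by ear, not values from Piper's docs:

```python
# Sketch: split long WAV clips on natural pauses using pydub.
from pathlib import Path

from pydub import AudioSegment
from pydub.silence import split_on_silence

IN_DIR = Path("wavs")          # placeholder: folder of long recordings
OUT_DIR = Path("wavs_split")   # placeholder: folder for the short chunks
OUT_DIR.mkdir(exist_ok=True)

for wav_path in sorted(IN_DIR.glob("*.wav")):
    audio = AudioSegment.from_wav(wav_path)
    chunks = split_on_silence(
        audio,
        min_silence_len=400,             # ms of quiet that counts as a pause; tune
        silence_thresh=audio.dBFS - 16,  # 16 dB below the clip's average level
        keep_silence=200,                # keep some silence so cuts aren't abrupt
    )
    for i, chunk in enumerate(chunks):
        chunk.export(OUT_DIR / f"{wav_path.stem}_{i:03d}.wav", format="wav")
```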
I re-transcribed everything and made sure that no single line was long enough to wrap around. (I'm not 100% sure this is a strict requirement of the data format, but it fixed my issue at least.)
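To spot the lines that would wrap, a quick length check like this works (the 250-character cutoff is my arbitrary guess, not a documented Piper limit):

```python
# Sketch: flag suspiciously long transcriptions in an LJSpeech-style metadata.csv.
MAX_CHARS = 250  # arbitrary threshold; pick whatever "too long" means for you

with open("metadata.csv", encoding="utf-8") as f:
    for lineno, row in enumerate(f, start=1):
        text = row.rstrip("\n").split("|")[-1]
        if len(text) > MAX_CHARS:
            print(f"line {lineno}: {len(text)} chars - consider splitting")
```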
I then deleted the previous cache folder, lightning_logs, config.json and dataset.jsonl (where applicable), since these may contain data from when the dataset was skipping utterances, and that data will be garbage.
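A small sketch of that clean-up step, assuming the usual layout where cache/, lightning_logs/, config.json and dataset.jsonl all sit in the training output directory (the path is a placeholder):

```python
# Sketch: clear out artifacts from the bad preprocessing run.
import shutil
from pathlib import Path

TRAIN_DIR = Path("/testing/piper-training")  # placeholder: your training dir

for folder in ("cache", "lightning_logs"):
    shutil.rmtree(TRAIN_DIR / folder, ignore_errors=True)
for file in ("config.json", "dataset.jsonl"):
    (TRAIN_DIR / file).unlink(missing_ok=True)
```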
Starting fresh, I re-ran piper_train.preprocess, then re-ran the training:
```
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 100 epoch(s)
DEBUG:vits.dataset:Loading dataset: /testing/piper-training/dataset.jsonl
/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1906: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  rank_zero_deprecation(
```

and FINALLY, it no longer skips any utterances.