If you get a warning reporting "Skipped X utterance(s)" when loading the dataset, like these (see the last line of each snippet):
```
DEBUG:piper_train:Checkpoints will be saved every 5 epoch(s)
DEBUG:piper_train:0 Checkpoints will be saved
DEBUG:vits.dataset:Loading dataset: /content/drive/MyDrive/colab/piper/Jarvis/dataset.jsonl
WARNING:vits.dataset:Skipped 5 utterance(s)
```

```
DEBUG:piper_train:Checkpoints will be saved every 100 epoch(s)
DEBUG:piper_train:0 Checkpoints will be saved
DEBUG:vits.dataset:Loading dataset: /testing/piper-training/dataset.jsonl
WARNING:vits.dataset:Skipped 31 utterance(s)
```
then you have a formatting problem with your dataset.
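As a first check, a minimal sketch like the following can catch structural problems, assuming the LJSpeech-style metadata.csv layout that Piper's preprocess expects (one `id|text`, or `id|speaker|text`, record per line). This is just an illustration, not Piper's own validator:

```python
# Hypothetical sanity check for an LJSpeech-style metadata.csv.
# Flags rows that don't have 2-3 pipe-separated fields, or have empty text.
from pathlib import Path

for lineno, row in enumerate(
    Path("metadata.csv").read_text(encoding="utf-8").splitlines(), start=1
):
    fields = row.split("|")
    if len(fields) not in (2, 3):
        print(f"line {lineno}: expected 2-3 pipe-separated fields, got {len(fields)}")
    elif not fields[-1].strip():
        print(f"line {lineno}: empty transcription")
```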
NOTE:
If ALL utterances in your dataset are skipped, you get this error instead, because nothing loaded correctly for it to train on:

```
Trainer.fit stopped: No training batches.
```
You might find that only some of your utterances are skipped, in which case training carries on. However, it's easy to miss the "Skipped X utterance(s)" warning and to wonder later why your resulting voice model is poor quality.
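To quantify the damage, one rough approach is to compare the record counts in metadata.csv and the dataset.jsonl that preprocess produced (the paths here are placeholders; adjust to your own training directory):

```python
# Rough sketch: count records in metadata.csv vs. dataset.jsonl.
# A large gap suggests many utterances were dropped along the way.
import json

with open("metadata.csv", encoding="utf-8") as f:
    n_meta = sum(1 for row in f if row.strip())

n_jsonl = 0
with open("dataset.jsonl", encoding="utf-8") as f:
    for row in f:
        if row.strip():
            json.loads(row)  # raises if a record is malformed JSON
            n_jsonl += 1

print(f"{n_meta} in metadata.csv, {n_jsonl} in dataset.jsonl, "
      f"{n_meta - n_jsonl} lost")
```

Note the "Skipped X utterance(s)" warning itself is raised while loading dataset.jsonl, so this count is only a rough signal, not an exact match for the warning.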
Now, this might not fix everyone's issue, but it fixed mine: my problem was that each transcription/wav line was too long.
When I viewed my metadata.csv in Notepad, each transcription wrapped around and took up 4-5 display lines, while still technically being a single huge line per utterance.
I chopped my wavs and dataset into smaller pieces, breaking them up by natural pauses or single sentences, whichever was shorter. (Beforehand I had been keeping them between 10-15 seconds long, but it was a fast speaker, so even 10 seconds contained a lot of words, and I think that was the issue.)
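If you have a lot of audio, splitting on pauses can be automated. Here is a rough sketch using pydub's split_on_silence; the folder names and silence thresholds are placeholders to tune by ear, not values from Piper's docs:

```python
# Sketch: split long WAV clips on natural pauses using pydub.
from pathlib import Path

from pydub import AudioSegment
from pydub.silence import split_on_silence

IN_DIR = Path("wavs")          # placeholder: folder of long recordings
OUT_DIR = Path("wavs_split")   # placeholder: folder for the short chunks
OUT_DIR.mkdir(exist_ok=True)

for wav_path in sorted(IN_DIR.glob("*.wav")):
    audio = AudioSegment.from_wav(wav_path)
    chunks = split_on_silence(
        audio,
        min_silence_len=400,             # ms of quiet that counts as a pause; tune
        silence_thresh=audio.dBFS - 16,  # 16 dB below the clip's average level
        keep_silence=200,                # keep some silence so cuts aren't abrupt
    )
    for i, chunk in enumerate(chunks):
        chunk.export(OUT_DIR / f"{wav_path.stem}_{i:03d}.wav", format="wav")
```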
I re-transcribed everything and made sure that no single line was long enough to wrap around. (I'm not 100% sure this is a strict requirement of the data format, but it fixed my issue at least.)
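To spot the lines that would wrap, a quick length check like this works (the 250-character cutoff is my arbitrary guess, not a documented Piper limit):

```python
# Sketch: flag suspiciously long transcriptions in an LJSpeech-style metadata.csv.
MAX_CHARS = 250  # arbitrary threshold; pick whatever "too long" means for you

with open("metadata.csv", encoding="utf-8") as f:
    for lineno, row in enumerate(f, start=1):
        text = row.rstrip("\n").split("|")[-1]
        if len(text) > MAX_CHARS:
            print(f"line {lineno}: {len(text)} chars - consider splitting")
```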
I then deleted the previous cache folder, lightning_logs, config.json and dataset.jsonl (where applicable), since these may contain data from when the dataset was skipping utterances, and that data will be garbage.
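A small sketch of that clean-up step, assuming the usual layout where cache/, lightning_logs/, config.json and dataset.jsonl all sit in the training output directory (the path is a placeholder):

```python
# Sketch: clear out artifacts from the bad preprocessing run.
import shutil
from pathlib import Path

TRAIN_DIR = Path("/testing/piper-training")  # placeholder: your training dir

for folder in ("cache", "lightning_logs"):
    shutil.rmtree(TRAIN_DIR / folder, ignore_errors=True)
for file in ("config.json", "dataset.jsonl"):
    (TRAIN_DIR / file).unlink(missing_ok=True)
```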
Starting fresh, I re-ran piper_train.preprocess, then re-ran the training:
```
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 100 epoch(s)
DEBUG:vits.dataset:Loading dataset: /testing/piper-training/dataset.jsonl
/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1906: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  rank_zero_deprecation(
```

and FINALLY, it no longer skips any utterances.