[TTS] porting VITS implementation (NVIDIA#5600)
* Disable loss typecheck

* Fix spectrogram lengths

* Remove Precision 16 requirement

* Address lgtm alerts

* clean up unused code

* Address lgtm alerts

* Refactor audio_to_mel_torch method

* Use NeMo FilterBank to get melspec

Todo: set self.fb

* Fix filterbank max frequency to match with original VITS

* Fix filterbank features correct length

* Address lgtm issues

* Remove print statements

* Remove stft_pad_amount

* new structure for tts datasets in script folder

Signed-off-by: Oktai Tatanov <[email protected]>

* remove cmudict downloading

Signed-off-by: Oktai Tatanov <[email protected]>

* rename mixertts dataset, add vocoder dataset

Signed-off-by: Oktai Tatanov <[email protected]>

* add libritts processing

Signed-off-by: Oktai Tatanov <[email protected]>

* update tts dataset and libritts get data

Signed-off-by: Oktai Tatanov <[email protected]>

* fix bugs in vocoder ds

Signed-off-by: Oktai Tatanov <[email protected]>

* add ds

* changed vits yaml

* rm yaml

* fix yaml and model

* Added scaler

* refactored yaml

* managed to run in fp16

* refactoring

Signed-off-by: Oktai Tatanov <[email protected]>

* fix small bugs and add new todos

Signed-off-by: Oktai Tatanov <[email protected]>

* fix optimizers

Signed-off-by: Oktai Tatanov <[email protected]>

* Port Variational Inference with Adversarial Learning (VITS) to NeMo TTS (NVIDIA#6)

* Add vits files

Add vits_losses.py, vits_modules.py and vits.py.

* Move non-vits models to modules

* Add vits.yaml

* Add _loader to vits.py

* Add basic template for vits

* Update vits.yaml with vits parameters

* Remove extra space

* Add top level training script

* Add some variables to vits yaml

* Add forward and training methods

* Fix imports

* Added validation step

* Log training losses

* Update loss calls to use class attributes

* Add VITS to models list

* Fix all imports

* Remove old module calls

* Fix typo in monotonic align import

* Modified validation step

1. reverted to tensorboard
2. validation_step logs audio, mel-spec for batch 0
3. validation_step_alt logs audio, mel-spec for batch 0 and loss_mel

* Fix imports for VITS

* Remove old module calls

* Fix typo in monotonic align import

* Modified validation step

1. reverted to tensorboard
2. validation_step logs audio, mel-spec for batch 0
3. validation_step_alt logs audio, mel-spec for batch 0 and loss_mel

* Add parameters from original VITS config

* Fix config file

* Fix imports and generate spec from audio

* Fix incorrect dimensions

* Progress update

* Fix loss

* Fix cuda thing

* Fix monotonic align import

* Fix typos in vits.py

* Disable loss typecheck

* Fix spectrogram lengths

* Remove Precision 16 requirement

* Address lgtm alerts

* clean up unused code

* Address lgtm alerts

* Refactor audio_to_mel_torch method

* Use NeMo FilterBank to get melspec

Todo: set self.fb

* Fix filterbank max frequency to match with original VITS

* Fix filterbank features correct length

* Address lgtm issues

* Remove print statements

* Remove stft_pad_amount

Co-authored-by: martynwei <[email protected]>
Co-authored-by: Ryan Hong <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>

* make new commit

Signed-off-by: Jason <[email protected]>

* add copyright headers

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* rename README

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style without vits_modules

Signed-off-by: Oktai Tatanov <[email protected]>

* add numba code, fix style and add todos

Signed-off-by: Oktai Tatanov <[email protected]>

* small fix

* fix some todos

* added numba mas

* added DDP sampler

* specified versions

* fixed for new librosa version

* added feature loss

* added IPA phonemizer

* refactored IPA g2p

* added vits losses

* some ref

* fix

* added checkpointing

* cp

* cfg

* merged some 1.8.0 fixes

* plt fix

* fix logging

* fix checkpoint loading

* refactored inference

* fp32 run

* update branch

Signed-off-by: ericharper <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* new exp

* update branch

Signed-off-by: ericharper <[email protected]>

* Restored tests previously disabled for 22.03 base (NVIDIA#4109)

Signed-off-by: Boris Fomitchev <[email protected]>

* add augmentation to label models (NVIDIA#4113)

* add augmentation to label models

Signed-off-by: nithinraok <[email protected]>

* duration fix

Signed-off-by: nithinraok <[email protected]>

* Call register_bert_model after assigning self.bert_model variable (NVIDIA#4116)

Signed-off-by: Ramanathan Arunachalam <[email protected]>

Co-authored-by: Ramanathan Arunachalam <[email protected]>

* Tutorial on ITN with Thutmose tagger and small fixes (NVIDIA#4117)

* 1. Add tutorial. 2. Move a function to fix import in tutorial. 3. Merge multiple spaces into one space in the final output

Signed-off-by: Alexandra Antonova <[email protected]>

* fixes for code review

Signed-off-by: Alexandra Antonova <[email protected]>

* Add tutorial to tutorials.rst

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* cleaned up TN/ ITN doc (NVIDIA#4119)

* cleaned up TN/ ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* Check implicit grad acc in GLUE dataset building (NVIDIA#4123)

* Check implicit grad acc in GLUE dataset building

Signed-off-by: MaximumEntropy <[email protected]>

* Fix jenkins test for GLUE/XNLI

Signed-off-by: MaximumEntropy <[email protected]>

* update the default (NVIDIA#4135)

Signed-off-by: ekmb <[email protected]>

* Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (NVIDIA#4136)

* Fix restoring from checkpoint with label vocab dir

Signed-off-by: PeganovAnton <[email protected]>

* Add tests for various ways to pass label ids to model

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Do not create tmp directory

Signed-off-by: PeganovAnton <[email protected]>

* Fix parameter name

Signed-off-by: PeganovAnton <[email protected]>

* finish cherry-pick op

Signed-off-by: PeganovAnton <[email protected]>

* Fix labels errors

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate stage

Signed-off-by: PeganovAnton <[email protected]>

* Change target branch

Signed-off-by: PeganovAnton <[email protected]>

* fix typo (NVIDIA#4140)

Signed-off-by: Yang Zhang <[email protected]>

* Fix/punctuation avoid overwritting tmp files (NVIDIA#4144)

* Add draft of fixing tmp files overwriting

Signed-off-by: PeganovAnton <[email protected]>

* Remove accidental changes

Signed-off-by: PeganovAnton <[email protected]>

* Remove accidental changes

Signed-off-by: PeganovAnton <[email protected]>

* Use built-in tempfile library

Signed-off-by: PeganovAnton <[email protected]>

* Fix code style

Signed-off-by: PeganovAnton <[email protected]>

* bug_fix_diarization_manifest_creation (NVIDIA#4125)

Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>

* fix doc (NVIDIA#4146)

Signed-off-by: Yang Zhang <[email protected]>

* Tacotron2 retrain (NVIDIA#4103)

* fix yaml

Signed-off-by: treacker <[email protected]>

* Fix for new TTSDataset class

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* inference fix

Signed-off-by: treacker <[email protected]>

* removed old code

Signed-off-by: treacker <[email protected]>

* updated parser logic

Signed-off-by: treacker <[email protected]>

* reverted version update

Signed-off-by: treacker <[email protected]>

* refactored parser logic

Signed-off-by: treacker <[email protected]>

* Updated Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Refactored tutorial for Tacotron2

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Update Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Update tacotron.yaml

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* cleaned up TN/ ITN doc (NVIDIA#4119)

* cleaned up TN/ ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: treacker <[email protected]>

* Check implicit grad acc in GLUE dataset building (NVIDIA#4123)

* Check implicit grad acc in GLUE dataset building

Signed-off-by: MaximumEntropy <[email protected]>

* Fix jenkins test for GLUE/XNLI

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Fixed jenkins

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Multiprocess improvements (NVIDIA#4127)

* initial commit

Signed-off-by: nithinraok <[email protected]>

* start fix

Signed-off-by: nithinraok <[email protected]>

* improve multiprocessing speed while creating speaker dataset

Signed-off-by: nithinraok <[email protected]>

* updated scp to filelist

Signed-off-by: nithinraok <[email protected]>

* WaveGlow input type fixes (NVIDIA#4151)

Signed-off-by: Jocelyn Huang <[email protected]>

* notebooks' link, typo and import fix (NVIDIA#4158)

* redo missing pr 4007

Signed-off-by: fayejf <[email protected]>

* remove extremely unreliable links

Signed-off-by: fayejf <[email protected]>

* Thutmose tagger bug fixes (NVIDIA#4162)

* add pretrained ngc model, small fixes

Signed-off-by: Alexandra Antonova <[email protected]>

* fix model location

Signed-off-by: Alexandra Antonova <[email protected]>

* fix model location

Signed-off-by: Alexandra Antonova <[email protected]>

* 1. fix typos. 2. write magic functions without space

Signed-off-by: Alexandra Antonova <[email protected]>

* add example of inference with pretrained model

Signed-off-by: Alexandra Antonova <[email protected]>

* changed model location to nemo

Signed-off-by: Alexandra Antonova <[email protected]>

* style fix

Signed-off-by: Alexandra Antonova <[email protected]>

* fix space

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* update speaker docs (NVIDIA#4164)

* update speaker docs

Signed-off-by: nithinraok <[email protected]>

* chunks -> segments

Signed-off-by: nithinraok <[email protected]>

* Khz -> kHz

Signed-off-by: nithinraok <[email protected]>

* changed to vits g2p

* refactoring

* added cosineLR

* Updated whitelist path

* added vanilla torch grad scaler

* Fixed lightning version

* added warmup and wd

* switched to cosineLR

* refactored data classes for vits

* some fixes

* fixed import

* changed train loop

* fixed scheduler bug

* refactoring for exps

* Refactored loss logic

* Ref for exps

* added coqui stuff

* exps

* bugfix

* added side file

* bugfix

* reverted

* fixed sampler behaviour

* updated for ptl 1.7.2

* refactored dataloader func

* some cleaning

* reverted to vanilla loss

* modified for pickling

* added dataset class

* fixed torch version

* added autocast for fp training

* removed coqui files

* 'Fixed tokenizer'

* Fix tokenizer

* update branch

Signed-off-by: ericharper <[email protected]>

* Fix link to inference notebook (NVIDIA#5247)

Signed-off-by: Jocelyn Huang <[email protected]>

Signed-off-by: Jocelyn Huang <[email protected]>

* Update ASR scores table (NVIDIA#5254)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

* Fix links to speaker identification notebook (NVIDIA#5260)

Signed-off-by: SeanNaren <[email protected]>

Signed-off-by: SeanNaren <[email protected]>

* Minor typo fixes in TTS tutorial (NVIDIA#5266)

Signed-off-by: Jocelyn Huang <[email protected]>

Signed-off-by: Jocelyn Huang <[email protected]>

* Pcla tutorial fixes (NVIDIA#5271)

* Fixed typos

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed cell type and tatoeba reference

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed typo

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed branch variable

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* Fix bug into Dialogue tutorial (NVIDIA#5277)

* Typo fix (NVIDIA#5288)

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* Fix dialogue tutorial bug (NVIDIA#5297)

* set add_pooling_layer=False for huggingface bert model

* remove add_pooling_layer=False and set find_unused_parameters=True

* set num_prompt_tokens to 0 for huggingface

* small bugfix for r1.13.0 (NVIDIA#5310)

* typo fix

Signed-off-by: fayejf <[email protected]>

* update transcribe

Signed-off-by: fayejf <[email protected]>

Signed-off-by: fayejf <[email protected]>

* Add italian model checkpoints (NVIDIA#5316)

Signed-off-by: Igor Gitman <[email protected]>

Signed-off-by: Igor Gitman <[email protected]>

* [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer (NVIDIA#5340)

* [STT] Add stt_ru_conformer_ctc_large

Signed-off-by: Sasha Meister <[email protected]>

* [STT] Add stt_ru_conformer_transducer_large

Add stt_ru_conformer_transducer_large

Signed-off-by: Sasha Meister <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Sasha Meister <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pcla tutorial fixes (NVIDIA#5313)

* fixes

Signed-off-by: Matvei Novikov <[email protected]>

* fixes

Signed-off-by: Matvei Novikov <[email protected]>

* moved `create_text_and_labels` to token_classification_utils.py

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* a lot of refactoring

* strict ptl version

* strict ptl version

* reverted plt version

* Added base text2audio class

* Fix issue with HF Model upload tutorial (NVIDIA#5359)

* Add Gradio App to ASR Docs (NVIDIA#5270)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>
(cherry picked from commit e4b6a38)

* Fix issue with normalized config for dataset name

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

* tutorial fixes (NVIDIA#5354)

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* Add SDP documentation (NVIDIA#5274)

* Add details to SDP README.md

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add docstring to WriteManifest processor

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add docstring to CreateInitialManifestMLS

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add ModifyManifestTextProcessor docstring

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add ASRInference docstring

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add base_processor docstrings

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add minimal SDP docs page

Signed-off-by: Elena Rastorgueva <[email protected]>

* Update tools/speech_dataset_processor/README.md

Co-authored-by: Igor Gitman <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Write simple README for SDP and move complex explanations to docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove incorrect type hints

Signed-off-by: Elena Rastorgueva <[email protected]>

* Make config example less confusing

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix typo

Signed-off-by: Elena Rastorgueva <[email protected]>

* Clarify that YAML file is config file in README

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove unused imports

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove SDP docs for now

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove links to docs in SDP README

Signed-off-by: Elena Rastorgueva <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Igor Gitman <[email protected]>

* [Bugfix] Added rm -f / wget- nc command in multispeaker sim notebook to r1.13.0 (NVIDIA#5375)

* Fix minor error in notebook

Signed-off-by: Taejin Park <[email protected]>

* changed branch name in tutorial notebook

Signed-off-by: Taejin Park <[email protected]>

Signed-off-by: Taejin Park <[email protected]>

* Rename Speech Dataset Processor to Speech Data Processor (NVIDIA#5378)

Signed-off-by: Elena Rastorgueva <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix for num worker 0 causing issues in losses after 1 epoch (NVIDIA#5379)

* Fixed bug in notebook (NVIDIA#5382)

Signed-off-by: Virginia Adams <[email protected]>

Signed-off-by: Virginia Adams <[email protected]>

* Force MHA QKV onto fp32 (NVIDIA#5391)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

* Added scheduling variety

* ref

* Fix for prompt table restore error (NVIDIA#5393)

* Fix for prompt table restore error

Signed-off-by: Virginia Adams <[email protected]>

* Added more safety checks

Signed-off-by: Virginia Adams <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added more condition checks

Signed-off-by: Virginia Adams <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Virginia Adams <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA#5410)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* bugfix

* import tests

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)

Signed-off-by: Yu Yao <[email protected]>

Signed-off-by: Yu Yao <[email protected]>

* Megatron Export Update (NVIDIA#5343)

* export update for Megatron + change ORT optimization

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated export_utils to use autocast instead of manually casting >:/

Signed-off-by: David Mosallanezhad <[email protected]>

* removed dtype from LayerNorm

Signed-off-by: David Mosallanezhad <[email protected]>

* added comment

Signed-off-by: David Mosallanezhad <[email protected]>

* reverting changes on FloatCast

Signed-off-by: David Mosallanezhad <[email protected]>

* Cherry-picked changes from megatron-norm

Signed-off-by: Boris Fomitchev <[email protected]>

* updated asr_model import to cast_utils

Signed-off-by: David Mosallanezhad <[email protected]>

* updated del onnx_model place

Signed-off-by: David Mosallanezhad <[email protected]>

* changed ort optimization to basic -> temp fix

Signed-off-by: David Mosallanezhad <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <[email protected]>

* disable pc test (NVIDIA#5426)

Signed-off-by: ekmb <[email protected]>

Signed-off-by: ekmb <[email protected]>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA#5413)

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Disable sync_batch_comm in validation_step for GPT (NVIDIA#5397)

* disable sync_batch_comm in validation_step

Signed-off-by: ericharper <[email protected]>

* Read sync_batch_comm from config or default to False

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Empty

Signed-off-by: MaximumEntropy <[email protected]>

* Comment out test

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Markel Sanz Ausin <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)" (NVIDIA#5431)

This reverts commit 0718b17.

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA#5420)

* Revert workers workaround

Signed-off-by: MaximumEntropy <[email protected]>

* Fix in config

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Fixed discrepancies

* updated Jenkinsfile

* updated Jenkinsfile

* Cleaning

* fixed the onnx bug in conformer for non-streaming models. (NVIDIA#5242) (NVIDIA#5446)

Signed-off-by: Vahid <[email protected]>

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>

* Set sync_batch_comm in other places (NVIDIA#5448)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* Radtts 1.13 (NVIDIA#5451)

* [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358)
* [TTS] add CI test for RADTTS training recipe.

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Radtts 1.13 plus (NVIDIA#5457)

* [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358)
* Fixing RADTTS training - removing view buffer and fixing accuracy issue
* Fixes for Torchscript/Triton
* Added autocast to radtts UT
* using cuda() for training example

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Add num layers check (NVIDIA#5470)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* Change to kwargs (NVIDIA#5475)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA#5339) (NVIDIA#5478)

* Initial refactor

Signed-off-by: MaximumEntropy <[email protected]>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for eval

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <[email protected]>

* Refactor

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <[email protected]>

* Remove comments

Signed-off-by: MaximumEntropy <[email protected]>

* Minor

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <[email protected]>

* Remove old comment

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* export_utils bugfix (NVIDIA#5480)

* updated export_utils

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Export fixes for Riva (NVIDIA#5496)

* Export fixes for Riva

Signed-off-by: Boris Fomitchev <[email protected]>

* Cleaning up training_utils

Signed-off-by: Boris Fomitchev <[email protected]>

Signed-off-by: Boris Fomitchev <[email protected]>

* minor bug fix (NVIDIA#5521)

Signed-off-by: David Mosallanezhad <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>

* added set_start_method + function param bugfix (NVIDIA#5539)

* added set_start_method + function param bugfix

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* upper bound torchmetrics

Signed-off-by: ericharper <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <[email protected]>

* remove notebook (NVIDIA#5548)

Signed-off-by: ericharper <[email protected]>

Signed-off-by: ericharper <[email protected]>

* Remove broadcast (NVIDIA#5558)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* cleaning

* Fix all gather while writing to a file during T5 finetuning (NVIDIA#5561)

* Gather from data parallel only instead of all ranks

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added copyright

* fixed imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed filesize check

* last cleaning

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated cmudict path

* fixed merge bug

Signed-off-by: Evgeniy Shabalin <[email protected]>

* warnings fix

* fix warnings

Signed-off-by: Evgeniy Shabalin <[email protected]>

* storing

* updated version

Signed-off-by: Evgeniy Shabalin <[email protected]>

* update Jenkinsfile versions

Signed-off-by: Evgeniy Shabalin <[email protected]>

* fixed issues

Signed-off-by: Evgeniy Shabalin <[email protected]>

* fixed more issues

* more fixes

Signed-off-by: Evgeniy Shabalin <[email protected]>

* added experimental tag

* Clarification updates

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* remove old cython code

Signed-off-by: Evgeniy Shabalin <[email protected]>

* remove old cython code

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Enhancements

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Enhancements

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* imports fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Evgeniy Shabalin <[email protected]>

* excessive computations fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* typecheck fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Small refactoring

* Small refactoring

Signed-off-by: Evgeniy Shabalin <[email protected]>

* reversed exp_manager params

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Fixed call for new function signature

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: ekmb <[email protected]>
Signed-off-by: PeganovAnton <[email protected]>
Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: fayejf <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Matvei Novikov <[email protected]>
Signed-off-by: Igor Gitman <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: Virginia Adams <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: Markel Sanz Ausin <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Vahid <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Evgeniy Shabalin <[email protected]>
Co-authored-by: jasonjjl1999 <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: jasonjjl1999 <[email protected]>
Co-authored-by: martynwei <[email protected]>
Co-authored-by: Ryan Hong <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Ramanathan Arunachalam <[email protected]>
Co-authored-by: Ramanathan Arunachalam <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: PeganovAnton <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Sean Naren <[email protected]>
Co-authored-by: Matvei Novikov <[email protected]>
Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: Igor Gitman <[email protected]>
Co-authored-by: Sasha Meister <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Virginia Adams <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Showing 19 changed files with 2,794 additions and 9 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -4509,4 +4509,4 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
cleanWs()
}
}
}
}
215 changes: 215 additions & 0 deletions examples/tts/conf/vits.yaml
@@ -0,0 +1,215 @@
# This config contains the default values for training a VITS model on the LJSpeech dataset.
# To train the model on another dataset, change the config values to match that dataset.
# Most dataset-specific arguments are at the top of the config file; see below.

# TODO: remove unnecessary arguments, refactoring

name: VITS

train_dataset: ???
validation_datasets: ???
sup_data_path: null
sup_data_types: null

phoneme_dict_path: "scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt"
heteronyms_path: "scripts/tts_dataset_files/heteronyms-052722"
whitelist_path: "nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv"

# Default values from librosa.pyin
pitch_fmin: 65.40639132514966
pitch_fmax: 2093.004522404789

sample_rate: 22050
n_mel_channels: 80
n_window_size: 1024
n_window_stride: 256
n_fft: 1024
lowfreq: 0
highfreq: null
window: hann

model:
  pitch_fmin: ${pitch_fmin}
  pitch_fmax: ${pitch_fmax}

  sample_rate: ${sample_rate}
  n_mel_channels: ${n_mel_channels}
  n_window_size: ${n_window_size}
  n_window_stride: ${n_window_stride}
  n_fft: ${n_fft}
  lowfreq: ${lowfreq}
  highfreq: ${highfreq}
  window: ${window}
  mel_fmin: 0.0
  mel_fmax: null

  n_speakers: 0
  segment_size: 8192
  c_mel: 45
  c_kl: 1.
  use_spectral_norm: false

  text_normalizer:
    _target_: nemo_text_processing.text_normalization.normalize.Normalizer
    lang: en
    input_case: cased
    whitelist: ${whitelist_path}

  text_normalizer_call_kwargs:
    verbose: false
    punct_pre_process: true
    punct_post_process: true

  text_tokenizer:
    _target_: nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer
    punct: true
    apostrophe: true
    pad_with_space: false
    g2p:
      _target_: nemo_text_processing.g2p.modules.IPAG2P
      phoneme_dict: ${phoneme_dict_path}
      heteronyms: ${heteronyms_path}
      phoneme_probability: 0.8
      # Relies on the heteronyms list for anything that needs to be disambiguated
      ignore_ambiguous_words: false
      use_chars: true
      use_stresses: true

  train_ds:
    dataset:
      _target_: "nemo.collections.tts.torch.data.TTSDataset"
      manifest_filepath: ${train_dataset}
      sample_rate: ${model.sample_rate}
      sup_data_path: ${sup_data_path}
      sup_data_types: ${sup_data_types}
      n_fft: ${model.n_fft}
      win_length: ${model.n_window_size}
      hop_length: ${model.n_window_stride}
      window: ${model.window}
      n_mels: ${model.n_mel_channels}
      lowfreq: ${model.lowfreq}
      highfreq: ${model.highfreq}
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: False
      pitch_fmin: ${model.pitch_fmin}
      pitch_fmax: ${model.pitch_fmax}

    dataloader_params:
      num_workers: 8
      pin_memory: false

    batch_sampler:
      batch_size: 32
      boundaries: [32,300,400,500,600,700,800,900,1000]
      num_replicas: ${trainer.devices}
      shuffle: true

  validation_ds:
    dataset:
      _target_: "nemo.collections.tts.torch.data.TTSDataset"
      manifest_filepath: ${validation_datasets}
      sample_rate: ${model.sample_rate}
      sup_data_path: ${sup_data_path}
      sup_data_types: ${sup_data_types}
      n_fft: ${model.n_fft}
      win_length: ${model.n_window_size}
      hop_length: ${model.n_window_stride}
      window: ${model.window}
      n_mels: ${model.n_mel_channels}
      lowfreq: ${model.lowfreq}
      highfreq: ${model.highfreq}
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: False
      pitch_fmin: ${model.pitch_fmin}
      pitch_fmax: ${model.pitch_fmax}

    dataloader_params:
      drop_last: false
      shuffle: false
      batch_size: 16
      num_workers: 4
      pin_memory: false

  preprocessor:
    _target_: nemo.collections.asr.parts.preprocessing.features.FilterbankFeatures
    nfilt: ${model.n_mel_channels}
    highfreq: ${model.highfreq}
    log: true
    log_zero_guard_type: clamp
    log_zero_guard_value: 1e-05
    lowfreq: ${model.lowfreq}
    n_fft: ${model.n_fft}
    n_window_size: ${model.n_window_size}
    n_window_stride: ${model.n_window_stride}
    pad_to: 1
    pad_value: 0
    sample_rate: ${model.sample_rate}
    window: ${model.window}
    normalize: null
    preemph: null
    dither: 0.0
    frame_splicing: 1
    stft_conv: false
    nb_augmentation_prob : 0
    mag_power: 1.0
    exact_pad: true
    use_grads: true

  synthesizer:
    _target_: nemo.collections.tts.modules.vits_modules.SynthesizerTrn
    inter_channels: 192
    hidden_channels: 192
    filter_channels: 768
    n_heads: 2
    n_layers: 6
    kernel_size: 3
    p_dropout: 0.1
    resblock: "1"
    resblock_kernel_sizes: [3,7,11]
    resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
    upsample_rates: [8,8,2,2]
    upsample_initial_channel: 512
    upsample_kernel_sizes: [16,16,4,4]
    n_speakers: ${model.n_speakers}
    gin_channels: 256 # for multi-speaker

  optim:
    _target_: torch.optim.AdamW
    lr: 2e-4
    betas: [0.9, 0.99]
    eps: 1e-9

    sched:
      name: ExponentialLR
      lr_decay: 0.999875

trainer:
  num_nodes: 1
  devices: 2
  accelerator: gpu
  strategy: ddp
  precision: 32
  # amp_backend: 'apex'
  # amp_level: 'O2'
  # benchmark: true
  max_epochs: -1
  accumulate_grad_batches: 1
  enable_checkpointing: false # Provided by exp_manager
  logger: false # Provided by exp_manager
  log_every_n_steps: 50
  check_val_every_n_epoch: 1

exp_manager:
  exp_dir: ???
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: loss_gen_all
    mode: min
  resume_if_exists: false
  resume_ignore_no_checkpoint: false
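
Several values in this config are linked by simple arithmetic, and a quick check can catch mismatches when adapting it to another dataset. The sketch below is not part of the commit. It copies the relevant values from vits.yaml into plain Python and assumes that segment_size is counted in audio samples, that the product of upsample_rates is meant to equal n_window_stride, and that the ExponentialLR decay factor is applied once per scheduler step. All helper names are hypothetical.

```python
import math

# Values copied from vits.yaml.
SAMPLE_RATE = 22050
N_WINDOW_STRIDE = 256          # hop length, in samples
SEGMENT_SIZE = 8192            # assumption: measured in audio samples
UPSAMPLE_RATES = [8, 8, 2, 2]
BASE_LR = 2e-4
LR_DECAY = 0.999875


def total_upsampling(rates):
    """Return the product of the decoder upsampling factors."""
    return math.prod(rates)


def frames_per_segment(segment_size, hop_length):
    """Return how many spectrogram frames one training segment spans."""
    if segment_size % hop_length != 0:
        raise ValueError("segment_size should be a multiple of the hop length")
    return segment_size // hop_length


def decayed_lr(base_lr, decay, steps):
    """Return the learning rate after `steps` decay steps (assumed schedule)."""
    return base_lr * decay ** steps


if __name__ == "__main__":
    print("total upsampling:", total_upsampling(UPSAMPLE_RATES))
    print("frames per segment:", frames_per_segment(SEGMENT_SIZE, N_WINDOW_STRIDE))
    print("segment duration (s):", SEGMENT_SIZE / SAMPLE_RATE)
    print("lr after 1000 steps:", decayed_lr(BASE_LR, LR_DECAY, 1000))
```

With the values shown, the upsampling product is 256, which matches n_window_stride, and one segment spans 32 frames. If either n_window_stride or upsample_rates is changed for a different sample rate, the other probably needs to change with it.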

0 comments on commit 5feeca2
