[TTS] porting VITS implementation (#5600) · Kipok/NeMo@5feeca2

[TTS] porting VITS implementation (NVIDIA#5600)

* Disable loss typecheck

* Fix spectrogram lengths

* Remove Precision 16 requirement

* Address lgtm alerts

* clean up unused code

* Address lgtm alerts

* Refactor audio_to_mel_torch method

* Use NeMo FilterBank to get melspec

Todo: set self.fb

* Fix filterbank max frequency to match with original VITS

* Fix filterbank features correct length

* Address lgtm issues

* Remove print statements

* Remove stft_pad_amount

* new structure for tts datasets in script folder

Signed-off-by: Oktai Tatanov <[email protected]>

* remove cmudict downloading

Signed-off-by: Oktai Tatanov <[email protected]>

* rename mixertts dataset, add vocoder dataset

Signed-off-by: Oktai Tatanov <[email protected]>

* add libritts processing

Signed-off-by: Oktai Tatanov <[email protected]>

* update tts dataset and libritts get data

Signed-off-by: Oktai Tatanov <[email protected]>

* fix bugs in vocoder ds

Signed-off-by: Oktai Tatanov <[email protected]>

* add ds

* changed vits yaml

* rm yaml

* fix yaml and model

* Added scaler

* refactored yaml

* managed to run in fp16

* refactoring

Signed-off-by: Oktai Tatanov <[email protected]>

* fix small bugs and add new todos

Signed-off-by: Oktai Tatanov <[email protected]>

* fix optimizers

Signed-off-by: Oktai Tatanov <[email protected]>

* Port Variational Inference with Adversarial Learning (VITS) to NeMo TTS (NVIDIA#6)

* Add vits files

Add vits_losses.py, vits_modules.py and vits.py.

* Move non-vits models to modules

* Add vits.yaml

* Add _loader to vits.py

* Add basic template for vits

* Update vits.yaml with vits parameters

* Remove extra space

* Add top level training script

* Add some variables to vits yaml

* Add forward and training methods

* Fix imports

* Added validation step

* Log training losses

* Update loss calls to use class attributes

* Add VITS to models list

* Fix all imports

* Remove old module calls

* Fix typo in monotonic align import

* Modified validation step

1. reverted to tensorboard
2. validation_step logs audio, mel-spec for batch 0
3. validation_step_alt logs audio, mel-spec for batch 0 and loss_mel

* Fix imports for VITS

* Add parameters from original VITS config

* Fix config file

* Fix imports and generate spec from audio

* Fix incorrect dimensions

* Progress update

* Fix loss

* Fix cuda thing

* Fix monotonic align import

* Fix typos in vits.py

* Disable loss typecheck

* Fix spectrogram lengths

* Remove Precision 16 requirement

* Address lgtm alerts

* clean up unused code

* Address lgtm alerts

* Refactor audio_to_mel_torch method

* Use NeMo FilterBank to get melspec

Todo: set self.fb

* Fix filterbank max frequency to match with original VITS

* Fix filterbank features correct length

* Address lgtm issues

* Remove print statements

* Remove stft_pad_amount

Co-authored-by: martynwei <[email protected]>
Co-authored-by: Ryan Hong <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>

* make new commit

Signed-off-by: Jason <[email protected]>

* add copyright headers

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* rename README

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style without vits_modules

Signed-off-by: Oktai Tatanov <[email protected]>

* add numba code, fix style and add todos

Signed-off-by: Oktai Tatanov <[email protected]>

* small fix

* fix some todos

* added numba mas

* added DDP sampler

* specified versions

* fixed for new librosa version

* added feature loss

* added IPA phonemizer

* refactored IPA g2p

* added vits losses

* some ref

* fix

* added checkpointing

* cp

* cfg

* merged some 1.8.0 fixes

* plt fix

* fix logging

* fix checkpoint loading

* refactored inference

* fp32 run

* update branch

Signed-off-by: ericharper <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* new exp

* update branch

Signed-off-by: ericharper <[email protected]>

* Restored tests previously disabled for 22.03 base (NVIDIA#4109)

Signed-off-by: Boris Fomitchev <[email protected]>

* add augmentation to label models (NVIDIA#4113)

* add augmentation to label models

Signed-off-by: nithinraok <[email protected]>

* duration fix

Signed-off-by: nithinraok <[email protected]>

* Call register_bert_model after assigning self.bert_model variable (NVIDIA#4116)

Signed-off-by: Ramanathan Arunachalam <[email protected]>

Co-authored-by: Ramanathan Arunachalam <[email protected]>

* Tutorial on ITN with Thutmose tagger and small fixes (NVIDIA#4117)

* 1. Add tutorial. 2. Move a function to fix import in tutorial. 3. Merge multiple spaces into one space in the final output

Signed-off-by: Alexandra Antonova <[email protected]>

* fixes for code review

Signed-off-by: Alexandra Antonova <[email protected]>

* Add tutorial to tutorials.rst

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* cleaned up TN/ITN doc (NVIDIA#4119)

* cleaned up TN/ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* Check implicit grad acc in GLUE dataset building (NVIDIA#4123)

* Check implicit grad acc in GLUE dataset building

Signed-off-by: MaximumEntropy <[email protected]>

* Fix jenkins test for GLUE/XNLI

Signed-off-by: MaximumEntropy <[email protected]>

* update the default (NVIDIA#4135)

Signed-off-by: ekmb <[email protected]>

* Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (NVIDIA#4136)

* Fix restoring from checkpoint with label vocab dir

Signed-off-by: PeganovAnton <[email protected]>

* Add tests for various ways to pass label ids to model

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Do not create tmp directory

Signed-off-by: PeganovAnton <[email protected]>

* Fix parameter name

Signed-off-by: PeganovAnton <[email protected]>

* finish cherry-pick op

Signed-off-by: PeganovAnton <[email protected]>

* Fix labels errors

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate stage

Signed-off-by: PeganovAnton <[email protected]>

* Change target branch

Signed-off-by: PeganovAnton <[email protected]>

* fix typo (NVIDIA#4140)

Signed-off-by: Yang Zhang <[email protected]>

* Fix/punctuation avoid overwriting tmp files (NVIDIA#4144)

* Add draft of fixing tmp files overwriting

Signed-off-by: PeganovAnton <[email protected]>

* Remove accidental changes

Signed-off-by: PeganovAnton <[email protected]>

* Remove accidental changes

Signed-off-by: PeganovAnton <[email protected]>

* Use built-in tempfile library

Signed-off-by: PeganovAnton <[email protected]>

* Fix code style

Signed-off-by: PeganovAnton <[email protected]>

* bug_fix_diarization_manifest_creation (NVIDIA#4125)

Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>

* fix doc (NVIDIA#4146)

Signed-off-by: Yang Zhang <[email protected]>

* Tacotron2 retrain (NVIDIA#4103)

* fix yaml

Signed-off-by: treacker <[email protected]>

* Fix for new TTSDataset class

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* inference fix

Signed-off-by: treacker <[email protected]>

* removed old code

Signed-off-by: treacker <[email protected]>

* updated parser logic

Signed-off-by: treacker <[email protected]>

* reverted version update

Signed-off-by: treacker <[email protected]>

* refactored parser logic

Signed-off-by: treacker <[email protected]>

* Updated Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Refactored tutorial for Tacotron2

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Update Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Update tacotron.yaml

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* cleaned up TN/ITN doc (NVIDIA#4119)

* cleaned up TN/ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: treacker <[email protected]>

* Check implicit grad acc in GLUE dataset building (NVIDIA#4123)

* Check implicit grad acc in GLUE dataset building

Signed-off-by: MaximumEntropy <[email protected]>

* Fix jenkins test for GLUE/XNLI

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Fixed jenkins

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Multiprocess improvements (NVIDIA#4127)

* initial commit

Signed-off-by: nithinraok <[email protected]>

* start fix

Signed-off-by: nithinraok <[email protected]>

* improve multiprocessing speed while creating speaker dataset

Signed-off-by: nithinraok <[email protected]>

* updated scp to filelist

Signed-off-by: nithinraok <[email protected]>

* WaveGlow input type fixes (NVIDIA#4151)

Signed-off-by: Jocelyn Huang <[email protected]>

* notebooks' link, typo and import fix (NVIDIA#4158)

* redo missing pr 4007

Signed-off-by: fayejf <[email protected]>

* remove extremely unreliable links

Signed-off-by: fayejf <[email protected]>

* Thutmose tagger bug fixes (NVIDIA#4162)

* add pretrained ngc model, small fixes

Signed-off-by: Alexandra Antonova <[email protected]>

* fix model location

Signed-off-by: Alexandra Antonova <[email protected]>

* fix model location

Signed-off-by: Alexandra Antonova <[email protected]>

* 1. fix typos. 2. write magic functions without space

Signed-off-by: Alexandra Antonova <[email protected]>

* add example of inference with pretrained model

Signed-off-by: Alexandra Antonova <[email protected]>

* changed model location to nemo

Signed-off-by: Alexandra Antonova <[email protected]>

* style fix

Signed-off-by: Alexandra Antonova <[email protected]>

* fix space

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* update speaker docs (NVIDIA#4164)

* update speaker docs

Signed-off-by: nithinraok <[email protected]>

* chunks -> segments

Signed-off-by: nithinraok <[email protected]>

* Khz -> kHz

Signed-off-by: nithinraok <[email protected]>

* changed to vits g2p

* refactoring

* added cosineLR

* Updated whitelist path

* added vanilla torch grad scaler

* Fixed lightning version

* added warmup and wd

* switched to cosineLR

* refactored data classes for vits

* some fixes

* fixed import

* changed train loop

* fixed scheduler bug

* refactoring for exps

* Refactored loss logic

* Ref for exps

* added coqui stuff

* exps

* bugfix

* added side file

* bugfix

* reverted

* fixed sampler behaviour

* updated for ptl 1.7.2

* refactored dataloader func

* some cleaning

* reverted to vanilla loss

* modified for pickling

* added dataset class

* fixed torch version

* added autocast for fp training

* removed coqui files

* Fixed tokenizer

* Fix tokenizer

* update branch

Signed-off-by: ericharper <[email protected]>

* Fix link to inference notebook (NVIDIA#5247)

Signed-off-by: Jocelyn Huang <[email protected]>

Signed-off-by: Jocelyn Huang <[email protected]>

* Update ASR scores table (NVIDIA#5254)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

* Fix links to speaker identification notebook (NVIDIA#5260)

Signed-off-by: SeanNaren <[email protected]>

Signed-off-by: SeanNaren <[email protected]>

* Minor typo fixes in TTS tutorial (NVIDIA#5266)

Signed-off-by: Jocelyn Huang <[email protected]>

Signed-off-by: Jocelyn Huang <[email protected]>

* Pcla tutorial fixes (NVIDIA#5271)

* Fixed typos

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed cell type and tatoeba reference

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed typo

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed branch variable

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* Fix bug in Dialogue tutorial (NVIDIA#5277)

* Typo fix (NVIDIA#5288)

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* Fix dialogue tutorial bug (NVIDIA#5297)

* set add_pooling_layer=False for huggingface bert model

* remove add_pooling_layer=False and set find_unused_parameters=True

* set num_prompt_tokens to 0 for huggingface

* small bugfix for r1.13.0 (NVIDIA#5310)

* typo fix

Signed-off-by: fayejf <[email protected]>

* update transcribe

Signed-off-by: fayejf <[email protected]>

Signed-off-by: fayejf <[email protected]>

* Add italian model checkpoints (NVIDIA#5316)

Signed-off-by: Igor Gitman <[email protected]>

Signed-off-by: Igor Gitman <[email protected]>

* [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer (NVIDIA#5340)

* [STT] Add stt_ru_conformer_ctc_large

Signed-off-by: Sasha Meister <[email protected]>

* [STT] Add stt_ru_conformer_transducer_large

Add stt_ru_conformer_transducer_large

Signed-off-by: Sasha Meister <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Sasha Meister <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pcla tutorial fixes (NVIDIA#5313)

* fixes

Signed-off-by: Matvei Novikov <[email protected]>

* fixes

Signed-off-by: Matvei Novikov <[email protected]>

* moved `create_text_and_labels` to token_classification_utils.py

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* a lot of refactoring

* strict ptl version

* strict ptl version

* reverted ptl version

* Added base text2audio class

* Fix issue with HF Model upload tutorial (NVIDIA#5359)

* Add Gradio App to ASR Docs (NVIDIA#5270)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>
(cherry picked from commit e4b6a38)

* Fix issue with normalized config for dataset name

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

* tutorial fixes (NVIDIA#5354)

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

* Add SDP documentation (NVIDIA#5274)

* Add details to SDP README.md

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add docstring to WriteManifest processor

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add docstring to CreateInitialManifestMLS

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add ModifyManifestTextProcessor docstring

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add ASRInference docstring

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add base_processor docstrings

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add minimal SDP docs page

Signed-off-by: Elena Rastorgueva <[email protected]>

* Update tools/speech_dataset_processor/README.md

Co-authored-by: Igor Gitman <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Write simple README for SDP and move complex explanations to docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove incorrect type hints

Signed-off-by: Elena Rastorgueva <[email protected]>

* Make config example less confusing

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix typo

Signed-off-by: Elena Rastorgueva <[email protected]>

* Clarify that YAML file is config file in README

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove unused imports

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove SDP docs for now

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove links to docs in SDP README

Signed-off-by: Elena Rastorgueva <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Igor Gitman <[email protected]>

* [Bugfix] Added rm -f / wget -nc command in multispeaker sim notebook to r1.13.0 (NVIDIA#5375)

* Fix minor error in notebook

Signed-off-by: Taejin Park <[email protected]>

* changed branch name in tutorial notebook

Signed-off-by: Taejin Park <[email protected]>

Signed-off-by: Taejin Park <[email protected]>

* Rename Speech Dataset Processor to Speech Data Processor (NVIDIA#5378)

Signed-off-by: Elena Rastorgueva <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix for num worker 0 causing issues in losses after 1 epoch (NVIDIA#5379)

* Fixed bug in notebook (NVIDIA#5382)

Signed-off-by: Virginia Adams <[email protected]>

Signed-off-by: Virginia Adams <[email protected]>

* Force MHA QKV onto fp32 (NVIDIA#5391)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

* Added scheduling variety

* ref

* Fix for prompt table restore error (NVIDIA#5393)

* Fix for prompt table restore error

Signed-off-by: Virginia Adams <[email protected]>

* Added more safety checks

Signed-off-by: Virginia Adams <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added more condition checks

Signed-off-by: Virginia Adams <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Virginia Adams <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA#5410)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* bugfix

* import tests

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)

Signed-off-by: Yu Yao <[email protected]>

Signed-off-by: Yu Yao <[email protected]>

* Megatron Export Update (NVIDIA#5343)

* export update for Megatron + change ORT optimization

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated export_utils to use autocast instead of manually casting >:/

Signed-off-by: David Mosallanezhad <[email protected]>

* removed dtype from LayerNorm

Signed-off-by: David Mosallanezhad <[email protected]>

* added comment

Signed-off-by: David Mosallanezhad <[email protected]>

* reverting changes on FloatCast

Signed-off-by: David Mosallanezhad <[email protected]>

* Cherry-picked changes from megatron-norm

Signed-off-by: Boris Fomitchev <[email protected]>

* updated asr_model import to cast_utils

Signed-off-by: David Mosallanezhad <[email protected]>

* updated del onnx_model place

Signed-off-by: David Mosallanezhad <[email protected]>

* changed ort optimization to basic -> temp fix

Signed-off-by: David Mosallanezhad <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <[email protected]>

* disable pc test (NVIDIA#5426)

Signed-off-by: ekmb <[email protected]>

Signed-off-by: ekmb <[email protected]>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA#5413)

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Disable sync_batch_comm in validation_step for GPT (NVIDIA#5397)

* disable sync_batch_comm in validation_step

Signed-off-by: ericharper <[email protected]>

* Read sync_batch_comm from config or default to False

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Empty

Signed-off-by: MaximumEntropy <[email protected]>

* Comment out test

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Markel Sanz Ausin <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)" (NVIDIA#5431)

This reverts commit 0718b17.

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA#5420)

* Revert workers workaround

Signed-off-by: MaximumEntropy <[email protected]>

* Fix in config

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Fixed discrepancies

* updated Jenkinsfile

* updated Jenkinsfile

* Cleaning

* fixed the onnx bug in conformer for non-streaming models. (NVIDIA#5242) (NVIDIA#5446)

Signed-off-by: Vahid <[email protected]>

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>

* Set sync_batch_comm in other places (NVIDIA#5448)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* Radtts 1.13 (NVIDIA#5451)

* [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358)
* [TTS] add CI test for RADTTS training recipe.

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Radtts 1.13 plus (NVIDIA#5457)

* [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358)
* Fixing RADTTS training - removing view buffer and fixing accuracy issue
* Fixes for Torchscript/Triton
* Added autocast to radtts UT
* using cuda() for training example

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Add num layers check (NVIDIA#5470)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* Change to kwargs (NVIDIA#5475)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA#5339) (NVIDIA#5478)

* Initial refactor

Signed-off-by: MaximumEntropy <[email protected]>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for eval

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <[email protected]>

* Refactor

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <[email protected]>

* Remove comments

Signed-off-by: MaximumEntropy <[email protected]>

* Minor

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <[email protected]>

* Remove old comment

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* export_utils bugfix (NVIDIA#5480)

* updated export_utils

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Export fixes for Riva (NVIDIA#5496)

* Export fixes for Riva

Signed-off-by: Boris Fomitchev <[email protected]>

* Cleaning up training_utils

Signed-off-by: Boris Fomitchev <[email protected]>

Signed-off-by: Boris Fomitchev <[email protected]>

* minor bug fix (NVIDIA#5521)

Signed-off-by: David Mosallanezhad <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>

* added set_start_method + function param bugfix (NVIDIA#5539)

* added set_start_method + function param bugfix

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* upper bound torchmetrics

Signed-off-by: ericharper <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <[email protected]>

* remove notebook (NVIDIA#5548)

Signed-off-by: ericharper <[email protected]>

Signed-off-by: ericharper <[email protected]>

* Remove broadcast (NVIDIA#5558)

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* cleaning

* Fix all gather while writing to a file during T5 finetuning (NVIDIA#5561)

* Gather from data parallel only instead of all ranks

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added copyright

* fixed imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed filesize check

* last cleaning

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated cmudict path

* fixed merge bug

Signed-off-by: Evgeniy Shabalin <[email protected]>

* warnings fix

* fix warnings

Signed-off-by: Evgeniy Shabalin <[email protected]>

* storing

* updated version

Signed-off-by: Evgeniy Shabalin <[email protected]>

* update Jenkinsfile versions

Signed-off-by: Evgeniy Shabalin <[email protected]>

* fixed issues

Signed-off-by: Evgeniy Shabalin <[email protected]>

* fixed more issues

* more fixes

Signed-off-by: Evgeniy Shabalin <[email protected]>

* added experimental tag

* Clarification updates

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* remove old cython code

Signed-off-by: Evgeniy Shabalin <[email protected]>

* remove old cython code

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Enhancements

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Enhancements

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* imports fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Evgeniy Shabalin <[email protected]>

* excessive computations fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* typecheck fix

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Small refactoring

* Small refactoring

Signed-off-by: Evgeniy Shabalin <[email protected]>

* reversed exp_manager params

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Fixed call for new function signature

Signed-off-by: Evgeniy Shabalin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: ekmb <[email protected]>
Signed-off-by: PeganovAnton <[email protected]>
Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: fayejf <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Matvei Novikov <[email protected]>
Signed-off-by: Igor Gitman <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: Virginia Adams <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: Markel Sanz Ausin <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Vahid <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Evgeniy Shabalin <[email protected]>
Co-authored-by: jasonjjl1999 <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: jasonjjl1999 <[email protected]>
Co-authored-by: martynwei <[email protected]>
Co-authored-by: Ryan Hong <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Ramanathan Arunachalam <[email protected]>
Co-authored-by: Ramanathan Arunachalam <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: PeganovAnton <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Sean Naren <[email protected]>
Co-authored-by: Matvei Novikov <[email protected]>
Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: Igor Gitman <[email protected]>
Co-authored-by: Sasha Meister <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Virginia Adams <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>

43 people committed Jan 31, 2023

1 parent f1a6dee commit 5feeca2

Jenkinsfile

           cleanWs()
         }
       }
-    }
+    }

examples/tts/conf/vits.yaml

@@ -0,0 +1,215 @@
+    # This config contains the default values for training the VITS model on the LJSpeech dataset.
+    # To train the model on another dataset, change the config values to match that dataset.
+    # Most dataset-specific arguments are at the top of the config file; see below.
+    # TODO: remove unnecessary arguments and refactor
+    name: VITS
+    train_dataset: ???
+    validation_datasets: ???
+    sup_data_path: null
+    sup_data_types: null
+    phoneme_dict_path: "scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt"
+    heteronyms_path: "scripts/tts_dataset_files/heteronyms-052722"
+    whitelist_path: "nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv"
+    # Default values from librosa.pyin
+    pitch_fmin: 65.40639132514966
+    pitch_fmax: 2093.004522404789
+    sample_rate: 22050
+    n_mel_channels: 80
+    n_window_size: 1024
+    n_window_stride: 256
+    n_fft: 1024
+    lowfreq: 0
+    highfreq: null
+    window: hann
+    model:
+      pitch_fmin: ${pitch_fmin}
+      pitch_fmax: ${pitch_fmax}
+      sample_rate: ${sample_rate}
+      n_mel_channels: ${n_mel_channels}
+      n_window_size: ${n_window_size}
+      n_window_stride: ${n_window_stride}
+      n_fft: ${n_fft}
+      lowfreq: ${lowfreq}
+      highfreq: ${highfreq}
+      window: ${window}
+      mel_fmin: 0.0
+      mel_fmax: null
+      n_speakers: 0
+      segment_size: 8192
+      c_mel: 45
+      c_kl: 1.
+      use_spectral_norm: false
+      text_normalizer:
+        _target_: nemo_text_processing.text_normalization.normalize.Normalizer
+        lang: en
+        input_case: cased
+        whitelist: ${whitelist_path}
+      text_normalizer_call_kwargs:
+        verbose: false
+        punct_pre_process: true
+        punct_post_process: true
+      text_tokenizer:
+        _target_: nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer
+        punct: true
+        apostrophe: true
+        pad_with_space: false
+        g2p:
+          _target_: nemo_text_processing.g2p.modules.IPAG2P
+          phoneme_dict: ${phoneme_dict_path}
+          heteronyms: ${heteronyms_path}
+          phoneme_probability: 0.8
+          # Relies on the heteronyms list for anything that needs to be disambiguated
+          ignore_ambiguous_words: false
+          use_chars: true
+          use_stresses: true
+      train_ds:
+        dataset:
+          _target_: "nemo.collections.tts.torch.data.TTSDataset"
+          manifest_filepath: ${train_dataset}
+          sample_rate: ${model.sample_rate}
+          sup_data_path: ${sup_data_path}
+          sup_data_types: ${sup_data_types}
+          n_fft: ${model.n_fft}
+          win_length: ${model.n_window_size}
+          hop_length: ${model.n_window_stride}
+          window: ${model.window}
+          n_mels: ${model.n_mel_channels}
+          lowfreq: ${model.lowfreq}
+          highfreq: ${model.highfreq}
+          max_duration: null
+          min_duration: 0.1
+          ignore_file: null
+          trim: False
+          pitch_fmin: ${model.pitch_fmin}
+          pitch_fmax: ${model.pitch_fmax}
+        dataloader_params:
+          num_workers: 8
+          pin_memory: false
+        batch_sampler:
+          batch_size: 32
+          boundaries: [32,300,400,500,600,700,800,900,1000]
+          num_replicas: ${trainer.devices}
+          shuffle: true
+      validation_ds:
+        dataset:
+          _target_: "nemo.collections.tts.torch.data.TTSDataset"
+          manifest_filepath: ${validation_datasets}
+          sample_rate: ${model.sample_rate}
+          sup_data_path: ${sup_data_path}
+          sup_data_types: ${sup_data_types}
+          n_fft: ${model.n_fft}
+          win_length: ${model.n_window_size}
+          hop_length: ${model.n_window_stride}
+          window: ${model.window}
+          n_mels: ${model.n_mel_channels}
+          lowfreq: ${model.lowfreq}
+          highfreq: ${model.highfreq}
+          max_duration: null
+          min_duration: 0.1
+          ignore_file: null
+          trim: False
+          pitch_fmin: ${model.pitch_fmin}
+          pitch_fmax: ${model.pitch_fmax}
+        dataloader_params:
+          drop_last: false
+          shuffle: false
+          batch_size: 16
+          num_workers: 4
+          pin_memory: false
+      preprocessor:
+        _target_: nemo.collections.asr.parts.preprocessing.features.FilterbankFeatures
+        nfilt: ${model.n_mel_channels}
+        highfreq: ${model.highfreq}
+        log: true
+        log_zero_guard_type: clamp
+        log_zero_guard_value: 1e-05
+        lowfreq: ${model.lowfreq}
+        n_fft: ${model.n_fft}
+        n_window_size: ${model.n_window_size}
+        n_window_stride: ${model.n_window_stride}
+        pad_to: 1
+        pad_value: 0
+        sample_rate: ${model.sample_rate}
+        window: ${model.window}
+        normalize: null
+        preemph: null
+        dither: 0.0
+        frame_splicing: 1
+        stft_conv: false
+        nb_augmentation_prob: 0
+        mag_power: 1.0
+        exact_pad: true
+        use_grads: true
+      synthesizer:
+        _target_: nemo.collections.tts.modules.vits_modules.SynthesizerTrn
+        inter_channels: 192
+        hidden_channels: 192
+        filter_channels: 768
+        n_heads: 2
+        n_layers: 6
+        kernel_size: 3
+        p_dropout: 0.1
+        resblock: "1"
+        resblock_kernel_sizes: [3,7,11]
+        resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
+        upsample_rates: [8,8,2,2]
+        upsample_initial_channel: 512
+        upsample_kernel_sizes: [16,16,4,4]
+        n_speakers: ${model.n_speakers}
+        gin_channels: 256 # for multi-speaker
+      optim:
+        _target_: torch.optim.AdamW
+        lr: 2e-4
+        betas: [0.9, 0.99]
+        eps: 1e-9
+        sched:
+          name: ExponentialLR
+          lr_decay: 0.999875
+    trainer:
+      num_nodes: 1
+      devices: 2
+      accelerator: gpu
+      strategy: ddp
+      precision: 32
+      # amp_backend: 'apex'
+      # amp_level: 'O2'
+      # benchmark: true
+      max_epochs: -1
+      accumulate_grad_batches: 1
+      enable_checkpointing: false # Provided by exp_manager
+      logger: false # Provided by exp_manager
+      log_every_n_steps: 50
+      check_val_every_n_epoch: 1
+    exp_manager:
+      exp_dir: ???
+      name: ${name}
+      create_tensorboard_logger: true
+      create_checkpoint_callback: true
+      checkpoint_callback_params:
+        monitor: loss_gen_all
+        mode: min
+      resume_if_exists: false
+      resume_ignore_no_checkpoint: false
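
Several values in vits.yaml are tied together arithmetically, and a quick check can catch a mismatch when the config is adapted to another dataset. The sketch below is not part of the commit. It copies the numbers from the config above and assumes, as in the original VITS, that the synthesizer's decoder is a HiFi-GAN-style upsampler whose total upsampling factor should equal the spectrogram hop length. The helper name `lr_after` is hypothetical, and whether one scheduler step corresponds to an epoch or a batch depends on the training loop, which the config does not state.

```python
import math

# Values copied from examples/tts/conf/vits.yaml in this commit.
sample_rate = 22050
n_window_stride = 256            # spectrogram hop length, in samples
segment_size = 8192              # waveform samples per training slice
upsample_rates = [8, 8, 2, 2]    # decoder upsampling factors
lr = 2e-4                        # initial AdamW learning rate
lr_decay = 0.999875              # ExponentialLR multiplicative factor

# Assumption: each latent frame must expand to exactly one hop of audio,
# so the product of the upsampling factors should equal the hop length.
total_upsampling = math.prod(upsample_rates)

# Number of spectrogram frames and seconds of audio covered by one slice.
frames_per_segment = segment_size // n_window_stride
segment_seconds = segment_size / sample_rate


def lr_after(steps: int) -> float:
    """Learning rate after `steps` scheduler steps of exponential decay."""
    return lr * lr_decay ** steps


if __name__ == "__main__":
    print("total upsampling:", total_upsampling)
    print("matches hop length:", total_upsampling == n_window_stride)
    print("frames per segment:", frames_per_segment)
    print("segment length (s):", round(segment_seconds, 3))
    print("lr after 1000 scheduler steps:", lr_after(1000))
```

If `n_window_stride` or `upsample_rates` is changed for a different sample rate, the first check is the one most likely to fail, and `segment_size` should stay a multiple of the hop length so that a slice maps to a whole number of frames.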
