Update ASR scripts for tokenizer building and tarred dataset building #2381

titu1994 · 2021-06-22T00:59:28Z

Changelog

Update docker container to nemo:1.0.1
Add Citrinet 1024 Gamma 0.25 model card for Mandarin to CTC char models
Update tokenizer scripts to support adding bos, eos and pad tokens to SentencePiece tokenizers via --spe_bos, --spe_eos and --spe_pad flags.
Update dataset building script to always provide a max length when building tarred datasets.
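
For reviewers unfamiliar with how special tokens are enabled in SentencePiece, here is a minimal sketch of how the three flags above could map onto trainer options. It is not the code in this PR. It assumes the flags are plain booleans, that enabled tokens receive consecutive ids after `<unk>` (id 0), and that disabled tokens are given id -1, which SentencePiece treats as "not used". The helper names `parse_flags` and `build_spe_special_token_ids` are hypothetical.

```python
import argparse


def parse_flags(argv):
    """Parse the three special-token flags (assumed here to be boolean switches)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--spe_bos", action="store_true", help="add a <s> token")
    parser.add_argument("--spe_eos", action="store_true", help="add a </s> token")
    parser.add_argument("--spe_pad", action="store_true", help="add a <pad> token")
    return parser.parse_args(argv)


def build_spe_special_token_ids(spe_bos, spe_eos, spe_pad):
    """Return SentencePiece id options for the requested special tokens.

    Assumption: <unk> stays at id 0, enabled tokens get the next free ids in
    bos, eos, pad order, and disabled tokens are set to -1.
    """
    ids = {"unk_id": 0, "bos_id": -1, "eos_id": -1, "pad_id": -1}
    next_id = 1
    for enabled, key in ((spe_bos, "bos_id"), (spe_eos, "eos_id"), (spe_pad, "pad_id")):
        if enabled:
            ids[key] = next_id
            next_id += 1
    return ids


if __name__ == "__main__":
    args = parse_flags(["--spe_bos", "--spe_pad"])
    print(build_spe_special_token_ids(args.spe_bos, args.spe_eos, args.spe_pad))
```

The resulting dictionary uses the option names `unk_id`, `bos_id`, `eos_id` and `pad_id` that the SentencePiece trainer accepts, so it could be merged into the trainer's keyword arguments. The actual id assignment used by the tokenizer script may differ.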

Signed-off-by: smajumdar <[email protected]>

jbalam-nv

LGTM

ericharper

LGTM. Thanks!

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts for tokenizer building and tarred dataset building (#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar <[email protected]> * remove accelerator=DDP in 
tutorial notebooks to avoid errors. (#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * style Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * update notebook branch to main Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]>

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts for tokenizer building and tarred dataset building (#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar <[email protected]> * remove accelerator=DDP in 
tutorial notebooks to avoid errors. (#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * [BUGFIX] Megatron in NMT was setting vocab_file to None (#2417) * make vocab_file configurable for megatron in nmt Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * Link updates in docs and notebooks and typo fix (#2416) * typo fix for notebooks Signed-off-by: fayejf <[email protected]> * tiny typo fix in docs Signed-off-by: fayejf <[email protected]> * docs branch->stable Signed-off-by: fayejf <[email protected]> * more docs branch -> stable Signed-off-by: fayejf <[email protected]> * tutorial links branch -> stable Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * add renamed 06 Signed-off-by: fayejf <[email protected]> * more fixes Signed-off-by: fayejf <[email protected]> * Update onnx (#2420) Signed-off-by: smajumdar <[email protected]> * Correct version of onnxruntime (#2422) Signed-off-by: smajumdar <[email protected]> * update deployment instructions (#2430) Signed-off-by: ericharper <[email protected]> * Bumping version to 1.1.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> * update jenksinfile Signed-off-by: ericharper <[email protected]> * add upper bounds Signed-off-by: ericharper <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * update requirements Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * update version Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results * Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Add FastEmit support for RNNT Losses (#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Implement inference functions of TN models Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * fix bugs in hifigan code (#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Update setup.py (#2394) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * update checkpointing (#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * byt5 
unicode implementation (#2365) * Audio Norm (#2285) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * update for SH zero -> oh Signed-off-by: ekmb <[email protected]> * change n_tagger default Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add check for numba regardless of device Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * upper bound for webdataset Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct Dockerfile Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update README (#2332) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * ddp translate GPU 
allocation fix (#2312) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * ddp translate GPU allocation fix Signed-off-by: AlexGrinch <[email protected]> * map_location instead of set_device Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Shallow fusion (#2315) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * shallow fusion init commit Signed-off-by: AlexGrinch <[email protected]> * debug info removed Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [BUGFIX] Add upper bound to hydra for 1.0.x (#2337) * upper bound hydra Signed-off-by: ericharper <[email protected]> * upper bound hydra Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update version number Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update package version Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sparrowhawk tests + punctuation post processing for pynini TN (#2320) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email 
protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * sh tests init Signed-off-by: ekmb <[email protected]> * sparrowhawk container tests support added Signed-off-by: ekmb <[email protected]> * add post process to normalize.py, update tests Signed-off-by: ekmb <[email protected]> * remove duplication Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update notebooks to 1.0.2 release (#2338) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update ranges for omegaconf and hydra (#2336) * Update ranges Signed-off-by: smajumdar <[email protected]> * Updates for Hydra and OmegaConf updates Signed-off-by: smajumdar <[email protected]> * Style fixes Signed-off-by: smajumdar <[email protected]> * Correct tests and revert patch for model utils Signed-off-by: smajumdar <[email protected]> * Correct docstring Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Guard scheduler for None Signed-off-by: smajumdar <[email protected]> * default to 0.0 if bpe_dropout is None Signed-off-by: ericharper <[email protected]> * Correctly log class that was restored Signed-off-by: smajumdar <[email protected]> * Root patch *bpe_dropout Signed-off-by: smajumdar <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update FastPitch Export (#2355) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode implementation, first cut Signed-off-by: mchrzanowski <[email protected]> * add 
bytelevel tokenizer Signed-off-by: mchrzanowski <[email protected]> * update out_dir to not collide (#2358) Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update container version to 21.05 (#2309) * Update container version Signed-off-by: smajumdar <[email protected]> * Temporarily change export format of waveglow Signed-off-by: smajumdar <[email protected]> * Add conda update for numba Signed-off-by: smajumdar <[email protected]> * Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests Signed-off-by: smajumdar <[email protected]> * Correct order of numba minimum verion, remove wrong flag from test Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Enable RNNT tests Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Text Normalization Update (#2356) * upper cased date support Signed-off-by: ekmb <[email protected]> * update whitelist, change roman weights Signed-off-by: ekmb <[email protected]> * docstrings, space fix, init file Signed-off-by: ekmb <[email protected]> * lgtm Signed-off-by: ekmb <[email protected]> * fraction with measure class Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * address comment Signed-off-by: mchrzanowski <[email protected]> * Add ASR CTC tutorial on fine-tuning on another language (#2346) * Add ASR CTC Language finetuning notebook Signed-off-by: smajumdar <[email protected]> * Add to documentation Signed-off-by: smajumdar <[email protected]> * Improve documentation Signed-off-by: smajumdar <[email protected]> * Correct name of the dataset Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct colab link to 
notebook (#2366) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sgdqa update data directories for testing (#2323) * sgdqa update data directories for testing Signed-off-by: Yang Zhang <[email protected]> * fix syntax Signed-off-by: Yang Zhang <[email protected]> * check if data dir exists Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * adding pretrained model Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Added documentation for export() (#2330) * Added export document Signed-off-by: Boris Fomitchev <[email protected]> * Addressed review comments Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update Citrinet model card info (#2369) * Update model card info Signed-off-by: smajumdar <[email protected]> * Cleanup Docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [NMT] Model Parallel Megatron Encoders (#2238) * add megatron encoder Signed-off-by: ericharper <[email protected]> * added megatron to get_nmt_tokenizer Signed-off-by: ericharper <[email protected]> * add vocab_size and hidden_size to megatron bert Signed-off-by: ericharper <[email protected]> * add megatron encoder module Signed-off-by: ericharper <[email protected]> * fixed horrible typo Signed-off-by: ericharper <[email protected]> * fix typo and add default Signed-off-by: ericharper <[email protected]> * updating nlp overrides for mp nmt Signed-off-by: ericharper <[email protected]> * move some logic back to nlpmodel from overrides Signed-off-by: ericharper <[email protected]> * add checkpoint_file property Signed-off-by: ericharper <[email protected]> * fix property Signed-off-by: ericharper <[email protected]> * num_tokentypes=0 Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper 
<[email protected]> * typo Signed-off-by: ericharper <[email protected]> * find_unused_parameters=True Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * get instead of pop Signed-off-by: ericharper <[email protected]> * remove token type ids from megatron input example Signed-off-by: ericharper <[email protected]> * pop vocab_size Signed-off-by: ericharper <[email protected]> * fix checkpointing for model parallel Signed-off-by: ericharper <[email protected]> * fix bug in non model parallel Signed-off-by: ericharper <[email protected]> * convert cfg.trainer to dict Signed-off-by: ericharper <[email protected]> * make num_tokentypes configurable for nmt Signed-off-by: ericharper <[email protected]> * update checkpoint_file when using named megatron model in nemo Signed-off-by: ericharper <[email protected]> * make vocab_file configurable Signed-off-by: ericharper <[email protected]> * dataclass can't have mutable default Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * unused imports Signed-off-by: ericharper <[email protected]> * revert input example Signed-off-by: ericharper <[email protected]> * check that checkpoint version is not None Signed-off-by: ericharper <[email protected]> * add mp jenkins test Signed-off-by: ericharper <[email protected]> * update docstring Signed-off-by: ericharper <[email protected]> * add docs for pretrained encoders with nemo nmt Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add notebook with recommendations for 8 kHz speech (#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results * Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a 
line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add FastEmit support for RNNT Losses (#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode implementation, first cut Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * add bytelevel tokenizer Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update styling Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * avoid circular import Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * fix bugs in hifigan code (#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update setup.py (#2394) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * Update 
bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * typo Signed-off-by: mchrzanowski <[email protected]> * missed one Signed-off-by: mchrzanowski <[email protected]> * bug fixes Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * bytelevelprocessor is now generic. Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * update checkpointing (#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * style Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * woops, didnt merge jenkinsfile the right way * add newline Signed-off-by: mchrzanowski <[email protected]> * undo changes to enja processor Signed-off-by: mchrzanowski <[email protected]> * processor selection decision fix Signed-off-by: mchrzanowski <[email protected]> * newline fix Signed-off-by: mchrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add 
TextNormalizationTestDataset and testing/evaluation code Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationTaggerDataset and training code for tagger Signed-off-by: Tuan Lai <[email protected]> * Restore from local nemo ckpts Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationDecoderDataset Signed-off-by: Tuan Lai <[email protected]> * Add interactive mode for neural_text_normalization_test.py Signed-off-by: Tuan Lai <[email protected]> * Add options to do training or not for tagger/decoder Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Implemented setup dataloader for decoder Signed-off-by: Tuan Lai <[email protected]> * Implemented training and validation for decoder Signed-off-by: Tuan Lai <[email protected]> * Data augmentation for decoder training Signed-off-by: Tuan Lai <[email protected]> * Config change Signed-off-by: Tuan Lai <[email protected]> * add blossom-ci.yml (#2401) Signed-off-by: ericharper <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Merge r1.1 bugfixes into main (#2407) * Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts for tokenizer building and tarred dataset building (#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email 
protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar <[email protected]> * remove accelerator=DDP in tutorial notebooks to avoid errors. 
(#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * style Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * update notebook branch to main Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Remove unused imports Signed-off-by: Tuan Lai <[email protected]> * Add initial doc for text_normalization Signed-off-by: Tuan Lai <[email protected]> * Fixed imports warnings Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Allowed duplex modes Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Add docs for duplex_text_normalization_train and duplex_text_normalization_test Signed-off-by: Tuan Lai <[email protected]> * docstrings for model codes + minor fix Signed-off-by: Tuan Lai <[email protected]> * Add more comments and doc strings Signed-off-by: Tuan Lai <[email protected]> * Add doc for datasets + Use time.perf_counter() Signed-off-by: Tuan Lai <[email protected]> * Add code for preprocessing Google TN data Signed-off-by: Tuan Lai <[email protected]> * Add more docs and comments + Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add more licenses + Fixed comments + Minors Signed-off-by: Tuan Lai <[email protected]> * Moved evaluation logic to DuplexTextNormalizationModel Signed-off-by: Tuan Lai <[email protected]> * Add logging errors Signed-off-by: Tuan Lai <[email protected]> * Updated validation code of tagger + Minors Signed-off-by: Tuan Lai <[email protected]> * Also write tag preds to log file 
Signed-off-by: Tuan Lai <[email protected]> * Add data augmentation for tagger dataset Signed-off-by: Tuan Lai <[email protected]> * Added experimental decorators Signed-off-by: Tuan Lai <[email protected]> * Updated docs Signed-off-by: Tuan Lai <[email protected]> * Updated duplex_tn_config.yaml Signed-off-by: Tuan Lai <[email protected]> * Compute token precision of tagger using NeMo metrics Signed-off-by: Tuan Lai <[email protected]> * Fixed saving issue when using ddp accelerator Signed-off-by: Tuan Lai <[email protected]> * Refactoring Signed-off-by: Tuan Lai <[email protected]> * Add option to keep punctuations in TextNormalizationTestDataset Signed-off-by: Tuan Lai <[email protected]> * Changes to input preprocessing + decoder's postprocessing Signed-off-by: Tuan Lai <[email protected]> * Fixed styles + Add references Signed-off-by: Tuan Lai <[email protected]> * Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py Signed-off-by: Tuan Lai <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Mike Chrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> 
Co-authored-by: Hoo Chang Shin <[email protected]>

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (NVIDIA#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (NVIDIA#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (NVIDIA#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (NVIDIA#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (NVIDIA#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar 
<[email protected]> * remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417) * make vocab_file configurable for megatron in nmt Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * Link updates in docs and notebooks and typo fix (NVIDIA#2416) * typo fix for notebooks Signed-off-by: fayejf <[email protected]> * tiny typo fix in docs Signed-off-by: fayejf <[email protected]> * docs branch->stable Signed-off-by: fayejf <[email protected]> * more docs branch -> stable Signed-off-by: fayejf <[email protected]> * tutorial links branch -> stable Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * add renamed 06 Signed-off-by: fayejf <[email protected]> * more fixes Signed-off-by: fayejf <[email protected]> * Update onnx (NVIDIA#2420) Signed-off-by: smajumdar <[email protected]> * Correct version of onnxruntime (NVIDIA#2422) Signed-off-by: smajumdar <[email protected]> * update deployment instructions (NVIDIA#2430) Signed-off-by: ericharper <[email protected]> * Bumping version to 1.1.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> * update jenksinfile Signed-off-by: ericharper <[email protected]> * add upper bounds Signed-off-by: ericharper <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * update requirements Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * update version Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo 
Chang Shin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Ghasem Pasandi <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results * Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Add FastEmit support for RNNT Losses (NVIDIA#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Implement inference functions of TN models Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * fix bugs in hifigan code (NVIDIA#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Update setup.py (NVIDIA#2394) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * update checkpointing (NVIDIA#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai 
<[email protected]> * byt5 unicode implementation (NVIDIA#2365) * Audio Norm (NVIDIA#2285) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * update for SH zero -> oh Signed-off-by: ekmb <[email protected]> * change n_tagger default Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add check for numba regardless of device Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * upper bound for webdataset Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct Dockerfile Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update README (NVIDIA#2332) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski 
<[email protected]> * ddp translate GPU allocation fix (NVIDIA#2312) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * ddp translate GPU allocation fix Signed-off-by: AlexGrinch <[email protected]> * map_location instead of set_device Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Shallow fusion (NVIDIA#2315) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * shallow fusion init commit Signed-off-by: AlexGrinch <[email protected]> * debug info removed Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337) * upper bound hydra Signed-off-by: ericharper <[email protected]> * upper bound hydra Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update version number Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update package version Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * 
jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * sh tests init Signed-off-by: ekmb <[email protected]> * sparrowhawk container tests support added Signed-off-by: ekmb <[email protected]> * add post process to normalize.py, update tests Signed-off-by: ekmb <[email protected]> * remove duplication Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update notebooks to 1.0.2 release (NVIDIA#2338) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update ranges for omegaconf and hydra (NVIDIA#2336) * Update ranges Signed-off-by: smajumdar <[email protected]> * Updates for Hydra and OmegaConf updates Signed-off-by: smajumdar <[email protected]> * Style fixes Signed-off-by: smajumdar <[email protected]> * Correct tests and revert patch for model utils Signed-off-by: smajumdar <[email protected]> * Correct docstring Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Guard scheduler for None Signed-off-by: smajumdar <[email protected]> * default to 0.0 if bpe_dropout is None Signed-off-by: ericharper <[email protected]> * Correctly log class that was restored Signed-off-by: smajumdar <[email protected]> * Root patch *bpe_dropout Signed-off-by: smajumdar <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update FastPitch Export (NVIDIA#2355) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode 
implementation, first cut Signed-off-by: mchrzanowski <[email protected]> * add bytelevel tokenizer Signed-off-by: mchrzanowski <[email protected]> * update out_dir to not collide (NVIDIA#2358) Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update container version to 21.05 (NVIDIA#2309) * Update container version Signed-off-by: smajumdar <[email protected]> * Temporarily change export format of waveglow Signed-off-by: smajumdar <[email protected]> * Add conda update for numba Signed-off-by: smajumdar <[email protected]> * Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests Signed-off-by: smajumdar <[email protected]> * Correct order of numba minimum verion, remove wrong flag from test Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Enable RNNT tests Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Text Normalization Update (NVIDIA#2356) * upper cased date support Signed-off-by: ekmb <[email protected]> * update whitelist, change roman weights Signed-off-by: ekmb <[email protected]> * docstrings, space fix, init file Signed-off-by: ekmb <[email protected]> * lgtm Signed-off-by: ekmb <[email protected]> * fraction with measure class Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * address comment Signed-off-by: mchrzanowski <[email protected]> * Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346) * Add ASR CTC Language finetuning notebook Signed-off-by: smajumdar <[email protected]> * Add to documentation Signed-off-by: smajumdar <[email protected]> * Improve documentation Signed-off-by: smajumdar <[email protected]> * Correct name of the dataset Signed-off-by: 
smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct colab link to notebook (NVIDIA#2366) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sgdqa update data directories for testing (NVIDIA#2323) * sgdqa update data directories for testing Signed-off-by: Yang Zhang <[email protected]> * fix syntax Signed-off-by: Yang Zhang <[email protected]> * check if data dir exists Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * adding pretrained model Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Added documentation for export() (NVIDIA#2330) * Added export document Signed-off-by: Boris Fomitchev <[email protected]> * Addressed review comments Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update Citrinet model card info (NVIDIA#2369) * Update model card info Signed-off-by: smajumdar <[email protected]> * Cleanup Docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [NMT] Model Parallel Megatron Encoders (NVIDIA#2238) * add megatron encoder Signed-off-by: ericharper <[email protected]> * added megatron to get_nmt_tokenizer Signed-off-by: ericharper <[email protected]> * add vocab_size and hidden_size to megatron bert Signed-off-by: ericharper <[email protected]> * add megatron encoder module Signed-off-by: ericharper <[email protected]> * fixed horrible typo Signed-off-by: ericharper <[email protected]> * fix typo and add default Signed-off-by: ericharper <[email protected]> * updating nlp overrides for mp nmt Signed-off-by: ericharper <[email protected]> * move some logic back to nlpmodel from overrides Signed-off-by: ericharper <[email protected]> * add checkpoint_file property Signed-off-by: ericharper <[email protected]> * fix property 
Signed-off-by: ericharper <[email protected]> * num_tokentypes=0 Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * find_unused_parameters=True Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * get instead of pop Signed-off-by: ericharper <[email protected]> * remove token type ids from megatron input example Signed-off-by: ericharper <[email protected]> * pop vocab_size Signed-off-by: ericharper <[email protected]> * fix checkpointing for model parallel Signed-off-by: ericharper <[email protected]> * fix bug in non model parallel Signed-off-by: ericharper <[email protected]> * convert cfg.trainer to dict Signed-off-by: ericharper <[email protected]> * make num_tokentypes configurable for nmt Signed-off-by: ericharper <[email protected]> * update checkpoint_file when using named megatron model in nemo Signed-off-by: ericharper <[email protected]> * make vocab_file configurable Signed-off-by: ericharper <[email protected]> * dataclass can't have mutable default Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * unused imports Signed-off-by: ericharper <[email protected]> * revert input example Signed-off-by: ericharper <[email protected]> * check that checkpoint version is not None Signed-off-by: ericharper <[email protected]> * add mp jenkins test Signed-off-by: ericharper <[email protected]> * update docstring Signed-off-by: ericharper <[email protected]> * add docs for pretrained encoders with nemo nmt Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add notebook with recommendations for 8 kHz speech (NVIDIA#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results 
* Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add FastEmit support for RNNT Losses (NVIDIA#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode implementation, first cut Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * add bytelevel tokenizer Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update styling Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * avoid circular import Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * fix bugs in hifigan code (NVIDIA#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update setup.py (NVIDIA#2394) Signed-off-by: Jason <[email 
protected]> Signed-off-by: mchrzanowski <[email protected]> * Update bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * Update bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * typo Signed-off-by: mchrzanowski <[email protected]> * missed one Signed-off-by: mchrzanowski <[email protected]> * bug fixes Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * bytelevelprocessor is now generic. Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * update checkpointing (NVIDIA#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * style Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * woops, didnt merge jenkinsfile the right way * add newline Signed-off-by: mchrzanowski <[email protected]> * undo changes to enja processor Signed-off-by: mchrzanowski <[email protected]> * processor selection decision fix Signed-off-by: mchrzanowski <[email protected]> * newline fix Signed-off-by: mchrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Signed-off-by: Tuan Lai 
<[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationTestDataset and testing/evaluation code Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationTaggerDataset and training code for tagger Signed-off-by: Tuan Lai <[email protected]> * Restore from local nemo ckpts Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationDecoderDataset Signed-off-by: Tuan Lai <[email protected]> * Add interactive mode for neural_text_normalization_test.py Signed-off-by: Tuan Lai <[email protected]> * Add options to do training or not for tagger/decoder Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Implemented setup dataloader for decoder Signed-off-by: Tuan Lai <[email protected]> * Implemented training and validation for decoder Signed-off-by: Tuan Lai <[email protected]> * Data augmentation for decoder training Signed-off-by: Tuan Lai <[email protected]> * Config change Signed-off-by: Tuan Lai <[email protected]> * add blossom-ci.yml (NVIDIA#2401) Signed-off-by: ericharper <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Merge r1.1 bugfixes into main (NVIDIA#2407) * Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts 
for tokenizer building and tarred dataset building (NVIDIA#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (NVIDIA#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (NVIDIA#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (NVIDIA#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (NVIDIA#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (NVIDIA#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar <[email protected]> * remove accelerator=DDP in tutorial notebooks to avoid errors. 
(NVIDIA#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * style Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * update notebook branch to main Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Remove unused imports Signed-off-by: Tuan Lai <[email protected]> * Add initial doc for text_normalization Signed-off-by: Tuan Lai <[email protected]> * Fixed imports warnings Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Allowed duplex modes Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Add docs for duplex_text_normalization_train and duplex_text_normalization_test Signed-off-by: Tuan Lai <[email protected]> * docstrings for model codes + minor fix Signed-off-by: Tuan Lai <[email protected]> * Add more comments and doc strings Signed-off-by: Tuan Lai <[email protected]> * Add doc for datasets + Use time.perf_counter() Signed-off-by: Tuan Lai <[email protected]> * Add code for preprocessing Google TN data Signed-off-by: Tuan Lai <[email protected]> * Add more docs and comments + Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add more licenses + Fixed comments + Minors Signed-off-by: Tuan Lai <[email protected]> * Moved evaluation logic to DuplexTextNormalizationModel Signed-off-by: Tuan Lai <[email protected]> * Add logging errors Signed-off-by: Tuan Lai <[email protected]> * Updated validation code of tagger + Minors Signed-off-by: Tuan Lai <[email protected]> * Also write tag preds to log file 
Signed-off-by: Tuan Lai <[email protected]> * Add data augmentation for tagger dataset Signed-off-by: Tuan Lai <[email protected]> * Added experimental decorators Signed-off-by: Tuan Lai <[email protected]> * Updated docs Signed-off-by: Tuan Lai <[email protected]> * Updated duplex_tn_config.yaml Signed-off-by: Tuan Lai <[email protected]> * Compute token precision of tagger using NeMo metrics Signed-off-by: Tuan Lai <[email protected]> * Fixed saving issue when using ddp accelerator Signed-off-by: Tuan Lai <[email protected]> * Refactoring Signed-off-by: Tuan Lai <[email protected]> * Add option to keep punctuations in TextNormalizationTestDataset Signed-off-by: Tuan Lai <[email protected]> * Changes to input preprocessing + decoder's postprocessing Signed-off-by: Tuan Lai <[email protected]> * Fixed styles + Add references Signed-off-by: Tuan Lai <[email protected]> * Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py Signed-off-by: Tuan Lai <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Mike Chrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> 
Co-authored-by: Hoo Chang Shin <[email protected]> Signed-off-by: Ghasem Pasandi <[email protected]>

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)
* update branch Signed-off-by: ericharper <[email protected]>
* update jenkinsfile Signed-off-by: ericharper <[email protected]>
* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)
* fix property when not using model parallel Signed-off-by: ericharper <[email protected]>
* fix property when not using model parallel Signed-off-by: ericharper <[email protected]>
* add debug statement Signed-off-by: ericharper <[email protected]>
* add debug statement Signed-off-by: ericharper <[email protected]>
* instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]>
* Update ASR scripts for tokenizer building and tarred dataset building (#2381)
* Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]>
* Update container Signed-off-by: smajumdar <[email protected]>
* Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]>
* Update notebook (#2391) Signed-off-by: smajumdar <[email protected]>
* ASR Notebooks fix for 1.1.0 (#2395)
* nb fix for spring clean Signed-off-by: fayejf <[email protected]>
* remove outdated instruction Signed-off-by: fayejf <[email protected]>
* Mean normalization (#2397)
* norm embeddings Signed-off-by: nithinraok <[email protected]>
* move to utils Signed-off-by: nithinraok <[email protected]>
* Bugfix adaptive spec augment time masking (#2398)
* bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]>
* Revert freq mask guard Signed-off-by: smajumdar <[email protected]>
* Revert freq mask guard Signed-off-by: smajumdar <[email protected]>
* Remove static time width clamping Signed-off-by: smajumdar <[email protected]>
* Correct typos and issues with notebooks (#2402)
* Fix Primer notebook Signed-off-by: smajumdar <[email protected]>
* Typo Signed-off-by: smajumdar <[email protected]>
* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]>
* [BUGFIX] Megatron in NMT was setting vocab_file to None (#2417)
* make vocab_file configurable for megatron in nmt Signed-off-by: ericharper <[email protected]>
* update docs Signed-off-by: ericharper <[email protected]>
* update docs Signed-off-by: ericharper <[email protected]>
* Link updates in docs and notebooks and typo fix (#2416)
* typo fix for notebooks Signed-off-by: fayejf <[email protected]>
* tiny typo fix in docs Signed-off-by: fayejf <[email protected]>
* docs branch->stable Signed-off-by: fayejf <[email protected]>
* more docs branch -> stable Signed-off-by: fayejf <[email protected]>
* tutorial links branch -> stable Signed-off-by: fayejf <[email protected]>
* small fix Signed-off-by: fayejf <[email protected]>
* add renamed 06 Signed-off-by: fayejf <[email protected]>
* more fixes Signed-off-by: fayejf <[email protected]>
* Update onnx (#2420) Signed-off-by: smajumdar <[email protected]>
* Correct version of onnxruntime (#2422) Signed-off-by: smajumdar <[email protected]>
* update deployment instructions (#2430) Signed-off-by: ericharper <[email protected]>
* Bumping version to 1.1.0 Signed-off-by: Oleksii Kuchaiev <[email protected]>
* update jenkinsfile Signed-off-by: ericharper <[email protected]>
* add upper bounds Signed-off-by: ericharper <[email protected]>
* update readme Signed-off-by: ericharper <[email protected]>
* update requirements Signed-off-by: ericharper <[email protected]>
* update jenkinsfile Signed-off-by: ericharper <[email protected]>
* update version Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results * Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Add FastEmit support for RNNT Losses (#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Implement inference functions of TN models Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * fix bugs in hifigan code (#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Update setup.py (#2394) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * update checkpointing (#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * byt5 
unicode implementation (#2365) * Audio Norm (#2285) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * update for SH zero -> oh Signed-off-by: ekmb <[email protected]> * change n_tagger default Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add check for numba regardless of device Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * upper bound for webdataset Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct Dockerfile Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update README (#2332) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * ddp translate GPU 
allocation fix (#2312) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * ddp translate GPU allocation fix Signed-off-by: AlexGrinch <[email protected]> * map_location instead of set_device Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Shallow fusion (#2315) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * shallow fusion init commit Signed-off-by: AlexGrinch <[email protected]> * debug info removed Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [BUGFIX] Add upper bound to hydra for 1.0.x (#2337) * upper bound hydra Signed-off-by: ericharper <[email protected]> * upper bound hydra Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update version number Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update package version Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sparrowhawk tests + punctuation post processing for pynini TN (#2320) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email 
protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * sh tests init Signed-off-by: ekmb <[email protected]> * sparrowhawk container tests support added Signed-off-by: ekmb <[email protected]> * add post process to normalize.py, update tests Signed-off-by: ekmb <[email protected]> * remove duplication Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update notebooks to 1.0.2 release (#2338) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update ranges for omegaconf and hydra (#2336) * Update ranges Signed-off-by: smajumdar <[email protected]> * Updates for Hydra and OmegaConf updates Signed-off-by: smajumdar <[email protected]> * Style fixes Signed-off-by: smajumdar <[email protected]> * Correct tests and revert patch for model utils Signed-off-by: smajumdar <[email protected]> * Correct docstring Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Guard scheduler for None Signed-off-by: smajumdar <[email protected]> * default to 0.0 if bpe_dropout is None Signed-off-by: ericharper <[email protected]> * Correctly log class that was restored Signed-off-by: smajumdar <[email protected]> * Root patch *bpe_dropout Signed-off-by: smajumdar <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update FastPitch Export (#2355) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode implementation, first cut Signed-off-by: mchrzanowski <[email protected]> * add 
bytelevel tokenizer Signed-off-by: mchrzanowski <[email protected]> * update out_dir to not collide (#2358) Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update container version to 21.05 (#2309) * Update container version Signed-off-by: smajumdar <[email protected]> * Temporarily change export format of waveglow Signed-off-by: smajumdar <[email protected]> * Add conda update for numba Signed-off-by: smajumdar <[email protected]> * Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests Signed-off-by: smajumdar <[email protected]> * Correct order of numba minimum verion, remove wrong flag from test Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Enable RNNT tests Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Text Normalization Update (#2356) * upper cased date support Signed-off-by: ekmb <[email protected]> * update whitelist, change roman weights Signed-off-by: ekmb <[email protected]> * docstrings, space fix, init file Signed-off-by: ekmb <[email protected]> * lgtm Signed-off-by: ekmb <[email protected]> * fraction with measure class Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * address comment Signed-off-by: mchrzanowski <[email protected]> * Add ASR CTC tutorial on fine-tuning on another language (#2346) * Add ASR CTC Language finetuning notebook Signed-off-by: smajumdar <[email protected]> * Add to documentation Signed-off-by: smajumdar <[email protected]> * Improve documentation Signed-off-by: smajumdar <[email protected]> * Correct name of the dataset Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct colab link to 
notebook (#2366) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sgdqa update data directories for testing (#2323) * sgdqa update data directories for testing Signed-off-by: Yang Zhang <[email protected]> * fix syntax Signed-off-by: Yang Zhang <[email protected]> * check if data dir exists Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * adding pretrained model Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Added documentation for export() (#2330) * Added export document Signed-off-by: Boris Fomitchev <[email protected]> * Addressed review comments Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update Citrinet model card info (#2369) * Update model card info Signed-off-by: smajumdar <[email protected]> * Cleanup Docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [NMT] Model Parallel Megatron Encoders (#2238) * add megatron encoder Signed-off-by: ericharper <[email protected]> * added megatron to get_nmt_tokenizer Signed-off-by: ericharper <[email protected]> * add vocab_size and hidden_size to megatron bert Signed-off-by: ericharper <[email protected]> * add megatron encoder module Signed-off-by: ericharper <[email protected]> * fixed horrible typo Signed-off-by: ericharper <[email protected]> * fix typo and add default Signed-off-by: ericharper <[email protected]> * updating nlp overrides for mp nmt Signed-off-by: ericharper <[email protected]> * move some logic back to nlpmodel from overrides Signed-off-by: ericharper <[email protected]> * add checkpoint_file property Signed-off-by: ericharper <[email protected]> * fix property Signed-off-by: ericharper <[email protected]> * num_tokentypes=0 Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper 
<[email protected]> * typo Signed-off-by: ericharper <[email protected]> * find_unused_parameters=True Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * get instead of pop Signed-off-by: ericharper <[email protected]> * remove token type ids from megatron input example Signed-off-by: ericharper <[email protected]> * pop vocab_size Signed-off-by: ericharper <[email protected]> * fix checkpointing for model parallel Signed-off-by: ericharper <[email protected]> * fix bug in non model parallel Signed-off-by: ericharper <[email protected]> * convert cfg.trainer to dict Signed-off-by: ericharper <[email protected]> * make num_tokentypes configurable for nmt Signed-off-by: ericharper <[email protected]> * update checkpoint_file when using named megatron model in nemo Signed-off-by: ericharper <[email protected]> * make vocab_file configurable Signed-off-by: ericharper <[email protected]> * dataclass can't have mutable default Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * unused imports Signed-off-by: ericharper <[email protected]> * revert input example Signed-off-by: ericharper <[email protected]> * check that checkpoint version is not None Signed-off-by: ericharper <[email protected]> * add mp jenkins test Signed-off-by: ericharper <[email protected]> * update docstring Signed-off-by: ericharper <[email protected]> * add docs for pretrained encoders with nemo nmt Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add notebook with recommendations for 8 kHz speech (#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results * Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a 
line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add FastEmit support for RNNT Losses (#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode implementation, first cut Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * add bytelevel tokenizer Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update styling Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * avoid circular import Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * fix bugs in hifigan code (#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update setup.py (#2394) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * Update 
bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * typo Signed-off-by: mchrzanowski <[email protected]> * missed one Signed-off-by: mchrzanowski <[email protected]> * bug fixes Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * bytelevelprocessor is now generic. Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * update checkpointing (#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * style Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * woops, didnt merge jenkinsfile the right way * add newline Signed-off-by: mchrzanowski <[email protected]> * undo changes to enja processor Signed-off-by: mchrzanowski <[email protected]> * processor selection decision fix Signed-off-by: mchrzanowski <[email protected]> * newline fix Signed-off-by: mchrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add 
TextNormalizationTestDataset and testing/evaluation code Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationTaggerDataset and training code for tagger Signed-off-by: Tuan Lai <[email protected]> * Restore from local nemo ckpts Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationDecoderDataset Signed-off-by: Tuan Lai <[email protected]> * Add interactive mode for neural_text_normalization_test.py Signed-off-by: Tuan Lai <[email protected]> * Add options to do training or not for tagger/decoder Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Implemented setup dataloader for decoder Signed-off-by: Tuan Lai <[email protected]> * Implemented training and validation for decoder Signed-off-by: Tuan Lai <[email protected]> * Data augmentation for decoder training Signed-off-by: Tuan Lai <[email protected]> * Config change Signed-off-by: Tuan Lai <[email protected]> * add blossom-ci.yml (#2401) Signed-off-by: ericharper <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Merge r1.1 bugfixes into main (#2407) * Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts for tokenizer building and tarred dataset building (#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email 
protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar <[email protected]> * remove accelerator=DDP in tutorial notebooks to avoid errors. 
(#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * style Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * update notebook branch to main Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Remove unused imports Signed-off-by: Tuan Lai <[email protected]> * Add initial doc for text_normalization Signed-off-by: Tuan Lai <[email protected]> * Fixed imports warnings Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Allowed duplex modes Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Add docs for duplex_text_normalization_train and duplex_text_normalization_test Signed-off-by: Tuan Lai <[email protected]> * docstrings for model codes + minor fix Signed-off-by: Tuan Lai <[email protected]> * Add more comments and doc strings Signed-off-by: Tuan Lai <[email protected]> * Add doc for datasets + Use time.perf_counter() Signed-off-by: Tuan Lai <[email protected]> * Add code for preprocessing Google TN data Signed-off-by: Tuan Lai <[email protected]> * Add more docs and comments + Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add more licenses + Fixed comments + Minors Signed-off-by: Tuan Lai <[email protected]> * Moved evaluation logic to DuplexTextNormalizationModel Signed-off-by: Tuan Lai <[email protected]> * Add logging errors Signed-off-by: Tuan Lai <[email protected]> * Updated validation code of tagger + Minors Signed-off-by: Tuan Lai <[email protected]> * Also write tag preds to log file 
Signed-off-by: Tuan Lai <[email protected]> * Add data augmentation for tagger dataset Signed-off-by: Tuan Lai <[email protected]> * Added experimental decorators Signed-off-by: Tuan Lai <[email protected]> * Updated docs Signed-off-by: Tuan Lai <[email protected]> * Updated duplex_tn_config.yaml Signed-off-by: Tuan Lai <[email protected]> * Compute token precision of tagger using NeMo metrics Signed-off-by: Tuan Lai <[email protected]> * Fixed saving issue when using ddp accelerator Signed-off-by: Tuan Lai <[email protected]> * Refactoring Signed-off-by: Tuan Lai <[email protected]> * Add option to keep punctuations in TextNormalizationTestDataset Signed-off-by: Tuan Lai <[email protected]> * Changes to input preprocessing + decoder's postprocessing Signed-off-by: Tuan Lai <[email protected]> * Fixed styles + Add references Signed-off-by: Tuan Lai <[email protected]> * Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py Signed-off-by: Tuan Lai <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Mike Chrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> 
Co-authored-by: Hoo Chang Shin <[email protected]>

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (NVIDIA#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (NVIDIA#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (NVIDIA#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (NVIDIA#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (NVIDIA#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar 
<[email protected]> * remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417) * make vocab_file configurable for megatron in nmt Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * Link updates in docs and notebooks and typo fix (NVIDIA#2416) * typo fix for notebooks Signed-off-by: fayejf <[email protected]> * tiny typo fix in docs Signed-off-by: fayejf <[email protected]> * docs branch->stable Signed-off-by: fayejf <[email protected]> * more docs branch -> stable Signed-off-by: fayejf <[email protected]> * tutorial links branch -> stable Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * add renamed 06 Signed-off-by: fayejf <[email protected]> * more fixes Signed-off-by: fayejf <[email protected]> * Update onnx (NVIDIA#2420) Signed-off-by: smajumdar <[email protected]> * Correct version of onnxruntime (NVIDIA#2422) Signed-off-by: smajumdar <[email protected]> * update deployment instructions (NVIDIA#2430) Signed-off-by: ericharper <[email protected]> * Bumping version to 1.1.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> * update jenksinfile Signed-off-by: ericharper <[email protected]> * add upper bounds Signed-off-by: ericharper <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * update requirements Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * update version Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo 
Chang Shin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (NVIDIA#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (NVIDIA#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (NVIDIA#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (NVIDIA#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (NVIDIA#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar 
<[email protected]> * remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417) * make vocab_file configurable for megatron in nmt Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * Link updates in docs and notebooks and typo fix (NVIDIA#2416) * typo fix for notebooks Signed-off-by: fayejf <[email protected]> * tiny typo fix in docs Signed-off-by: fayejf <[email protected]> * docs branch->stable Signed-off-by: fayejf <[email protected]> * more docs branch -> stable Signed-off-by: fayejf <[email protected]> * tutorial links branch -> stable Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * add renamed 06 Signed-off-by: fayejf <[email protected]> * more fixes Signed-off-by: fayejf <[email protected]> * Update onnx (NVIDIA#2420) Signed-off-by: smajumdar <[email protected]> * Correct version of onnxruntime (NVIDIA#2422) Signed-off-by: smajumdar <[email protected]> * update deployment instructions (NVIDIA#2430) Signed-off-by: ericharper <[email protected]> * Bumping version to 1.1.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> * update jenksinfile Signed-off-by: ericharper <[email protected]> * add upper bounds Signed-off-by: ericharper <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * update requirements Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * update version Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo 
Chang Shin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Paarth Neekhara <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results * Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Add FastEmit support for RNNT Losses (NVIDIA#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Implement inference functions of TN models Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * fix bugs in hifigan code (NVIDIA#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Update setup.py (NVIDIA#2394) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * update checkpointing (NVIDIA#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: Tuan Lai 
<[email protected]> * byt5 unicode implementation (NVIDIA#2365) * Audio Norm (NVIDIA#2285) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * update for SH zero -> oh Signed-off-by: ekmb <[email protected]> * change n_tagger default Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add check for numba regardless of device Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * upper bound for webdataset Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct Dockerfile Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update README (NVIDIA#2332) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski 
<[email protected]> * ddp translate GPU allocation fix (NVIDIA#2312) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * ddp translate GPU allocation fix Signed-off-by: AlexGrinch <[email protected]> * map_location instead of set_device Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Shallow fusion (NVIDIA#2315) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * shallow fusion init commit Signed-off-by: AlexGrinch <[email protected]> * debug info removed Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337) * upper bound hydra Signed-off-by: ericharper <[email protected]> * upper bound hydra Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update version number Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update package version Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320) * add jenkins test, refactoring Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix new test Signed-off-by: ekmb <[email protected]> * add serial to the default normalizer, add tests Signed-off-by: ekmb <[email protected]> * manifest test added Signed-off-by: ekmb <[email protected]> * expose more params, new test cases Signed-off-by: ekmb <[email protected]> * fix jenkins, serial clean, exclude range from cardinal Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * 
jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins dollar sign format Signed-off-by: ekmb <[email protected]> * addressed review comments Signed-off-by: ekmb <[email protected]> * fix decimal in measure Signed-off-by: ekmb <[email protected]> * move serial in cardinal Signed-off-by: ekmb <[email protected]> * sh tests init Signed-off-by: ekmb <[email protected]> * sparrowhawk container tests support added Signed-off-by: ekmb <[email protected]> * add post process to normalize.py, update tests Signed-off-by: ekmb <[email protected]> * remove duplication Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update notebooks to 1.0.2 release (NVIDIA#2338) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update ranges for omegaconf and hydra (NVIDIA#2336) * Update ranges Signed-off-by: smajumdar <[email protected]> * Updates for Hydra and OmegaConf updates Signed-off-by: smajumdar <[email protected]> * Style fixes Signed-off-by: smajumdar <[email protected]> * Correct tests and revert patch for model utils Signed-off-by: smajumdar <[email protected]> * Correct docstring Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Revert unnecessary change Signed-off-by: smajumdar <[email protected]> * Guard scheduler for None Signed-off-by: smajumdar <[email protected]> * default to 0.0 if bpe_dropout is None Signed-off-by: ericharper <[email protected]> * Correctly log class that was restored Signed-off-by: smajumdar <[email protected]> * Root patch *bpe_dropout Signed-off-by: smajumdar <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update FastPitch Export (NVIDIA#2355) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode 
implementation, first cut Signed-off-by: mchrzanowski <[email protected]> * add bytelevel tokenizer Signed-off-by: mchrzanowski <[email protected]> * update out_dir to not collide (NVIDIA#2358) Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update container version to 21.05 (NVIDIA#2309) * Update container version Signed-off-by: smajumdar <[email protected]> * Temporarily change export format of waveglow Signed-off-by: smajumdar <[email protected]> * Add conda update for numba Signed-off-by: smajumdar <[email protected]> * Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests Signed-off-by: smajumdar <[email protected]> * Correct order of numba minimum verion, remove wrong flag from test Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Double test of cuda numba Signed-off-by: smajumdar <[email protected]> * Enable RNNT tests Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Text Normalization Update (NVIDIA#2356) * upper cased date support Signed-off-by: ekmb <[email protected]> * update whitelist, change roman weights Signed-off-by: ekmb <[email protected]> * docstrings, space fix, init file Signed-off-by: ekmb <[email protected]> * lgtm Signed-off-by: ekmb <[email protected]> * fraction with measure class Signed-off-by: ekmb <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * address comment Signed-off-by: mchrzanowski <[email protected]> * Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346) * Add ASR CTC Language finetuning notebook Signed-off-by: smajumdar <[email protected]> * Add to documentation Signed-off-by: smajumdar <[email protected]> * Improve documentation Signed-off-by: smajumdar <[email protected]> * Correct name of the dataset Signed-off-by: 
smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Correct colab link to notebook (NVIDIA#2366) Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * sgdqa update data directories for testing (NVIDIA#2323) * sgdqa update data directories for testing Signed-off-by: Yang Zhang <[email protected]> * fix syntax Signed-off-by: Yang Zhang <[email protected]> * check if data dir exists Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * adding pretrained model Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Added documentation for export() (NVIDIA#2330) * Added export document Signed-off-by: Boris Fomitchev <[email protected]> * Addressed review comments Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update Citrinet model card info (NVIDIA#2369) * Update model card info Signed-off-by: smajumdar <[email protected]> * Cleanup Docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * [NMT] Model Parallel Megatron Encoders (NVIDIA#2238) * add megatron encoder Signed-off-by: ericharper <[email protected]> * added megatron to get_nmt_tokenizer Signed-off-by: ericharper <[email protected]> * add vocab_size and hidden_size to megatron bert Signed-off-by: ericharper <[email protected]> * add megatron encoder module Signed-off-by: ericharper <[email protected]> * fixed horrible typo Signed-off-by: ericharper <[email protected]> * fix typo and add default Signed-off-by: ericharper <[email protected]> * updating nlp overrides for mp nmt Signed-off-by: ericharper <[email protected]> * move some logic back to nlpmodel from overrides Signed-off-by: ericharper <[email protected]> * add checkpoint_file property Signed-off-by: ericharper <[email protected]> * fix property 
Signed-off-by: ericharper <[email protected]> * num_tokentypes=0 Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * find_unused_parameters=True Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * get instead of pop Signed-off-by: ericharper <[email protected]> * remove token type ids from megatron input example Signed-off-by: ericharper <[email protected]> * pop vocab_size Signed-off-by: ericharper <[email protected]> * fix checkpointing for model parallel Signed-off-by: ericharper <[email protected]> * fix bug in non model parallel Signed-off-by: ericharper <[email protected]> * convert cfg.trainer to dict Signed-off-by: ericharper <[email protected]> * make num_tokentypes configurable for nmt Signed-off-by: ericharper <[email protected]> * update checkpoint_file when using named megatron model in nemo Signed-off-by: ericharper <[email protected]> * make vocab_file configurable Signed-off-by: ericharper <[email protected]> * dataclass can't have mutable default Signed-off-by: ericharper <[email protected]> * style Signed-off-by: ericharper <[email protected]> * unused imports Signed-off-by: ericharper <[email protected]> * revert input example Signed-off-by: ericharper <[email protected]> * check that checkpoint version is not None Signed-off-by: ericharper <[email protected]> * add mp jenkins test Signed-off-by: ericharper <[email protected]> * update docstring Signed-off-by: ericharper <[email protected]> * add docs for pretrained encoders with nemo nmt Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add notebook with recommendations for 8 kHz speech (NVIDIA#2326) * Added a notebook with best practices for telephony speech * Added datasets detaiils * Added training recommendations * Emptied out cells with results 
* Added tutorial to docs Signed-off-by: jbalam <[email protected]> * Addressed review comments Signed-off-by: jbalam <[email protected]> * Added a line to note original sampling rate of an4 Signed-off-by: jbalam <[email protected]> * Made changes suggested in review Signed-off-by: jbalam <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Add FastEmit support for RNNT Losses (NVIDIA#2374) * Temp commit Signed-off-by: smajumdar <[email protected]> * Initial code for fastemit forward pass Signed-off-by: smajumdar <[email protected]> * Correct return reg value Signed-off-by: smajumdar <[email protected]> * Initial cpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Try gpu impl Signed-off-by: smajumdar <[email protected]> * Correct few impl Signed-off-by: smajumdar <[email protected]> * Update fastemit scaling Signed-off-by: smajumdar <[email protected]> * Cleanup fastemit Signed-off-by: smajumdar <[email protected]> * Finalize FastEmit regularization PR Signed-off-by: smajumdar <[email protected]> * Refactor code to support fastemit regularization Signed-off-by: smajumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * byt5 unicode implementation, first cut Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * add bytelevel tokenizer Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * update styling Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * avoid circular import Signed-off-by: Mike Chrzanowski <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * fix bugs in hifigan code (NVIDIA#2392) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * Update setup.py (NVIDIA#2394) Signed-off-by: Jason <[email 
protected]> Signed-off-by: mchrzanowski <[email protected]> * Update bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * Update bytelevel_tokenizer.py Signed-off-by: mchrzanowski <[email protected]> * typo Signed-off-by: mchrzanowski <[email protected]> * missed one Signed-off-by: mchrzanowski <[email protected]> * bug fixes Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * bytelevelprocessor is now generic. Signed-off-by: mchrzanowski <[email protected]> * style fix Signed-off-by: mchrzanowski <[email protected]> * update checkpointing (NVIDIA#2396) Signed-off-by: Jason <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * style Signed-off-by: ericharper <[email protected]> Signed-off-by: mchrzanowski <[email protected]> * woops, didnt merge jenkinsfile the right way * add newline Signed-off-by: mchrzanowski <[email protected]> * undo changes to enja processor Signed-off-by: mchrzanowski <[email protected]> * processor selection decision fix Signed-off-by: mchrzanowski <[email protected]> * newline fix Signed-off-by: mchrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Signed-off-by: Tuan Lai 
<[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationTestDataset and testing/evaluation code Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationTaggerDataset and training code for tagger Signed-off-by: Tuan Lai <[email protected]> * Restore from local nemo ckpts Signed-off-by: Tuan Lai <[email protected]> * Add TextNormalizationDecoderDataset Signed-off-by: Tuan Lai <[email protected]> * Add interactive mode for neural_text_normalization_test.py Signed-off-by: Tuan Lai <[email protected]> * Add options to do training or not for tagger/decoder Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Implemented setup dataloader for decoder Signed-off-by: Tuan Lai <[email protected]> * Implemented training and validation for decoder Signed-off-by: Tuan Lai <[email protected]> * Data augmentation for decoder training Signed-off-by: Tuan Lai <[email protected]> * Config change Signed-off-by: Tuan Lai <[email protected]> * add blossom-ci.yml (NVIDIA#2401) Signed-off-by: ericharper <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Merge r1.1 bugfixes into main (NVIDIA#2407) * Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378) * update branch Signed-off-by: ericharper <[email protected]> * update jenkinsfile Signed-off-by: ericharper <[email protected]> * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380) * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * fix property when not using model parallel Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * add debug statement Signed-off-by: ericharper <[email protected]> * instantiate with NLPDDPPlugin with num_nodes from trainer config Signed-off-by: ericharper <[email protected]> * Update ASR scripts 
for tokenizer building and tarred dataset building (NVIDIA#2381) * Update ASR scripts for tokenizer building and tarred dataset building Signed-off-by: smajumdar <[email protected]> * Update container Signed-off-by: smajumdar <[email protected]> * Add STT Zh Citrinet 1024 Gamma 0.25 model Signed-off-by: smajumdar <[email protected]> * Update notebook (NVIDIA#2391) Signed-off-by: smajumdar <[email protected]> * ASR Notebooks fix for 1.1.0 (NVIDIA#2395) * nb fix for spring clean Signed-off-by: fayejf <[email protected]> * remove outdated instruction Signed-off-by: fayejf <[email protected]> * Mean normalization (NVIDIA#2397) * norm embeddings Signed-off-by: nithinraok <[email protected]> * move to utils Signed-off-by: nithinraok <[email protected]> * Bugfix adaptive spec augment time masking (NVIDIA#2398) * bugfix adaptive spec augment Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Revert freq mask guard Signed-off-by: smajumdar <[email protected]> * Remove static time width clamping Signed-off-by: smajumdar <[email protected]> * Correct typos and issues with notebooks (NVIDIA#2402) * Fix Primer notebook Signed-off-by: smajumdar <[email protected]> * Typo Signed-off-by: smajumdar <[email protected]> * remove accelerator=DDP in tutorial notebooks to avoid errors. 
(NVIDIA#2403) Signed-off-by: Hoo Chang Shin <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> * style Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * update notebook branch to main Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Hoo Chang Shin <[email protected]> Signed-off-by: Tuan Lai <[email protected]> * Remove unused imports Signed-off-by: Tuan Lai <[email protected]> * Add initial doc for text_normalization Signed-off-by: Tuan Lai <[email protected]> * Fixed imports warnings Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Renamed Signed-off-by: Tuan Lai <[email protected]> * Allowed duplex modes Signed-off-by: Tuan Lai <[email protected]> * Minor Fix Signed-off-by: Tuan Lai <[email protected]> * Add docs for duplex_text_normalization_train and duplex_text_normalization_test Signed-off-by: Tuan Lai <[email protected]> * docstrings for model codes + minor fix Signed-off-by: Tuan Lai <[email protected]> * Add more comments and doc strings Signed-off-by: Tuan Lai <[email protected]> * Add doc for datasets + Use time.perf_counter() Signed-off-by: Tuan Lai <[email protected]> * Add code for preprocessing Google TN data Signed-off-by: Tuan Lai <[email protected]> * Add more docs and comments + Minor Fixes Signed-off-by: Tuan Lai <[email protected]> * Add more licenses + Fixed comments + Minors Signed-off-by: Tuan Lai <[email protected]> * Moved evaluation logic to DuplexTextNormalizationModel Signed-off-by: Tuan Lai <[email protected]> * Add logging errors Signed-off-by: Tuan Lai <[email protected]> * Updated validation code of tagger + Minors Signed-off-by: Tuan Lai <[email protected]> * Also write tag preds to log file 
Signed-off-by: Tuan Lai <[email protected]> * Add data augmentation for tagger dataset Signed-off-by: Tuan Lai <[email protected]> * Added experimental decorators Signed-off-by: Tuan Lai <[email protected]> * Updated docs Signed-off-by: Tuan Lai <[email protected]> * Updated duplex_tn_config.yaml Signed-off-by: Tuan Lai <[email protected]> * Compute token precision of tagger using NeMo metrics Signed-off-by: Tuan Lai <[email protected]> * Fixed saving issue when using ddp accelerator Signed-off-by: Tuan Lai <[email protected]> * Refactoring Signed-off-by: Tuan Lai <[email protected]> * Add option to keep punctuations in TextNormalizationTestDataset Signed-off-by: Tuan Lai <[email protected]> * Changes to input preprocessing + decoder's postprocessing Signed-off-by: Tuan Lai <[email protected]> * Fixed styles + Add references Signed-off-by: Tuan Lai <[email protected]> * Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py Signed-off-by: Tuan Lai <[email protected]> Co-authored-by: Jagadeesh Balam <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Mike Chrzanowski <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: mchrzanowski <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: khcs <[email protected]> 
Co-authored-by: Hoo Chang Shin <[email protected]> Signed-off-by: Paarth Neekhara <[email protected]>

titu1994 added 3 commits June 21, 2021 17:56

Update ASR scripts for tokenizer building and tarred dataset building

bcb868a

Signed-off-by: smajumdar <[email protected]>

Update container

7c239a2

Signed-off-by: smajumdar <[email protected]>

Add STT Zh Citrinet 1024 Gamma 0.25 model

a404eca

Signed-off-by: smajumdar <[email protected]>

titu1994 requested a review from jbalam-nv June 22, 2021 01:11

jbalam-nv approved these changes Jun 22, 2021

ericharper approved these changes Jun 22, 2021

ericharper merged commit 01997d3 into NVIDIA:r1.1.0 Jun 22, 2021

titu1994 deleted the update_scripts branch June 22, 2021 22:04
