Update ASR scripts for tokenizer building and tarred dataset building #2381

Merged (3 commits) on Jun 22, 2021

@titu1994 (Collaborator) commented Jun 22, 2021

Changelog

  • Update docker container to nemo:1.0.1
  • Add Citrinet 1024 Gamma 0.25 model card for Mandarin to CTC char models
  • Update tokenizer scripts to support adding bos, eos and pad tokens to SentencePiece tokenizers via --spe_bos, --spe_eos and --spe_pad flags.
  • Update dataset building script to always provide a max length when building tarred datasets.
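
A rough illustration of the last two changelog items, not the code from this PR: the sketch below assumes the three flags are boolean switches, that enabled special tokens receive consecutive IDs after an unknown token at ID 0, and that manifest entries carry a `duration` field in seconds. The helper names `build_parser`, `special_token_ids` and `filter_by_max_duration` are hypothetical; only the flag names come from the changelog. `pad_id`, `bos_id`, `eos_id` and `unk_id` are SentencePiece trainer options, where -1 disables a token.

```python
import argparse


def build_parser():
    """Hypothetical CLI exposing the three flags named in the changelog."""
    parser = argparse.ArgumentParser(description="Tokenizer-building sketch")
    # Assumption: each flag is a simple on/off switch.
    parser.add_argument("--spe_bos", action="store_true", help="add a beginning-of-sentence token")
    parser.add_argument("--spe_eos", action="store_true", help="add an end-of-sentence token")
    parser.add_argument("--spe_pad", action="store_true", help="add a padding token")
    return parser


def special_token_ids(pad, bos, eos):
    """Map the switches to SentencePiece trainer options.

    Assumption: <unk> stays at ID 0 and enabled tokens take the next free IDs.
    A value of -1 tells SentencePiece not to create that token.
    """
    ids = {"unk_id": 0, "pad_id": -1, "bos_id": -1, "eos_id": -1}
    next_id = 1
    for name, enabled in (("pad_id", pad), ("bos_id", bos), ("eos_id", eos)):
        if enabled:
            ids[name] = next_id
            next_id += 1
    return ids


def filter_by_max_duration(entries, max_duration):
    """Keep manifest entries no longer than max_duration seconds.

    Returns the kept entries and the number of dropped entries.
    """
    kept = [entry for entry in entries if entry["duration"] <= max_duration]
    return kept, len(entries) - len(kept)


if __name__ == "__main__":
    args = build_parser().parse_args(["--spe_bos", "--spe_eos"])
    print(special_token_ids(args.spe_pad, args.spe_bos, args.spe_eos))

    manifest = [
        {"audio_filepath": "a.wav", "duration": 3.0},
        {"audio_filepath": "b.wav", "duration": 20.5},
    ]
    kept, dropped = filter_by_max_duration(manifest, max_duration=16.7)
    print(len(kept), dropped)
```

Requiring an explicit maximum length means over-long utterances are dropped before the shards are written, so every sample in the tarred dataset is bounded; the 16.7-second threshold above is only an example value.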

Signed-off-by: smajumdar [email protected]

@titu1994 requested a review from jbalam-nv on June 22, 2021 01:11

@jbalam-nv (Collaborator) left a comment

LGTM

@ericharper (Collaborator) left a comment

LGTM. Thanks!

@ericharper ericharper merged commit 01997d3 into NVIDIA:r1.1.0 Jun 22, 2021
@titu1994 titu1994 deleted the update_scripts branch June 22, 2021 22:04
ericharper added a commit that referenced this pull request Jun 25, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
titu1994 added a commit that referenced this pull request Jul 2, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* [BUGFIX] Megatron in NMT was setting vocab_file to None (#2417)

* make vocab_file configurable for megatron in nmt

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* Link updates in docs and notebooks and typo fix (#2416)

* typo fix for notebooks

Signed-off-by: fayejf <[email protected]>

* tiny typo fix in docs

Signed-off-by: fayejf <[email protected]>

* docs branch->stable

Signed-off-by: fayejf <[email protected]>

* more docs branch -> stable

Signed-off-by: fayejf <[email protected]>

* tutorial links branch -> stable

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* add renamed 06

Signed-off-by: fayejf <[email protected]>

* more fixes

Signed-off-by: fayejf <[email protected]>

* Update onnx (#2420)

Signed-off-by: smajumdar <[email protected]>

* Correct version of onnxruntime (#2422)

Signed-off-by: smajumdar <[email protected]>

* update deployment instructions (#2430)

Signed-off-by: ericharper <[email protected]>

* Bumping version to 1.1.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* add upper bounds

Signed-off-by: ericharper <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* update requirements

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update version

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
yzhang123 added a commit that referenced this pull request Jul 8, 2021
* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (#2365)

* Audio Norm (#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* woops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors
Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators
Signed-off-by: Tuan Lai <[email protected]>

* Updated docs
Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <[email protected]>

* Refactoring
Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
mousebaiker pushed a commit to mousebaiker/NeMo that referenced this pull request Jul 8, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
mousebaiker pushed a commit to mousebaiker/NeMo that referenced this pull request Jul 8, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417)

* make vocab_file configurable for megatron in nmt

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* Link updates in docs and notebooks and typo fix (NVIDIA#2416)

* typo fix for notebooks

Signed-off-by: fayejf <[email protected]>

* tiny typo fix in docs

Signed-off-by: fayejf <[email protected]>

* docs branch->stable

Signed-off-by: fayejf <[email protected]>

* more docs branch -> stable

Signed-off-by: fayejf <[email protected]>

* tutorial links branch -> stable

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* add renamed 06

Signed-off-by: fayejf <[email protected]>

* more fixes

Signed-off-by: fayejf <[email protected]>

* Update onnx (NVIDIA#2420)

Signed-off-by: smajumdar <[email protected]>

* Correct version of onnxruntime (NVIDIA#2422)

Signed-off-by: smajumdar <[email protected]>

* update deployment instructions (NVIDIA#2430)

Signed-off-by: ericharper <[email protected]>

* Bumping version to 1.1.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* add upper bounds

Signed-off-by: ericharper <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* update requirements

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update version

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
pasandi20 pushed a commit to pasandi20/NeMo that referenced this pull request Jul 13, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417)

* make vocab_file configurable for megatron in nmt

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* Link updates in docs and notebooks and typo fix (NVIDIA#2416)

* typo fix for notebooks

Signed-off-by: fayejf <[email protected]>

* tiny typo fix in docs

Signed-off-by: fayejf <[email protected]>

* docs branch->stable

Signed-off-by: fayejf <[email protected]>

* more docs branch -> stable

Signed-off-by: fayejf <[email protected]>

* tutorial links branch -> stable

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* add renamed 06

Signed-off-by: fayejf <[email protected]>

* more fixes

Signed-off-by: fayejf <[email protected]>

* Update onnx (NVIDIA#2420)

Signed-off-by: smajumdar <[email protected]>

* Correct version of onnxruntime (NVIDIA#2422)

Signed-off-by: smajumdar <[email protected]>

* update deployment instructions (NVIDIA#2430)

Signed-off-by: ericharper <[email protected]>

* Bumping version to 1.1.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* add upper bounds

Signed-off-by: ericharper <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* update requirements

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update version

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Ghasem Pasandi <[email protected]>
pasandi20 pushed a commit to pasandi20/NeMo that referenced this pull request Jul 13, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors
Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators
Signed-off-by: Tuan Lai <[email protected]>

* Updated docs
Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <[email protected]>

* Refactoring
Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Ghasem Pasandi <[email protected]>

fayejf added a commit that referenced this pull request Jul 16, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* [BUGFIX] Megatron in NMT was setting vocab_file to None (#2417)

* make vocab_file configurable for megatron in nmt

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* Link updates in docs and notebooks and typo fix (#2416)

* typo fix for notebooks

Signed-off-by: fayejf <[email protected]>

* tiny typo fix in docs

Signed-off-by: fayejf <[email protected]>

* docs branch->stable

Signed-off-by: fayejf <[email protected]>

* more docs branch -> stable

Signed-off-by: fayejf <[email protected]>

* tutorial links branch -> stable

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* add renamed 06

Signed-off-by: fayejf <[email protected]>

* more fixes

Signed-off-by: fayejf <[email protected]>

* Update onnx (#2420)

Signed-off-by: smajumdar <[email protected]>

* Correct version of onnxruntime (#2422)

Signed-off-by: smajumdar <[email protected]>

* update deployment instructions (#2430)

Signed-off-by: ericharper <[email protected]>

* Bumping version to 1.1.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* add upper bounds

Signed-off-by: ericharper <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* update requirements

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update version

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

fayejf added a commit that referenced this pull request Jul 16, 2021
* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (#2365)

* Audio Norm (#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors
Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators
Signed-off-by: Tuan Lai <[email protected]>

* Updated docs
Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <[email protected]>

* Refactoring
Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
titu1994 added a commit to titu1994/NeMo that referenced this pull request Jul 20, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417)

* make vocab_file configurable for megatron in nmt

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* Link updates in docs and notebooks and typo fix (NVIDIA#2416)

* typo fix for notebooks

Signed-off-by: fayejf <[email protected]>

* tiny typo fix in docs

Signed-off-by: fayejf <[email protected]>

* docs branch->stable

Signed-off-by: fayejf <[email protected]>

* more docs branch -> stable

Signed-off-by: fayejf <[email protected]>

* tutorial links branch -> stable

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* add renamed 06

Signed-off-by: fayejf <[email protected]>

* more fixes

Signed-off-by: fayejf <[email protected]>

* Update onnx (NVIDIA#2420)

Signed-off-by: smajumdar <[email protected]>

* Correct version of onnxruntime (NVIDIA#2422)

Signed-off-by: smajumdar <[email protected]>

* update deployment instructions (NVIDIA#2430)

Signed-off-by: ericharper <[email protected]>

* Bumping version to 1.1.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* add upper bounds

Signed-off-by: ericharper <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* update requirements

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update version

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
titu1994 added a commit to titu1994/NeMo that referenced this pull request Jul 20, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors
Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators
Signed-off-by: Tuan Lai <[email protected]>

* Updated docs
Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <[email protected]>

* Refactoring
Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417)

* make vocab_file configurable for megatron in nmt

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* Link updates in docs and notebooks and typo fix (NVIDIA#2416)

* typo fix for notebooks

Signed-off-by: fayejf <[email protected]>

* tiny typo fix in docs

Signed-off-by: fayejf <[email protected]>

* docs branch->stable

Signed-off-by: fayejf <[email protected]>

* more docs branch -> stable

Signed-off-by: fayejf <[email protected]>

* tutorial links branch -> stable

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* add renamed 06

Signed-off-by: fayejf <[email protected]>

* more fixes

Signed-off-by: fayejf <[email protected]>

* Update onnx (NVIDIA#2420)

Signed-off-by: smajumdar <[email protected]>

* Correct version of onnxruntime (NVIDIA#2422)

Signed-off-by: smajumdar <[email protected]>

* update deployment instructions (NVIDIA#2430)

Signed-off-by: ericharper <[email protected]>

* Bumping version to 1.1.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* add upper bounds

Signed-off-by: ericharper <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* update requirements

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update version

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()

Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data

Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors

Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel

Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors

Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors

Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file

Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset

Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators

Signed-off-by: Tuan Lai <[email protected]>

* Updated docs

Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml

Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics

Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator

Signed-off-by: Tuan Lai <[email protected]>

* Refactoring

Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset

Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing

Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references

Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py

Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>