Text Normalization Update #2356

Merged
merged 8 commits into from
Jun 15, 2021

Collaborator

@ekmb ekmb commented Jun 14, 2021

  • added support for fractional numbers
  • added support for roman numbers up to 1000 (audio-based normalization only)
  • parallel normalization of manifests
  • bug fixes and pre/post-processing updates to improve normalization coverage
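
As a rough illustration of the "parallel normalization of manifests" item above, here is a minimal sketch. It assumes a JSON-lines manifest with a `text` field and uses a hypothetical `normalize_text` stand-in; it is not the code from this PR, and the function names and the `normalized_text` key are made up for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def normalize_text(text: str) -> str:
    """Hypothetical stand-in for the real normalizer: only spells out '&'."""
    return " ".join(text.replace("&", " and ").split())


def normalize_line(line: str) -> str:
    """Normalize the 'text' field of one JSON manifest line."""
    entry = json.loads(line)
    # 'normalized_text' is an assumed output key, chosen for this sketch.
    entry["normalized_text"] = normalize_text(entry["text"])
    return json.dumps(entry)


def normalize_manifest_lines(lines, n_jobs: int = 4):
    """Normalize manifest lines concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map yields results in the same order as the input lines.
        return list(pool.map(normalize_line, lines))


if __name__ == "__main__":
    demo = [json.dumps({"text": "R&D"}), json.dumps({"text": "A & B"})]
    for out in normalize_manifest_lines(demo):
        print(out)
```

A thread pool keeps the sketch simple; for CPU-bound normalization a process pool would likely be the better fit.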

@ekmb ekmb requested a review from yzhang123 June 14, 2021 17:49
@ekmb ekmb linked an issue Jun 14, 2021 that may be closed by this pull request

lgtm-com bot commented Jun 14, 2021

This pull request introduces 2 alerts when merging 31c220a into fbfdc1b - view on LGTM.com

new alerts:

  • 2 for Unused import

@@ -1,6 +1,10 @@
Ph.D. p h d
Hon. honorable
& and
&Co. and
Contributor

what happens to Co? could you delete this entry?

Collaborator Author

updated and moved this to the alternative list
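
For context on how whitelist entries like the ones in this hunk behave, here is a minimal sketch of a whitelist lookup. It assumes the file is tab-separated `written<TAB>spoken` pairs and that matching is per whitespace-separated token; the real grammar is FST-based, and `load_whitelist` / `apply_whitelist` are hypothetical names for this example only.

```python
# Assumed tab-separated layout; entries taken from the hunk above.
WHITELIST_TSV = "Ph.D.\tp h d\nHon.\thonorable\n&\tand\n"


def load_whitelist(tsv: str) -> dict:
    """Parse 'written<TAB>spoken' lines into a lookup table."""
    table = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue  # skip blank lines
        written, spoken = line.split("\t", 1)
        table[written] = spoken
    return table


def apply_whitelist(text: str, table: dict) -> str:
    """Replace tokens that exactly match a whitelist key; leave others as-is."""
    return " ".join(table.get(tok, tok) for tok in text.split())


if __name__ == "__main__":
    table = load_whitelist(WHITELIST_TSV)
    print(apply_whitelist("Hon. Smith & Sons", table))
```

With exact token matching, an entry such as `&Co.` would only fire when the ampersand is glued to `Co.`, which is one reason to keep it out of the main list.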

class RomanFst(GraphFst):
    """
    Finite state transducer for verbalizing electronic
    e.g. tokens { electronic { username: "cdf1" domain: "abc.edu" } } -> c d f one at a b c dot e d u
Contributor

adjust doc

Collaborator Author

done
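
To make the "roman numbers up to 1000" scope from the PR description concrete, here is a plain-Python sketch of the mapping. It does not use pynini and is not the `RomanFst` grammar from this PR; `int_to_roman`, `roman_to_int`, and the lookup-table approach are assumptions made for illustration.

```python
# Value/symbol pairs in descending order, including subtractive forms.
_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def int_to_roman(number: int) -> str:
    """Convert an integer in [1, 1000] to its canonical roman numeral."""
    if not 1 <= number <= 1000:
        raise ValueError("only 1..1000 is supported in this sketch")
    parts = []
    for value, symbol in _PAIRS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


# Enumerate every supported numeral once, so lookups also reject
# malformed input such as "IIII" or "IC".
_ROMAN_TO_INT = {int_to_roman(n): n for n in range(1, 1001)}


def roman_to_int(numeral: str) -> int:
    """Convert a canonical roman numeral (case-insensitive) to an integer."""
    try:
        return _ROMAN_TO_INT[numeral.upper()]
    except KeyError:
        raise ValueError(f"not a supported roman numeral: {numeral!r}") from None


if __name__ == "__main__":
    print(roman_to_int("XIV"), int_to_roman(999))
```

Enumerating the full table keeps validation trivial at this range; a larger range would call for a parser instead.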

@ekmb ekmb requested a review from yzhang123 June 14, 2021 19:57

lgtm-com bot commented Jun 14, 2021

This pull request introduces 3 alerts when merging bd37b1e into fbfdc1b - view on LGTM.com

new alerts:

  • 3 for Unused import

Contributor

@yzhang123 yzhang123 left a comment

could you please fix lgtm?

@ekmb ekmb merged commit 08f3c65 into main Jun 15, 2021
Collaborator Author

ekmb commented Jun 15, 2021

> could you please fix lgtm?

fixed already

@ekmb ekmb deleted the tn_update branch June 15, 2021 01:26
mchrzanowski pushed a commit that referenced this pull request Jun 23, 2021
* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Mike Chrzanowski <[email protected]>
michalivne pushed a commit to michalivne/NeMo that referenced this pull request Jun 23, 2021
* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
mchrzanowski pushed a commit that referenced this pull request Jun 23, 2021
* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>
ericharper added a commit that referenced this pull request Jun 24, 2021
* Audio Norm (#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* woops, didnt merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
yzhang123 added a commit that referenced this pull request Jul 8, 2021
* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (#2365)

* Audio Norm (#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge Jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()

Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data

Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors

Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel

Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors

Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors

Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file

Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset

Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators

Signed-off-by: Tuan Lai <[email protected]>

* Updated docs

Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml

Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics

Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator

Signed-off-by: Tuan Lai <[email protected]>

* Refactoring

Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset

Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing

Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references

Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py

Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
mousebaiker pushed a commit to mousebaiker/NeMo that referenced this pull request Jul 8, 2021
* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
mousebaiker pushed a commit to mousebaiker/NeMo that referenced this pull request Jul 8, 2021
* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge Jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
pasandi20 pushed a commit to pasandi20/NeMo that referenced this pull request Jul 13, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors
Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators
Signed-off-by: Tuan Lai <[email protected]>

* Updated docs
Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <[email protected]>

* Refactoring
Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Ghasem Pasandi <[email protected]>
fayejf added a commit that referenced this pull request Jul 16, 2021
* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (#2365)

* Audio Norm (#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors
Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators
Signed-off-by: Tuan Lai <[email protected]>

* Updated docs
Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <[email protected]>

* Refactoring
Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuation in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
titu1994 added a commit to titu1994/NeMo that referenced this pull request Jul 20, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors
Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators
Signed-off-by: Tuan Lai <[email protected]>

* Updated docs
Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <[email protected]>

* Refactoring
Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
MaximumEntropy added a commit that referenced this pull request Aug 11, 2021
* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

Signed-off-by: Micha Livne <[email protected]>

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

* 1. Working on training script.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on training script.

* 1. Updated config class name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated config class name.

* 1. Training script is ready to be tested.

Signed-off-by: Micha Livne <[email protected]>

* 1. Training script is ready to be tested.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Fixed bugs.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed bugs.

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

* 1. Fixed support in seq2seq-br.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed support in seq2seq-br.

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update setup.py (#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to support multi-node training.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments.

Signed-off-by: Micha Livne <[email protected]>

* 1. MTBottleneckModel is in its own file mt_enc_dec_bottleneck_model.

Signed-off-by: Micha Livne <[email protected]>

* 1. Switched loss annealing to rely on self.trainer.global_step

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments regarding the use of return_ortho_loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added detailed logging of loss during training (still need to do the same for eval).

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing a fix to import bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging wrong import issue.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added logging of results to validation step (not tested yet).

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing failing imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Disabling changes.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Enabled bottleneck architecture.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed indentation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed import statement.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed typo.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed logging of arbitrary values.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed torch lightning logging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added a missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated sign of computed loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed double import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Moved logging of additional loss terms into MTBottleneckModel class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated permissions.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added initial perceiver package.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Finished implementing Perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default arch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Ignoring independent perceiver implementation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added latent transformer to perceiver

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckDecoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckEncoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated bottleneck perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated MTBottleneckModel.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned code.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated architecture name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in bridge encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in hidden_init_method to BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Removed unneeded imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comment in YAML

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated YAML comments.
2. hidden_blocks in bridge relates to post-processing after bridge (instead of hidden_blocks-1).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Initial cross attention in Perceiver with params init has independent parameters.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated Perceiver forward.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated TransformerEncoder to be a component as opposed to a parent class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated example command.

Signed-off-by: Micha Livne <[email protected]>

* 1. forward method in MTBottleneckModel does not compute loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added label smoothing for per-sample loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated recon_only loss to nll.

Signed-off-by: Micha Livne <[email protected]>

* 1. Update yaml doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default config to have 32 hidden steps.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed type.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed unreachable code bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed wrong sign for reconstruction per sample (instead of per token).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comments.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
blisc added a commit to blisc/NeMo that referenced this pull request Aug 12, 2021
* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

Signed-off-by: Micha Livne <[email protected]>

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

* 1. Working on training script.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on training script.

* 1. Updated config class name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated config class name.

* 1. Training script is ready to be tested.

Signed-off-by: Micha Livne <[email protected]>

* 1. Training script is ready to be tested.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Fixed bugs.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed bugs.

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

* 1. Fixed support in seq2seq-br.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed support in seq2seq-br.

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to support multi-node training.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments.

Signed-off-by: Micha Livne <[email protected]>

* 1. MTBottleneckModel is in its own file mt_enc_dec_bottleneck_model.

Signed-off-by: Micha Livne <[email protected]>

* 1. Switched loss annealing to rely on self.trainer.global_step

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments regarding the use of return_ortho_loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added detailed logging of loss during training (still need to do the same for eval).

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing a fix to import bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging wrong import issue.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added logging of results to validation step (not tested yet).

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing failing imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Disabling changes.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Enabled bottleneck architecture.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed indentation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed import statement.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed typo.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed logging of arbitrary values.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed torch lightning logging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added a missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated sign of computed loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed double import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Moved logging of additional loss terms into MTBottleneckModel class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated permissions.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added initial perceiver package.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Finished implementing Perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default arch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Ignoring independent perceiver implementation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added latent transformer to perceiver

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckDecoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckEncoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated bottleneck perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated MTBottleneckModel.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned code.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated architecture name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in bridge encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in hidden_init_method to BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Removed unneeded imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comment in YAML

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated YAML comments.
2. hidden_blocks in bridge relates to post-processing after bridge (instead of hidden_blocks-1).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Initial cross attention in Perceiver with params init has independent parameters.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated Perceiver forward.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated TransformerEncoder to be a component as opposed to a parent class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated example command.

Signed-off-by: Micha Livne <[email protected]>

* 1. forward method in MTBottleneckModel does not compute loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added label smoothing for per-sample loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated recon_only loss to nll.

Signed-off-by: Micha Livne <[email protected]>

* 1. Update yaml doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default config to have 32 hidden steps.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed type.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed unreachable code bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed wrong sign for reconstruction per sample (instead of per token).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comments.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Signed-off-by: Jason <[email protected]>
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* update for SH zero -> oh

Signed-off-by: ekmb <[email protected]>

* change n_tagger default

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add check for numba regardless of device

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* address comment

Signed-off-by: mchrzanowski <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* update styling

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* avoid circular import

Signed-off-by: Mike Chrzanowski <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <[email protected]>

* typo

Signed-off-by: mchrzanowski <[email protected]>

* missed one

Signed-off-by: mchrzanowski <[email protected]>

* bug fixes

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <[email protected]>

* style fix

Signed-off-by: mchrzanowski <[email protected]>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>
Signed-off-by: mchrzanowski <[email protected]>

* whoops, didn't merge Jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <[email protected]>

* undo changes to enja processor

Signed-off-by: mchrzanowski <[email protected]>

* processor selection decision fix

Signed-off-by: mchrzanowski <[email protected]>

* newline fix

Signed-off-by: mchrzanowski <[email protected]>

Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <[email protected]>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <[email protected]>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <[email protected]>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <[email protected]>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <[email protected]>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <[email protected]>

* Config change

Signed-off-by: Tuan Lai <[email protected]>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <[email protected]>

* update jenkinsfile

Signed-off-by: ericharper <[email protected]>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* fix property when not using model parallel

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* add debug statement

Signed-off-by: ericharper <[email protected]>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <[email protected]>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <[email protected]>

* Update container

Signed-off-by: smajumdar <[email protected]>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <[email protected]>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <[email protected]>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <[email protected]>

* remove outdated instruction

Signed-off-by: fayejf <[email protected]>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <[email protected]>

* move to utils

Signed-off-by: nithinraok <[email protected]>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Revert freq mask guard

Signed-off-by: smajumdar <[email protected]>

* Remove static time width clamping

Signed-off-by: smajumdar <[email protected]>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <[email protected]>

* Typo

Signed-off-by: smajumdar <[email protected]>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <[email protected]>

Co-authored-by: Hoo Chang Shin <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* update notebook branch to main

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Tuan Lai <[email protected]>

* Remove unused imports

Signed-off-by: Tuan Lai <[email protected]>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <[email protected]>

* Fixed imports warnings

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Renamed

Signed-off-by: Tuan Lai <[email protected]>

* Allowed duplex modes

Signed-off-by: Tuan Lai <[email protected]>

* Minor Fix

Signed-off-by: Tuan Lai <[email protected]>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <[email protected]>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <[email protected]>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <[email protected]>

* Add doc for datasets + Use time.perf_counter()

Signed-off-by: Tuan Lai <[email protected]>

* Add code for preprocessing Google TN data

Signed-off-by: Tuan Lai <[email protected]>

* Add more docs and comments + Minor Fixes

Signed-off-by: Tuan Lai <[email protected]>

* Add more licenses + Fixed comments + Minors

Signed-off-by: Tuan Lai <[email protected]>

* Moved evaluation logic to DuplexTextNormalizationModel

Signed-off-by: Tuan Lai <[email protected]>

* Add logging errors

Signed-off-by: Tuan Lai <[email protected]>

* Updated validation code of tagger + Minors

Signed-off-by: Tuan Lai <[email protected]>

* Also write tag preds to log file

Signed-off-by: Tuan Lai <[email protected]>

* Add data augmentation for tagger dataset

Signed-off-by: Tuan Lai <[email protected]>

* Added experimental decorators

Signed-off-by: Tuan Lai <[email protected]>

* Updated docs

Signed-off-by: Tuan Lai <[email protected]>

* Updated duplex_tn_config.yaml

Signed-off-by: Tuan Lai <[email protected]>

* Compute token precision of tagger using NeMo metrics

Signed-off-by: Tuan Lai <[email protected]>

* Fixed saving issue when using ddp accelerator

Signed-off-by: Tuan Lai <[email protected]>

* Refactoring

Signed-off-by: Tuan Lai <[email protected]>

* Add option to keep punctuations in TextNormalizationTestDataset

Signed-off-by: Tuan Lai <[email protected]>

* Changes to input preprocessing + decoder's postprocessing

Signed-off-by: Tuan Lai <[email protected]>

* Fixed styles + Add references

Signed-off-by: Tuan Lai <[email protected]>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py

Signed-off-by: Tuan Lai <[email protected]>

Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: mchrzanowski <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Hoo Chang Shin <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

Signed-off-by: Micha Livne <[email protected]>

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

* 1. Working on training script.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on training script.

* 1. Updated config class name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated config class name.

* 1. Training script is ready to be tested.

Signed-off-by: Micha Livne <[email protected]>

* 1. Training script is ready to be tested.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Fixed bugs.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed bugs.

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

* 1. Fixed support in seq2seq-br.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed support in seq2seq-br.

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to support multi-node training.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments.

Signed-off-by: Micha Livne <[email protected]>

* 1. MTBottleneckModel is in its own file mt_enc_dec_bottleneck_model.

Signed-off-by: Micha Livne <[email protected]>

* 1. Switched loss annealing to rely on self.trainer.global_step

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments regarding the use of return_ortho_loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added detailed logging of loss during training (still need to do the same for eval).

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing a fix to import bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging wrong import issue.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added logging of results to validation step (not tested yet).

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing failing imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Disabling changes.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Enabled bottleneck architecture.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed indentation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed import statement.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed typo.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed logging of arbitrary values.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed torch lightning logging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added a missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated sign of computed loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed double import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Moved logging of additional loss terms into MTBottleneckModel class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated permissions.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added initial perceiver package.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Finished implementing Perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default arch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Ignoring independent perceiver implementation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added latent transformer to perceiver

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckDecoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckEncoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated bottleneck perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated MTBottleneckModel.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned code.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated architecture name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in bridge encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in hidden_init_method to BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Removed unneeded imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comment in YAML

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated YAML comments.
2. hidden_blocks in bridge relates to post-processing after bridge (instead of hidden_blocks-1).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Initial cross attention in Perceiver with params init has independent parameters.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated Perceiver forward.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated TransformerEncoder to be a component as opposed to a parent class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated example command.

Signed-off-by: Micha Livne <[email protected]>

* 1. forward method in MTBottleneckModel does not compute loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added label smoothing for per-sample loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated recon_only loss to nll.

Signed-off-by: Micha Livne <[email protected]>

* 1. Update yaml doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default config to have 32 hidden steps.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed type.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed unreachable code bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed wrong sign for reconstruction per sample (instead of per token).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comments.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
jfsantos pushed a commit to jfsantos/NeMo that referenced this pull request Nov 19, 2021
* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Correct Dockerfile

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <[email protected]>

* map_location instead of set_device

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* shallow fusion init commit

Signed-off-by: AlexGrinch <[email protected]>

* debug info removed

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <[email protected]>

* upper bound hydra

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update version number

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update package version

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix new test

Signed-off-by: ekmb <[email protected]>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <[email protected]>

* manifest test added

Signed-off-by: ekmb <[email protected]>

* expose more params, new test cases

Signed-off-by: ekmb <[email protected]>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins dollar sign format

Signed-off-by: ekmb <[email protected]>

* addressed review comments

Signed-off-by: ekmb <[email protected]>

* fix decimal in measure

Signed-off-by: ekmb <[email protected]>

* move serial in cardinal

Signed-off-by: ekmb <[email protected]>

* sh tests init

Signed-off-by: ekmb <[email protected]>

* sparrowhawk container tests support added

Signed-off-by: ekmb <[email protected]>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <[email protected]>

* remove duplication

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <[email protected]>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <[email protected]>

* Correct docstring

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Revert unnecessary change

Signed-off-by: smajumdar <[email protected]>

* Guard scheduler for None

Signed-off-by: smajumdar <[email protected]>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <[email protected]>

* Correctly log class that was restored

Signed-off-by: smajumdar <[email protected]>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <[email protected]>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <[email protected]>

* Add conda update for numba

Signed-off-by: smajumdar <[email protected]>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <[email protected]>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Double test of cuda numba

Signed-off-by: smajumdar <[email protected]>

* Enable RNNT tests

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <[email protected]>

* update whitelist, change roman weights

Signed-off-by: ekmb <[email protected]>

* docstrings, space fix, init file

Signed-off-by: ekmb <[email protected]>

* lgtm

Signed-off-by: ekmb <[email protected]>

* fraction with measure class

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <[email protected]>

* Add to documentation

Signed-off-by: smajumdar <[email protected]>

* Improve documentation

Signed-off-by: smajumdar <[email protected]>

* Correct name of the dataset

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <[email protected]>

* fix syntax

Signed-off-by: Yang Zhang <[email protected]>

* check if data dir exists

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* adding pretrained model

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <[email protected]>

* Cleanup Docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <[email protected]>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <[email protected]>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <[email protected]>

* add megatron encoder module

Signed-off-by: ericharper <[email protected]>

* fixed horrible typo

Signed-off-by: ericharper <[email protected]>

* fix typo and add default

Signed-off-by: ericharper <[email protected]>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <[email protected]>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <[email protected]>

* add checkpoint_file property

Signed-off-by: ericharper <[email protected]>

* fix property

Signed-off-by: ericharper <[email protected]>

* num_tokentypes=0

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* find_unused_parameters=True

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* get instead of pop

Signed-off-by: ericharper <[email protected]>

* remove token type ids from megatron input example

Signed-off-by: ericharper <[email protected]>

* pop vocab_size

Signed-off-by: ericharper <[email protected]>

* fix checkpointing for model parallel

Signed-off-by: ericharper <[email protected]>

* fix bug in non model parallel

Signed-off-by: ericharper <[email protected]>

* convert cfg.trainer to dict

Signed-off-by: ericharper <[email protected]>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <[email protected]>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <[email protected]>

* make vocab_file configurable

Signed-off-by: ericharper <[email protected]>

* dataclass can't have mutable default

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* unused imports

Signed-off-by: ericharper <[email protected]>

* revert input example

Signed-off-by: ericharper <[email protected]>

* check that checkpoint version is not None

Signed-off-by: ericharper <[email protected]>

* add mp jenkins test

Signed-off-by: ericharper <[email protected]>

* update docstring

Signed-off-by: ericharper <[email protected]>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets details

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <[email protected]>

* Addressed review comments

Signed-off-by: jbalam <[email protected]>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <[email protected]>

* Made changes suggested in review

Signed-off-by: jbalam <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on bottleneck transformers.

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

Signed-off-by: Micha Livne <[email protected]>

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

* 1. Working on training script.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on training script.

* 1. Updated config class name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated config class name.

* 1. Training script is ready to be tested.

Signed-off-by: Micha Livne <[email protected]>

* 1. Training script is ready to be tested.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <[email protected]>

* Correct return reg value

Signed-off-by: smajumdar <[email protected]>

* Initial cpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Try gpu impl

Signed-off-by: smajumdar <[email protected]>

* Correct few impl

Signed-off-by: smajumdar <[email protected]>

* Update fastemit scaling

Signed-off-by: smajumdar <[email protected]>

* Cleanup fastemit

Signed-off-by: smajumdar <[email protected]>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <[email protected]>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Samuel Kriman <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Fixed bugs.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed bugs.

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

* 1. Fixed support in seq2seq-br.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed support in seq2seq-br.

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to support multi-node training.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments.

Signed-off-by: Micha Livne <[email protected]>

* 1. MTBottleneckModel is in its own file mt_enc_dec_bottleneck_model.

Signed-off-by: Micha Livne <[email protected]>

* 1. Switched loss annealing to rely on self.trainer.global_step

Signed-off-by: Micha Livne <[email protected]>

* 1. Added comments regarding the use of return_ortho_loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added detailed logging of loss during training (still need to do the same for eval).

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing a fix to import bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging wrong import issue.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added logging of results to validation step (not tested yet).

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing failing imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Disabling changes.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Enabled bottleneck architecture.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed indentation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed import statement.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed typo.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed logging of arbitrary values.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed torch lightning logging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added a missing import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated sign of computed loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed double import.

Signed-off-by: Micha Livne <[email protected]>

* 1. Moved logging of additional loss terms into MTBottleneckModel class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated permissions.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added initial perceiver package.

Signed-off-by: Micha Livne <[email protected]>

* 1. Working on encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Finished implementing Perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default arch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Ignoring independent perceiver implementation.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added latent transformer to perceiver

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckDecoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added TransformerBottleneckEncoderNM.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated bottleneck perceiver.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated MTBottleneckModel.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Cleaned code.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated architecture name.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in bridge encoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in hidden_init_method to BridgeEncoder.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Removed unneeded imports.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comment in YAML

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated YAML comments.
2. hidden_blocks in bridge relates to post-processing after bridge (instead of hidden_blocks-1).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Initial cross attention in Perceiver with params init has independent parameters.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated Perceiver forward.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated TransformerEncoder to be a component as opposed to a parent class.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated example command.

Signed-off-by: Micha Livne <[email protected]>

* 1. forward method in MTBottleneckModel does not compute loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added label smoothing for per-sample loss.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated recon_only loss to nll.

Signed-off-by: Micha Livne <[email protected]>

* 1. Update yaml doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated default config to have 32 hidden steps.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed type.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed unreachable code bug.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed wrong sign for reconstruction per sample (instead of per token).

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated comments.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jagadeesh Balam <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Development

Successfully merging this pull request may close these issues.

Incorrect Citrinet transcription