COG Datalayer #2

Open
wants to merge 70 commits into base: master

@aasseman commented Jun 2, 2020

Ported the COG dataset from https://github.com/IBM/mi-prometheus as a NeMo datalayer.
I also cleaned up much of the original code. The commits are organized in a clean, logical order, so reading them one by one should help with tracking the modifications.

To run the test, from the root of the NeMo Python module directory:

python -m nemo.collections.visual_reasoning.modules.data_layers.cog.datalayer

If an X display is available, the test will show some samples using matplotlib.
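The "show samples only when a display is available" behaviour could be sketched roughly as below. This is not the code from this PR: `display_available` and `show_samples` are hypothetical helper names, and checking the `DISPLAY` / `WAYLAND_DISPLAY` environment variables is an assumption about how such a check might be done on Linux.

```python
import os
import sys


def display_available(environ=None, platform=None):
    """Return True when a graphical display is likely reachable (heuristic)."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    # On Linux, X11 clients find the server through DISPLAY;
    # Wayland sessions set WAYLAND_DISPLAY instead.
    if platform.startswith("linux"):
        return bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    # Assumption: desktop Windows and macOS sessions always have a display.
    return True


def show_samples(images, environ=None, platform=None):
    """Plot each image if a display is available; return whether plotting ran."""
    if not display_available(environ, platform):
        print("No display found; skipping sample visualization.")
        return False
    # Imported lazily so that headless runs never load a GUI backend.
    import matplotlib.pyplot as plt

    for image in images:
        plt.figure()
        plt.imshow(image)
    plt.show()
    return True
```

With a guard like this, the same test entry point can run on a headless CI machine (where it only exercises the data loading) and on a workstation (where it also opens the matplotlib windows).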

gkucsko and others added 30 commits June 2, 2020 19:46
…VIDIA#693)

* update sgd numbers after fix in seen services and slot request loss

Signed-off-by: Yang Zhang <[email protected]>

* fix table

Signed-off-by: Yang Zhang <[email protected]>

* add more information to documentation

Signed-off-by: Yang Zhang <[email protected]>

* fix doc

Signed-off-by: Yang Zhang <[email protected]>

* fix doc

Signed-off-by: Yang Zhang <[email protected]>

* fix doc

Signed-off-by: Yang Zhang <[email protected]>
* Added user sys tag to TRADE.

Signed-off-by: Vahid Noroozi <[email protected]>
…#695)

* megatron glue numbers added, default amp level reverted to O0

Signed-off-by: Evelina Bakhturina <[email protected]>

* table reformatted

Signed-off-by: Evelina Bakhturina <[email protected]>
editdistance package for fast WER calculation
Signed-off-by: Oleksii Kuchaiev <[email protected]>
…IA#673)

* add VAD

Signed-off-by: fayejf <[email protected]>

* update with PR comments

Signed-off-by: fayejf <[email protected]>

* revert change on asr notebook 4&5

Signed-off-by: fayejf <[email protected]>

* update with PR comments

Signed-off-by: fayejf <[email protected]>

* fix typos

Signed-off-by: fayejf <[email protected]>

* upload docs and resolve parts of PR comments

Signed-off-by: fayejf <[email protected]>

* fix doc bib issue

Signed-off-by: fayejf <[email protected]>

* fix jenkins doc issue

Signed-off-by: fayejf <[email protected]>

* fix some warning/typo

Signed-off-by: fayejf <[email protected]>

* update notebook#6, improve data process scripts, and some fix

Signed-off-by: fayejf <[email protected]>

* some minor changes

Signed-off-by: fayejf <[email protected]>

* fix bib issue

Signed-off-by: fayejf <[email protected]>

* little fix to avoid misunderstanding

Signed-off-by: fayejf <[email protected]>
* update an4 notebook

Signed-off-by: Jason <[email protected]>

* colab bugfix

Signed-off-by: Jason <[email protected]>

* update script

Signed-off-by: Jason <[email protected]>

* fix notebooks

Signed-off-by: Jason <[email protected]>

* fix notebooks

Signed-off-by: Jason <[email protected]>
* Durations extraction with script draft.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Durations extraction notebooks.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Finished bulk part of durations predictor.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add tensorboard logging.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add general-style train logger.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Change LibriSpeech parts order and move train logger callback to core.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add one big file durs saving.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Big batch params change.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add full pad option to data loader as default.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Complete durs pipeline with evaluation.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Rename durs ngc script.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Adjust duration main script default for ngc run.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix problem with torch.bool dist eval.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add LibriTTS processing.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add FasterSpeech full pipeline reaching about 0.4 MSE for LibriTTS.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add QN retrain NGC pipeline, new dur XE steps loss and mel Griffin-Lim sampling.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add train logging for mel with audio sampling and super sampler.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add length sampler.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Set SSS as default and introduce local shuffling.

Signed-off-by: Stanislav Beliaev <[email protected]>

* New defaults.

Signed-off-by: Stanislav Beliaev <[email protected]>

* W&B Support, new speaker system and some refactoring

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add simple durs aug.

Signed-off-by: Stanislav Beliaev <[email protected]>

* New baseline (1)

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix trim bug and make default O2.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add pad16.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix dist eval error and add variable steps.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add WaveGlow inference and fix pad16 bug.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Generalize mel loss.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Generalize pad op.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Move pad16 logic to loss.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add fmin/fmax to griffin-lim vocoding.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Move model params to config.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add denoiser argument to WaveGlow inference.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add new durs with all 1s by default.

Signed-off-by: Stanislav Beliaev <[email protected]>

* New baseline (3)

Signed-off-by: Stanislav Beliaev <[email protected]>

* New baseline (4)

Signed-off-by: Stanislav Beliaev <[email protected]>

* New baseline (5)

Signed-off-by: Stanislav Beliaev <[email protected]>

* Refactor durs predictor script.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Durs predictor baseline

Signed-off-by: Stanislav Beliaev <[email protected]>

* Adjusted durs script for NGC.

Signed-off-by: Stanislav Beliaev <[email protected]>

* New durs lj baseline

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update durs baseline params.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Add durs/blanks acc metrics.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix durs baseline.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Current state

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update NGC scripts and implement shake_all aug.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update augmentations implementations.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Bunch of things

Signed-off-by: Stanislav Beliaev <[email protected]>

* Change name to TalkNet.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Latest notebooks changes

Signed-off-by: Stanislav Beliaev <[email protected]>

* Working scripts with latest master changes

Signed-off-by: Stanislav Beliaev <[email protected]>

* Finished trimming durs predictor code.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Trimmed mels part.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Delete dev folder.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix style errors.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix LGTM errors.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Revert simple logging changes.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix problems.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix problems.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove WG inference and add type hints for data layer.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
…s) (NVIDIA#675)

* git history clean up

Signed-off-by: Evelina Bakhturina <[email protected]>

* nlp references to the tutorials

Signed-off-by: Evelina Bakhturina <[email protected]>

* sphinx fix

Signed-off-by: Evelina Bakhturina <[email protected]>

* review feedback

Signed-off-by: Evelina Bakhturina <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
* initial commit of callback documentation

Signed-off-by: Jason <[email protected]>

* some syntax fixes

Signed-off-by: Jason <[email protected]>

* add old callbacks file

Signed-off-by: Jason <[email protected]>

* finalize docs; change train to action

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* update sphinx style

Signed-off-by: Jason <[email protected]>

* update sphinx warnings

Signed-off-by: Jason <[email protected]>

* train->action rename bug

Signed-off-by: Jason <[email protected]>

* address comments

Signed-off-by: Jason <[email protected]>

* comments

Signed-off-by: Jason <[email protected]>
Update README (pretrained ASR model information)
Bugfix to output ports of Kaldi data layer
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
* pm+nlg for multiwoz init

Signed-off-by: Evelina Bakhturina <[email protected]>

* pipeline is working, init clean up

Signed-off-by: Evelina Bakhturina <[email protected]>

* headers added

Signed-off-by: Evelina Bakhturina <[email protected]>

* fixed invalid .json file, added db files to multiwoz preprocessing

Signed-off-by: Evelina Bakhturina <[email protected]>

* code clean up

Signed-off-by: Evelina Bakhturina <[email protected]>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <[email protected]>

* docs for TRADE update, jenkins for ruled_based example

Signed-off-by: Evelina Bakhturina <[email protected]>

* jenkins fix

Signed-off-by: Evelina Bakhturina <[email protected]>

* ports refactor wip

Signed-off-by: Evelina Bakhturina <[email protected]>

* ports refactor wip

Signed-off-by: Evelina Bakhturina <[email protected]>

* wip works

Signed-off-by: Evelina Bakhturina <[email protected]>

* neural types refactored

Signed-off-by: Evelina Bakhturina <[email protected]>

* remove unused

Signed-off-by: Evelina Bakhturina <[email protected]>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <[email protected]>

* typo

Signed-off-by: Evelina Bakhturina <[email protected]>

* state dict split

Signed-off-by: Evelina Bakhturina <[email protected]>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <[email protected]>

* fixing the process script, moved multiwoz_mapping.pair to multiwoz, enabled utilization of relative paths

Signed-off-by: Tomasz Kornuta <[email protected]>

* formatting fix

Signed-off-by: Tomasz Kornuta <[email protected]>

* reformatted the code, ready for definition of NG by connecting the modules - and fixing the definitions

Signed-off-by: Tomasz Kornuta <[email protected]>

* work in progress-ess, not working, internet issues

Signed-off-by: Tomasz Kornuta <[email protected]>

* UtteranceEncoder neural types wip

Signed-off-by: nvidia <[email protected]>

* utterance encoder neural types

Signed-off-by: nvidia <[email protected]>

* updating trade outputs

Signed-off-by: nvidia <[email protected]>

* updating trade outputs

Signed-off-by: nvidia <[email protected]>

* fighting with belief state

Signed-off-by: nvidia <[email protected]>

* Cannot make second named tuple work

Signed-off-by: nvidia <[email protected]>

* reorganized files, whole pipeline handshaking works

Signed-off-by: nvidia <[email protected]>

* reorganized files, whole pipeline handshaking works

Signed-off-by: nvidia <[email protected]>

* polish

Signed-off-by: nvidia <[email protected]>

* Fix of my dummy error

Signed-off-by: nvidia <[email protected]>

* new examples

Signed-off-by: nvidia <[email protected]>

* style fix

Signed-off-by: Evelina Bakhturina <[email protected]>

* fixed TRADE training

Signed-off-by: Evelina Bakhturina <[email protected]>

* Added module responsible for sys uttr dialog history update

Signed-off-by: nvidia <[email protected]>

* LGTM fix

Signed-off-by: nvidia <[email protected]>

* moved dialog specific axesc andctypes to nlp/neural_types.py, refactored the modules

Signed-off-by: nvidia <[email protected]>

* style fix

Signed-off-by: nvidia <[email protected]>

Co-authored-by: Tomasz Kornuta <[email protected]>
* make test better

Signed-off-by: Jason <[email protected]>

* fix rename error during topological sort

Signed-off-by: Jason <[email protected]>

* test fix

Signed-off-by: Jason <[email protected]>
@aasseman (Author) commented

FYI, I just rebased to a more recent upstream master.

okuchaiev and others added 23 commits June 11, 2020 13:03
Fixed 2_Online_ASR_Microphone_Demo notebook to support new config
Signed-off-by: Oleksii Kuchaiev <[email protected]>
…VIDIA#724)

* Added ability to write audio to tensorboard during Tacotron training

Signed-off-by: Polezhaev Sergej <[email protected]>

* Removed unused import

Signed-off-by: Polezhaev Sergej <[email protected]>

Co-authored-by: Sergey Polezhaev <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
…rent subdirs, as they are not datalayers.

Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Signed-off-by: Alexis Asseman <[email protected]>
Alexis Asseman added 3 commits June 16, 2020 11:26
@aasseman aasseman marked this pull request as ready for review June 19, 2020 17:08
tkornuta-nvidia pushed a commit that referenced this pull request Aug 25, 2020
* Integrated Megatron-LM

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressing PR comments, trying NER example

Signed-off-by: Boris Fomitchev <[email protected]>

* manual style fix

Signed-off-by: Boris Fomitchev <[email protected]>

* manual style fix #2

Signed-off-by: Boris Fomitchev <[email protected]>

* Resolving circular import

Signed-off-by: Boris Fomitchev <[email protected]>

* Static analysis warnings addressed

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed code review; Jenkins test added

Signed-off-by: Boris Fomitchev <[email protected]>

* Removing parallel from Megatron

Signed-off-by: Boris Fomitchev <[email protected]>

* Added more info to tokenizer printout, made megatron bert derivative explicit

Signed-off-by: Boris Fomitchev <[email protected]>

* Bumping Megatron-LM version to get APEX fix

Signed-off-by: Boris Fomitchev <[email protected]>