Self-supervised tutorial & update #3344

Merged
merged 50 commits into from
Jan 26, 2022

@sam1373 (Contributor) commented Dec 15, 2021

  • Basic tutorial for self-supervised pre-training and subsequent supervised fine-tuning
  • Config for Conformer SSL (may change later)
  • Update the convert_to_tarred script so the manifest no longer needs a "text" field, and allow "offset" to be used in manifests without labels
  • Add optional non-stride layers to the reconstruction decoder
  • Add a config option to load part of a model from a second .nemo checkpoint (for example, to load the encoder from one pre-trained model and the decoder from another)
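
As a rough illustration of the manifest change described above, the sketch below writes JSON-lines entries that carry an "offset" and no "text" field. Only "text" and "offset" come from this PR description; the "audio_filepath" and "duration" keys, the file name, and the times are assumptions made for the example, not values taken from the PR.

```python
import json

# Hypothetical segments cut from one long, unlabeled recording.
# "audio_filepath" and "duration" are assumed field names; the path and
# times are made up for illustration.
segments = [
    {"audio_filepath": "recordings/session1.wav", "offset": 0.0, "duration": 15.0},
    {"audio_filepath": "recordings/session1.wav", "offset": 15.0, "duration": 12.5},
]


def to_manifest_lines(entries):
    """Serialize each entry as one JSON object per line (JSON-lines layout)."""
    return [json.dumps(entry) for entry in entries]


manifest_lines = to_manifest_lines(segments)
for line in manifest_lines:
    print(line)
```

Each line describes a slice of the same audio file, which is the case where an offset without a transcript is useful for self-supervised pre-training.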

Changelog

  • New way to initialize parts of the current model from other models, for SSL
  init_from_nemo_model: Str path to a .nemo model, used to load the state_dict from a single .nemo file.
  To load from multiple files, pass in a dict whose values have the following fields:
      - path: Str path to the .nemo model.
      - include: Optional list of strings; a parameter is loaded from this .nemo file only if its name
      contains at least one of them. Default: everything is included.
      - exclude: Optional list of strings; any parameter whose name contains one of these strings
      is not loaded from this .nemo file. Default: nothing is excluded.
      Hydra usage example:
...

init_from_nemo_model:
    model0:
        path: <path/to/model1>
        include: ["encoder"]
    model1:
        path: <path/to/model2>
        include: ["decoder"]
        exclude: ["embed"]
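
The include/exclude rules described in the changelog can be sketched in plain Python. This is a minimal illustration of substring filtering over parameter names, not the code added in this PR; `select_params`, the parameter names, and the integer stand-ins for tensors are all hypothetical.

```python
from typing import Dict, Iterable, Optional


def select_params(
    state_dict: Dict[str, object],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> Dict[str, object]:
    """Keep entries whose name contains any `include` string and no `exclude` string."""
    # The empty string is a substring of every name, so the default includes everything.
    include = list(include) if include is not None else [""]
    exclude = list(exclude) if exclude is not None else []
    selected = {}
    for name, value in state_dict.items():
        if not any(s in name for s in include):
            continue
        if any(s in name for s in exclude):
            continue
        selected[name] = value
    return selected


# Hypothetical state dicts standing in for two .nemo checkpoints.
model1_state = {"encoder.layer0.weight": 1, "decoder.out.weight": 2}
model2_state = {
    "encoder.layer0.weight": 10,
    "decoder.out.weight": 20,
    "decoder.embed.weight": 30,
}

# Mirrors the config example: encoder from model1, decoder (minus "embed") from model2.
merged = {}
merged.update(select_params(model1_state, include=["encoder"]))
merged.update(select_params(model2_state, include=["decoder"], exclude=["embed"]))
print(merged)  # {'encoder.layer0.weight': 1, 'decoder.out.weight': 20}
```

In a real model the merged dictionary would typically be passed to a non-strict state-dict load, since parameters excluded from every file (here `decoder.embed.weight`) keep their fresh initialization.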

Signed-off-by: sam1373 <[email protected]>

lgtm-com bot commented Dec 15, 2021

This pull request introduces 1 alert and fixes 1 when merging 7bc8f88 into 8ce6e7a - view on LGTM.com

new alerts:

  • 1 for Unnecessary delete statement in function

fixed alerts:

  • 1 for Unnecessary delete statement in function

@sam1373 marked this pull request as ready for review December 15, 2021 21:59
@sam1373 requested a review from titu1994 December 15, 2021 22:01

lgtm-com bot commented Dec 15, 2021

This pull request introduces 1 alert and fixes 1 when merging 690b8ec into 8ce6e7a - view on LGTM.com

new alerts:

  • 1 for Unnecessary delete statement in function

fixed alerts:

  • 1 for Unnecessary delete statement in function


lgtm-com bot commented Dec 16, 2021

This pull request fixes 1 alert when merging 1b81e44 into 89910ae - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function

@okuchaiev (Member) commented:

/blossom-ci

Signed-off-by: sam1373 <[email protected]>

lgtm-com bot commented Dec 20, 2021

This pull request fixes 1 alert when merging b8e5bf0 into a3312f3 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function

@sam1373 removed the request for review from titu1994 December 20, 2021 15:59

lgtm-com bot commented Dec 20, 2021

This pull request fixes 1 alert when merging d5ded21 into a3312f3 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function


lgtm-com bot commented Dec 20, 2021

This pull request fixes 1 alert when merging b42c747 into a3312f3 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function


lgtm-com bot commented Dec 20, 2021

This pull request fixes 1 alert when merging 62f4bc9 into eb33ddd - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function


lgtm-com bot commented Dec 21, 2021

This pull request fixes 1 alert when merging c899811 into f7e4ed7 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function

Signed-off-by: sam1373 <[email protected]>
@sam1373 requested a review from titu1994 January 19, 2022 22:45
@titu1994 mentioned this pull request Jan 20, 2022

lgtm-com bot commented Jan 24, 2022

This pull request fixes 1 alert when merging 6d38032 into 7c97e33 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function

Signed-off-by: sam1373 <[email protected]>
examples/asr/conf/citrinet/citrinet_1024.yaml (resolved)
examples/asr/conf/ssl/conformer/conformer_ssl.yaml (outdated, resolved)
nemo/core/classes/modelPT.py (outdated, resolved)

  init_from_pretrained_model: Str name of a pretrained model checkpoint (obtained via cloud).
  The model will be downloaded (or a cached copy will be used), instantiated and then
- its state dict will be extracted.
+ its state dict will be extracted. If loading from multiple files, you can pass in a dict
Collaborator commented:

How can a user pass such a dictionary with Hydra?

Contributor (author) replied:

Added example

tutorials/asr/README.md (outdated, resolved)
@@ -29,6 +29,8 @@ In this repository, you will find several tutorials discussing what is Automatic

10) `ASR_with_Transducers`: In this tutorial, we take a deep dive into Transducer-based ASR models, discussing the similarity of setup and config to CTC models, and then train a small ContextNet model on the AN4 dataset. We then discuss how to change the decoding strategy of a trained Transducer from greedy search to beam search. Finally, we wrap up this tutorial by extracting the alignment matrix from a trained Transducer model.

11) `Self_Supervised_Pre_Training`: It can often be difficult to obtain labeled data for ASR training. In this tutorial, we demonstrate how to pre-train a small Citrinet model in an unsupervised manner, and then fine-tune with CTC loss.
Collaborator commented:

How about adding this new self-supervised learning feature to the NeMo README?


lgtm-com bot commented Jan 24, 2022

This pull request fixes 1 alert when merging b3564c1 into 7c97e33 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function

Signed-off-by: sam1373 <[email protected]>

lgtm-com bot commented Jan 24, 2022

This pull request fixes 1 alert when merging 223e048 into 7c97e33 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function


lgtm-com bot commented Jan 24, 2022

This pull request fixes 1 alert when merging 3767be4 into 7c97e33 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function


lgtm-com bot commented Jan 24, 2022

This pull request fixes 1 alert when merging 78869b9 into 7c97e33 - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function

@VahidooX (Collaborator) left a comment:

LGTM!


lgtm-com bot commented Jan 26, 2022

This pull request fixes 1 alert when merging 40823c1 into 360fa7c - view on LGTM.com

fixed alerts:

  • 1 for Unnecessary delete statement in function

@sam1373 merged commit 9dc612e into NVIDIA:main Jan 26, 2022
nithinraok pushed a commit that referenced this pull request Feb 2, 2022
* update

Signed-off-by: sam1373 <[email protected]>

* update

Signed-off-by: sam1373 <[email protected]>

* version test

Signed-off-by: sam1373 <[email protected]>

* version test

Signed-off-by: sam1373 <[email protected]>

* image for tutorial

Signed-off-by: sam1373 <[email protected]>

* enc_final in model_defaults

Signed-off-by: sam1373 <[email protected]>

* enc_final in model_defaults

Signed-off-by: sam1373 <[email protected]>

* self-supervised tutorial

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* contextnet ssl config

Signed-off-by: sam1373 <[email protected]>

* remove test_ds from config

Signed-off-by: sam1373 <[email protected]>

* update recon decoder

Signed-off-by: sam1373 <[email protected]>

* don't save -last if val_loss is nan

Signed-off-by: sam1373 <[email protected]>

* check if val_loss is there

Signed-off-by: sam1373 <[email protected]>

* keep entries from same file together when tarring

Signed-off-by: sam1373 <[email protected]>

* keep entries from same file together when tarring

Signed-off-by: sam1373 <[email protected]>

* print num of files in shard

Signed-off-by: sam1373 <[email protected]>

* update

Signed-off-by: sam1373 <[email protected]>

* style

Signed-off-by: sam1373 <[email protected]>

* moving configs, add docstrings

Signed-off-by: sam1373 <[email protected]>

* tutorial updates

Signed-off-by: sam1373 <[email protected]>

* update test

Signed-off-by: sam1373 <[email protected]>

* update loading

Signed-off-by: sam1373 <[email protected]>

* update loading

Signed-off-by: sam1373 <[email protected]>

* update loading

Signed-off-by: sam1373 <[email protected]>

* update loading

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* citrinet configs

Signed-off-by: sam1373 <[email protected]>

* citrinet configs update

Signed-off-by: sam1373 <[email protected]>

* update

Signed-off-by: sam1373 <[email protected]>

* default include all

Signed-off-by: sam1373 <[email protected]>

* comments

Signed-off-by: sam1373 <[email protected]>

* docstring hydra example

Signed-off-by: sam1373 <[email protected]>
fayejf pushed a commit that referenced this pull request Mar 2, 2022
aroraakshit pushed a commit to aroraakshit/NeMo that referenced this pull request Apr 23, 2022
4 participants