Commit
* Megatron positional encoding alibi fix (#5808) (#5863) * 1. Debugging. * 1. Debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. * 1. Debugging. * 1. Fixed initialization. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. * 1. Removed scale from ALiBi. Signed-off-by: Micha Livne <[email protected]> * 1. Updated yaml and added support to control number of alibi heads. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Removed num_attention_heads_alibi from configs. Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Micha Livne <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Jason <[email protected]> * Fix segmenting for pcla inference (#5849) * Fix segmenting for pcla inference Signed-off-by: Matvei Novikov <[email protected]> * Fix segmenting for pcla inference Signed-off-by: Matvei Novikov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Matvei Novikov <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * indentation fix (#5861) (#5862) Signed-off-by: nithinraok <[email protected]> Signed-off-by: nithinraok <[email 
protected]> Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Jason <[email protected]> * add ambernet to readme (#5872) (#5873) Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Jason <[email protected]> * Fix wrong label mapping in batch_inference for label_model (#5767) (#5870) * fix batch inference * add test for batch * fix device Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Jason <[email protected]> * WAR for https://github.com/pytorch/pytorch/pull/91526 Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fix memory allocation of NeMo Multi-speaker Data Simulator (#5864) * fix data simulator Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * Adding noise_manifest handling for faster speed Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added multi-gpu feature Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added a parameter for noise source file number Signed-off-by: Taejin Park <[email protected]> * Fixed noise_manifest error bug Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * 
RETRO model finetuning (#5800) * add save and load dynamic index Signed-off-by: Yi Dong <[email protected]> * add chunk stride feature Signed-off-by: Yi Dong <[email protected]> * add chunk stride feature Signed-off-by: Yi Dong <[email protected]> * add no pq index Signed-off-by: Yi Dong <[email protected]> * added megatron lm compatible mode Signed-off-by: Yi Dong <[email protected]> * add config Signed-off-by: Yi Dong <[email protected]> * fix position embedding Signed-off-by: Yi Dong <[email protected]> * added index factory Signed-off-by: Yi Dong <[email protected]> * share neighbors and weights among strategies Signed-off-by: Yi Dong <[email protected]> * fix bug Signed-off-by: Yi Dong <[email protected]> * added metric to faiss index Signed-off-by: Yi Dong <[email protected]> * set default to inner product Signed-off-by: Yi Dong <[email protected]> * added qa fine tune dataset Signed-off-by: Yi Dong <[email protected]> * added fine tuning code Signed-off-by: Yi Dong <[email protected]> * trim it Signed-off-by: Yi Dong <[email protected]> * fix data issue Signed-off-by: Yi Dong <[email protected]> * fix style Signed-off-by: Yi Dong <[email protected]> * added version Signed-off-by: Yi Dong <[email protected]> * fix key error Signed-off-by: Yi Dong <[email protected]> * make sure to overwrite the cfg Signed-off-by: Yi Dong <[email protected]> * make multiple sentence bert available Signed-off-by: Yi Dong <[email protected]> * fix the document Signed-off-by: Yi Dong <[email protected]> * fix the table Signed-off-by: Yi Dong <[email protected]> * fix transformer Signed-off-by: Yi Dong <[email protected]> * make sure to turn off the rope in chunked cross attention layer Signed-off-by: Yi Dong <[email protected]> * fix the security issue Signed-off-by: Yi Dong <[email protected]> * style fix Signed-off-by: Yi Dong <[email protected]> * fix codeql issues Signed-off-by: Yi Dong <[email protected]> * fix Signed-off-by: Yi Dong <[email protected]> * use -1 
Signed-off-by: Yi Dong <[email protected]> * fix empty index Signed-off-by: Yi Dong <[email protected]> * clean up Signed-off-by: Yi Dong <[email protected]> * fix the lower bound for repetition penalty Signed-off-by: Yi Dong <[email protected]> * add retro qa inference strategy Signed-off-by: Yi Dong <[email protected]> * added new inference logic Signed-off-by: Yi Dong <[email protected]> * working inference Signed-off-by: Yi Dong <[email protected]> * fix TP inference Signed-off-by: Yi Dong <[email protected]> * revert requirement Signed-off-by: Yi Dong <[email protected]> * added file inference Signed-off-by: Yi Dong <[email protected]> * use string to prevent collison Signed-off-by: Yi Dong <[email protected]> * use NQ test Signed-off-by: Yi Dong <[email protected]> * fix prompt Signed-off-by: Yi Dong <[email protected]> * fix inference Signed-off-by: Yi Dong <[email protected]> * set good defaults for demo Signed-off-by: Yi Dong <[email protected]> * replicate adlr Signed-off-by: Yi Dong <[email protected]> * make sure to turn off attention reset for megatron lm compatible model Signed-off-by: Yi Dong <[email protected]> * style fix Signed-off-by: Yi Dong <[email protected]> * fix typo Signed-off-by: Yi Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix inference error Signed-off-by: Yi Dong <[email protected]> * fix logging Signed-off-by: Yi Dong <[email protected]> * address comments Signed-off-by: Yi Dong <[email protected]> --------- Signed-off-by: Yi Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * [TTS] GAN-based spectrogram enhancer (#5565) * [TTS] add SpectrogramEnhancer based on StyleGAN 2 Signed-off-by: Roman Korostik <[email protected]> * [TTS] some tests for spectrogram enhancer Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: a 
tiny clean up Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: log images during training Signed-off-by: Roman Korostik <[email protected]> * exp_manager: pass save_on_train_epoch_end to checkpointing callback Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add training script and config examples Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix comments Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: don't assume FastPitch Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: better input shapes handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix porting error Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix logging and .nemo saving Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: clean up scaling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: update examples Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: shape handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove LoggerCollection handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: copyright notice for tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: use process_batch helper Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: return empty list of available models Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: some docs Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: style --fix Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: chan_last -> channel_last Signed-off-by: Roman Korostik 
<[email protected]> * [TTS] SpectrogramEnhancer: remove unused imports Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove unused return value Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: losses are nn.Modules now Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: init optimizers from config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: unused imports Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: typechecking Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: more tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix logging images Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: unclutter prepare_batch Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: init generator and discriminator from the config for consistency with other NeMo models Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: update spectrogram range in the example config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: comment on loss weights in the example config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: rename Conv2DMod to Conv2DModulated Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove unused imports Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix CodeQL import warnings Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: type_as_recursive -> to_device_recursive Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: move to_device_recursive to helpers Signed-off-by: Roman Korostik <[email protected]> * [TTS] 
SpectrogramEnhancer: move losses to a separate module, add comments Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add optimizers' entries to config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix test configs Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: support length masking for 3-dim tensors Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add masking to spectrogram normalization Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add spectrogram normalization tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix imports and formatting in tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix docstring typo Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: rename G and D to generator and discriminator Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: better argument naming in interfaces (condition -> input_spectograms, target -> target_spectrograms) Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [TTS] SpectrogramEnhancer: fix import warnings in modules Signed-off-by: Roman Korostik <[email protected]> * [TTS] add resynthesize_dataset.py script Signed-off-by: Roman Korostik <[email protected]> * [TTS] add PairedRealFakeSpectrogramsDataset Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: update example config to reflect new data setup Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: remove unused imports Signed-off-by: Roman Korostik 
<[email protected]> * [TTS] resynthesize_dataset.py: use nemo manifest handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: remove unused import Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: underscores for .npy names Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove return value from a test Signed-off-by: Roman Korostik <[email protected]> * [TTS] add length masking helper Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: use common tts length mask function Signed-off-by: Roman Korostik <[email protected]> * [TTS] unused imports in tts helpers Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix an import Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: introduce computed upsample_factor to generator Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: clean up and clarify validation data setup Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove a hardcoded path in the example config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: configurize max_spectrogram_length in generator Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: consistent dashes and underscores in CLI args Signed-off-by: Roman Korostik <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Roman Korostik <[email protected]> Signed-off-by: Roman Korostik <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Optimizing distributed Adam when running with one work queue (#5560) * Dist Adam constructs a single param bucket for each GPT layer Signed-off-by: Tim Moon <[email protected]> * 
Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces Signed-off-by: Tim Moon <[email protected]> * Configure per-layer dist Adam buckets for BERT and T5 Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Configure GPT with one dist Adam bucket per virtual pipeline stage Signed-off-by: Tim Moon <[email protected]> * Configure BERT with one dist Adam bucket per virtual pipeline stage Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in Dockerfile Need recent updates to Apex distributed Adam optimizer. Signed-off-by: Tim Moon <[email protected]> * Remove logic for per-virtual-pipeline distopt buckets from T5 Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Jason <[email protected]> * fix(readme): fix typo (#5883) Signed-off-by: Jean-Louis Queguiner <[email protected]> Signed-off-by: Jason <[email protected]> * TTS inference with Heteronym classification model, hc model inference refactoring (#5768) * refactor inference, fix span detection Signed-off-by: ekmb <[email protected]> * fix merge conflicts Signed-off-by: ekmb <[email protected]> * fix merge conflicts Signed-off-by: ekmb <[email protected]> * remove unused var Signed-off-by: ekmb <[email protected]> * clean up, test update Signed-off-by: ekmb <[email protected]> * arg name update Signed-off-by: ekmb <[email protected]> * merge wip Signed-off-by: ekmb <[email protected]> * revert changes Signed-off-by: ekmb <[email protected]> * update docs, move heteronym to baseg2p Signed-off-by: ekmb <[email protected]> * change wordid file defaults to none Signed-off-by: ekmb <[email protected]> * add manifest check Signed-off-by: ekmb <[email protected]> * replace homograph with heteronym, upper case wordid for riva, review feedback Signed-off-by: ekmb <[email protected]> * add log message, update comment Signed-off-by: ekmb <[email protected]> * 
rename test manifest field Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Jason <[email protected]> * take out retro doc (#5885) (#5886) Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Yi Dong <[email protected]> Signed-off-by: Jason <[email protected]> * Add option to disable distributed parameters in distributed Adam optimizer (#5685) * Add option to run dist Adam without distributed params Similar to DDP, but leverages dist Adam's support for overlapping communication with backward compute Signed-off-by: Tim Moon <[email protected]> * Fix bug in grad clipping when dist Adam has redundant params Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Jason <[email protected]> * [ASR] Separate Audio-to-Text (BPE, Char) dataset construction (#5774) * Separate full BPE dataset construction Signed-off-by: Vladimir Bataev <[email protected]> * Fix the case when the dataset is None Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix typos Signed-off-by: Vladimir Bataev <[email protected]> * Separate char dataset construction. Fix DALI dataset usage. 
Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * transformer duration added and IPA config files added Signed-off-by: Jason <[email protected]> * inference issue for pace resolved Signed-off-by: Jason <[email protected]> * Latest ONNX developments Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Remove MCD_DTW tarball (#5889) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jason <[email protected]> * Block large files from being merged into NeMo main (#5898) * Attempt to use large-file pre-commit ci hook Signed-off-by: SeanNaren <[email protected]> * Set defaults and enforce Signed-off-by: SeanNaren <[email protected]> * Set to 1000 Signed-off-by: SeanNaren <[email protected]> * Remove enforcement Signed-off-by: SeanNaren <[email protected]> --------- Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Jason <[email protected]> * Reduce memory usage in getMultiScaleCosAffinityMatrix function (#5876) * Updated offline_clustering.py, the getMultiScaleCosAffinityMatrix function, reduced memory usage Signed-off-by: gabitza-tech <[email protected]> * torch.empty.cache() outside forward_infer() Signed-off-by: Taejin Park <[email protected]> * Removed unnecessary lines Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Speed up for non torch.jit.script Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * parallelism is default off Signed-off-by: Taejin Park <[email protected]> * nme_mat_size is unified as 512, removing redundant docstring Signed-off-by: Taejin Park <[email protected]> --------- Signed-off-by: gabitza-tech <[email protected]> Signed-off-by: Taejin Park <[email protected]> Co-authored-by: 
Taejin Park <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * set max_steps for lr decay through config (#5780) * set max_steps for lr decay through config * added warning for optim sched max_steps config option * reverted changes to modelPT and updated megatron_base_model * added the experimental cosine annealing scheduler class * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update decay_steps for cosine annealing exp class * added copyright --------- Co-authored-by: ANMOL GUPTA <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Fix transducer and question answering tutorial bugs (#5809) (#5810) Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * update apex install instructions (#5901) (#5902) Signed-off-by: ericharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Hybrid ASR-TTS models (#5659) Add hybrid ASR-TTS models and text-to-text dataset Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Set providers for ORT inference session (#5903) Signed-off-by: athitten <[email protected]> Signed-off-by: Jason <[email protected]> * [ASR] Configurable metrics for audio-to-audio + removed experimental decorators (#5827) * Added an option to configure metrics for audio-to-audio models Removed experimental decorators Signed-off-by: Ante Jukić <[email protected]> * Addressed review comments Signed-off-by: Ante Jukić <[email protected]> --------- Signed-off-by: Ante Jukić <[email 
protected]> Signed-off-by: Jason <[email protected]> * Correct doc for RNNT transcribe() function (#5904) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Jason <[email protected]> * Add segmentation export to Audacity label file (#5857) * Save the segmentation as a label file for Audacity. Audacity is a free, open-source audio editor that can import a label file to quickly assess the segmentation quality. This commit adds export to the [Audacity label format](https://manual.audacityteam.org/man/importing_and_exporting_labels.html) so that, directly after running the segmentation tool, the segmentation quality can be assessed or the segmentation can be shared easily. Signed-off-by: CaraDuf <[email protected]> * Fix styling Signed-off-by: CaraDuf <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused score in Audacity export. The score is not written to the Audacity label file, so it does not need to be loaded from the segment. 
Signed-off-by: CaraDuf <[email protected]> --------- Signed-off-by: CaraDuf <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Cross-Lingual objectives (XLM) and multilingual (many-many) support for Megatron-NMT (#5026) * Update blendable dataset, and refactor seq2seq data Signed-off-by: MaximumEntropy <[email protected]> * Blendable dataset with binarized mmap working Signed-off-by: MaximumEntropy <[email protected]> * Pass seed from cfg to dataset Signed-off-by: MaximumEntropy <[email protected]> * Fix multilingual setup Signed-off-by: MaximumEntropy <[email protected]> * Add on epoch start reconfiguration Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Update tokenizer creation for multilingual Signed-off-by: MaximumEntropy <[email protected]> * Tmp Signed-off-by: MaximumEntropy <[email protected]> * Update NMT script Signed-off-by: MaximumEntropy <[email protected]> * Remove unused import Signed-off-by: MaximumEntropy <[email protected]> * Update training script Signed-off-by: MaximumEntropy <[email protected]> * Log consumed samples Signed-off-by: MaximumEntropy <[email protected]> * Logging on val epoch end Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Remove redundant print Signed-off-by: MaximumEntropy <[email protected]> * Ckpt averaging for non model parallel megatron models Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> * Update error message Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Remove check Signed-off-by: MaximumEntropy <[email protected]> * Restore fixes Signed-off-by: MaximumEntropy <[email protected]> * Remove ipdb Signed-off-by: 
MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Move to classmethods Signed-off-by: MaximumEntropy <[email protected]> * Initial Signed-off-by: MaximumEntropy <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * Refactor masking to add skip_masking_id and working xlm bert and t5 datasets Signed-off-by: MaximumEntropy <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Testing a simple solution Signed-off-by: Micha Livne <[email protected]> * 1. Fixed. Seems to work. Need to validate. Signed-off-by: Micha Livne <[email protected]> * 1. Added support in CSV and text memmap to Megatron encoder-decoder Signed-off-by: Micha Livne <[email protected]> * 1. Added support in CSV. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. 2. Fixed bugs. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed bugs. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Updated yaml. Signed-off-by: Micha Livne <[email protected]> * Minor Signed-off-by: MaximumEntropy <[email protected]> * 1. Fixed warnings. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed a bug. 
Signed-off-by: Micha Livne <[email protected]> * Tmp Signed-off-by: MaximumEntropy <[email protected]> * Updates Signed-off-by: MaximumEntropy <[email protected]> * Fix minor data things Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Lang ids for validation datasets Signed-off-by: MaximumEntropy <[email protected]> * More fixes for lang id code at inference Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Remove pdb Signed-off-by: MaximumEntropy <[email protected]> * Fix prepend ID and bleu logging Signed-off-by: MaximumEntropy <[email protected]> * Refactor Signed-off-by: MaximumEntropy <[email protected]> * Fixes for many-many NMT Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Reset o2 default Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore dataset utils Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Allreduce bleu scores Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * 1. Loading index file into memmap object. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed extension when loading files. 
Signed-off-by: Micha Livne <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Fix redundant building Signed-off-by: MaximumEntropy <[email protected]> * PP > 2 for NMT Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Merge and fix Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy <[email protected]> * Refactor multilingual again Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor and verify data formats Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup Signed-off-by: MaximumEntropy <[email protected]> * more fixes Signed-off-by: MaximumEntropy <[email protected]> * Fix passing langs Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * More fixes Signed-off-by: MaximumEntropy <[email protected]> * Fixes for bart Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci --------- Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Micha Livne <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Jason <[email protected]> * ONNX export working Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fixing unit test Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Update isort to the latest version (#5895) Update isort to the latest version Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Pin isort version (#5914) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Moved eval notebook data to aws (#5911) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jason <[email protected]> * FilterbankFeaturesTA to match FilterbankFeatures (#5913) Signed-off-by: Mohamed Saad Ibn Seddik <[email protected]> Signed-off-by: Jason <[email protected]> * fixed missing long_description_content_type (#5909) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * added TPMLP for T5-based models (#5840) (#5841) Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Fixing 0-size issue and ONNX BS>1 trace Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fixing code scan alert Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: 
Jason <[email protected]> * update container (#5917) Signed-off-by: ericharper <[email protected]> Signed-off-by: Jason <[email protected]> * remove conda pynini install (#5921) Signed-off-by: ekmb <[email protected]> Signed-off-by: Jason <[email protected]> * Merge release main (#5916) * update branch Signed-off-by: ericharper <[email protected]> * added TPMLP for T5-based models (#5840) Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> * remove notebook (#5859) Signed-off-by: ericharper <[email protected]> Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Signed-off-by: Jason <[email protected]> * Dynamic freezing in Nemo (#5879) * Initial commit for dynamic freezing logic Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated logic to handle lists and updated docs Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Transferred dynamic freezing logic to core from asr Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert asr config to original Signed-off-by: Daniel Egert <[email protected]> * Fixed tab indent in core.rst Signed-off-by: Daniel Egert <[email protected]> * Updated modelPT for latest from master Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed indents in docs 
Signed-off-by: Daniel Egert <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Fix Windows bug with save_restore_connector (#5919) * Initial commit for Windows bug with save_to Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * add new lannguages to doc (#5939) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Jason <[email protected]> * Workarounds for ONNX export with autocast Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * fix val loss computation in megatron (#5871) * fix val loss computation in megatron * Fix NaN handling during validation --------- Co-authored-by: ANMOL GUPTA <[email protected]> Co-authored-by: Mikołaj Błaż <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Restoring sigmas Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Add core classes and functions for online clustering diarizer part 2 (#5609) * Add core classes and functions for online clustering diarizer Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add audio to labels code Signed-off-by: Taejin Park <[email protected]> * resolve type errors Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * added unit=tests for very short audio Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Filled all missing docstrings Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved conflict and added missing docstrings Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed unit-test errors Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the wrongly added file - megatron_gpt_model.py Signed-off-by: Taejin Park <[email protected]> * Fix wrongly included file - megatron_gpt_model.py Signed-off-by: Taejin Park <[email protected]> * resolve code quality issue Signed-off-by: Taejin Park <[email protected]> * Fixed unit-test errors and bugs Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changed total_sec for offline_clustering toy_data in unit-tests Signed-off-by: Taejin Park <[email protected]> * fixed merging index offset bug Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * only including part 1 files Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused function Signed-off-by: Taejin Park <[email protected]> * fixed unused imports Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * divided nmesc_clustering.py into two and 
reflected first-pass comments Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding offline/online_clustering.py Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code QL autocomment Signed-off-by: Taejin Park <[email protected]> * Removed unused imports Signed-off-by: Taejin Park <[email protected]> * Update nemo/collections/asr/parts/utils/online_clustering.py Co-authored-by: Sean Naren <[email protected]> Signed-off-by: Taejin Park <[email protected]> * Reflected comments Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved code scanning issue Signed-off-by: Taejin Park <[email protected]> * Adding online_diarizer.py Signed-off-by: Taejin Park <[email protected]> * updated tests and speaker_utils Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed the wrong test eval Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating online diarizer for varialbe name change Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reflected comments and some typo fixes in speaker_utils Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Taejin Park <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Sean Naren <[email protected]> Signed-off-by: Jason <[email protected]> * Distributed Adam optimizer overlaps param all-gather with forward compute (#5684) * Add distopt support for overlapping param all-gather with forward compute Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * [TTS][ZH] added new NGC model cards with polyphone disambiguation. (#5940) * [TTS][ZH] added new NGC model cards with polyphone disambiguation. Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * Moved truncation of context higher up Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * [TN] bugfix file handler is not closed. (#5955) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * Added unit test for regulate_len. Unscripted sort_tensor for TRT Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fixed slice Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * [TTS] deprecate AudioToCharWithPriorAndPitchDataset. (#5959) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * bugfix: file handlers are not closed. 
(#5956) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * [TTS][G2P] deprecate add_symbols (#5961) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * fix broken link (#5968) Signed-off-by: ericharper <[email protected]> Signed-off-by: Jason <[email protected]> * Fix hybridasr bug (#5950) (#5957) Signed-off-by: Jason <[email protected]> * Added list_available_models (#5967) * Added list_available_models Signed-off-by: Evgeniy Shabalin <[email protected]> * Added to readme Signed-off-by: Evgeniy Shabalin <[email protected]> * added vits to docs Signed-off-by: Evgeniy Shabalin <[email protected]> * added vits to docs Signed-off-by: Evgeniy Shabalin <[email protected]> --------- Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Jason <[email protected]> * Move settings to `pyproject.toml`. Remove deprecated `pytest-runner` (#5947) * Move project settings to pyproject.toml Signed-off-by: Vladimir Bataev <[email protected]> * Remove setup.cfg Signed-off-by: Vladimir Bataev <[email protected]> * Remove deprecated pytest-runner Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Allow only registered markers for pytest Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Fix torchaudio installation (#5850) * Fail if torchaudio not installed Signed-off-by: Vladimir Bataev <[email protected]> * Fix torchaudio matching version Signed-off-by: Vladimir Bataev <[email protected]> * Warn if Pytorch major version changed Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Update fastpitch.py (#5969) Signed-off-by: Jason <[email protected]> * Review comments 
Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * per-micro-batch input loader (#5635) * per-micro-batch input loader * per-micro-batch input loader set arg default val * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix * apply per-microbatch-loader to only GPT * update docstring on micro-batch input loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed the default arg val * fix batch size to 1 at log stat registration * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update container for CI Signed-off-by: ericharper <[email protected]> * update container in jenkinsfile Signed-off-by: ericharper <[email protected]> * update container for CI Signed-off-by: ericharper <[email protected]> fix merge conflict * revert Jenkinsfile * Revert "revert Jenkinsfile" This reverts commit d23b7757e0f935dacde2840f234193c632a2b3be. 
* Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py Signed-off-by: Tim Moon <[email protected]> * add GradScaler * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper <[email protected]> Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Jason <[email protected]> * update container in readme (#5981) Signed-off-by: fayejf <[email protected]> Signed-off-by: Jason <[email protected]> * Support Alignment Extraction for all RNNT Beam decoding methods (#5925) * Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <[email protected]> * Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <[email protected]> * Remove everything else Signed-off-by: smajumdar <[email protected]> * Support dataclass in AbstractRNNTDecoding Signed-off-by: smajumdar <[email protected]> * Add first draft unittest Signed-off-by: smajumdar <[email protected]> * Correct the logic to more to the next timestep in the alignment Signed-off-by: smajumdar <[email protected]> * Finalize ALSD alignment generation Signed-off-by: smajumdar <[email protected]> * Add support for TSD greedy alignment extraction Signed-off-by: smajumdar <[email protected]> * Add support for mAES greedy alignment extraction Signed-off-by: smajumdar <[email protected]> * Finalize extraction of alignments from all beam algorithms for RNNT Signed-off-by: smajumdar <[email protected]> * Style fixes Signed-off-by: smajumdar <[email protected]> * Add copyright Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Jason <[email protected]> * Add AWS SageMaker ASR Examples (#5638) * Base code 
for AWS SageMaker example Signed-off-by: SeanNaren <[email protected]> * Remove format Signed-off-by: SeanNaren <[email protected]> * wrap Signed-off-by: SeanNaren <[email protected]> * Add a notebook with the code Signed-off-by: SeanNaren <[email protected]> * Setup Signed-off-by: SeanNaren <[email protected]> * Update notebook Signed-off-by: SeanNaren <[email protected]> * Remove space Signed-off-by: SeanNaren <[email protected]> * Fix spelling mistake Signed-off-by: SeanNaren <[email protected]> * Add message to explain usage Signed-off-by: SeanNaren <[email protected]> * Add CommonVoice esperanto example Signed-off-by: SeanNaren <[email protected]> * Fix path Signed-off-by: SeanNaren <[email protected]> * Fixes Signed-off-by: SeanNaren <[email protected]> * Import sox locally, add documentation Signed-off-by: SeanNaren <[email protected]> * Address reviews Signed-off-by: SeanNaren <[email protected]> * Address reviews Signed-off-by: SeanNaren <[email protected]> * Address reviews Signed-off-by: SeanNaren <[email protected]> * Add cell to download the SSL model Signed-off-by: SeanNaren <[email protected]> * Set max epochs to 300 Signed-off-by: SeanNaren <[email protected]> * Fixes, introduce HF dataset instructions Signed-off-by: SeanNaren <[email protected]> * Upstream updates from other branch Signed-off-by: SeanNaren <[email protected]> * Fix warning Signed-off-by: SeanNaren <[email protected]> * Add README, add image Signed-off-by: SeanNaren <[email protected]> * Fix warning Signed-off-by: SeanNaren <[email protected]> * Address feedback Signed-off-by: SeanNaren <[email protected]> * Feedback Signed-off-by: SeanNaren <[email protected]> --------- Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Jason <[email protected]> * Update PUBLICATIONS.md (#5963) * Add papers from 2022/2022 to PUBLICATIONS.md Signed-off-by: smajumdar <[email protected]> * Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <[email protected]> * 
Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <[email protected]> * Add additional papers Signed-off-by: smajumdar <[email protected]> --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Jason <[email protected]> * [G2P] fixed typos and broken import library. (#5978) (#5979) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * [G2P] added backward compatibility for english tokenizer and fixed unit tests (#5980) (#5984) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Jason <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: nithinraok <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Yi Dong <[email protected]> Signed-off-by: Roman Korostik <[email protected]> Signed-off-by: Roman Korostik <[email protected]> Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Jean-Louis Queguiner <[email protected]> Signed-off-by: ekmb <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: SeanNaren <[email protected]> Signed-off-by: gabitza-tech <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: athitten <[email protected]> Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: CaraDuf <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Mohamed Saad Ibn Seddik <[email protected]> Signed-off-by: 
Xuesong Yang <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Tim Moon <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: Matvei Novikov <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Roman Korostik <[email protected]> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: Jean-Louis Queguiner <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Mikyas Desta <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Gabriel Pirlogeanu <[email protected]> Co-authored-by: anmolgupt <[email protected]> Co-authored-by: ANMOL GUPTA <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: athitten <[email protected]> Co-authored-by: anteju <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: CaraDuf <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: Mohamed Saad Ibn Seddik <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: 
David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: trias702 <[email protected]> Co-authored-by: Daniel Egert <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Mikołaj Błaż <[email protected]> Co-authored-by: Evgeniy Shabalin <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Sangkug Lym <[email protected]>