Highlights

NeMo ASR

Hybrid CTC + Transducer loss ASR #5364
Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
ASR Adapters hyperparameter search scripts #5159
RNNT {ONNX, TorchScript} x GPU export infer #5248
Exportable MelSpectrogram (TorchScript) #5512
Audio To Audio Dataset Processor #5196
Multi Channel Audio Transcription #5479
Silence Augmentation #5476
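
As a rough illustration of the hybrid CTC + Transducer loss highlighted above (#5364), the sketch below combines two per-batch loss values as a convex weighted sum. This is an assumption about the general technique, not a copy of NeMo's implementation; the function name `hybrid_loss` and the parameter name `ctc_loss_weight` are chosen here for illustration only.

```python
def hybrid_loss(rnnt_loss: float, ctc_loss: float, ctc_loss_weight: float = 0.3) -> float:
    """Blend a transducer loss and an auxiliary CTC loss.

    Assumption: the two losses are mixed as a convex combination, so
    ctc_loss_weight=0.0 gives pure RNNT and 1.0 gives pure CTC.
    """
    if not 0.0 <= ctc_loss_weight <= 1.0:
        raise ValueError("ctc_loss_weight must lie in [0, 1]")
    return (1.0 - ctc_loss_weight) * rnnt_loss + ctc_loss_weight * ctc_loss


# Hypothetical loss values for one batch.
print(hybrid_loss(rnnt_loss=2.0, ctc_loss=4.0, ctc_loss_weight=0.5))
```

In a real training loop the two inputs would be tensors produced by the shared encoder's two decoding heads; plain floats are used here to keep the sketch dependency-free.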

NeMo Megatron

Support for the Mixture of Experts for T5
Fix PTL model size output for GPT-3 and BERT
BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

Hydra Multirun core support + NeMo HP optim in YAML #5159
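
To give a sense of what a Hydra multirun sweep does, the sketch below expands comma-separated override values into the Cartesian product of individual runs, which is how Hydra's basic sweeper treats `key=a,b` overrides. It is a simplified stand-in rather than Hydra's or NeMo's code: the helper `expand_sweep` and the config keys in the usage example are hypothetical, and quoting, `range()`, and other sweep syntax are not handled.

```python
from itertools import product
from typing import List


def expand_sweep(overrides: List[str]) -> List[List[str]]:
    """Expand overrides such as 'lr=0.001,0.01' into one override list per run."""
    keys = []
    choices = []
    for item in overrides:
        # Split 'key=v1,v2' into the key and its candidate values.
        key, _, value = item.partition("=")
        keys.append(key)
        choices.append(value.split(","))
    # One run per combination of candidate values, in grid order.
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*choices)
    ]


# Hypothetical config keys, for illustration only.
for run in expand_sweep(["model.optim.lr=0.001,0.01", "model.optim.weight_decay=0.0,0.1"]):
    print(run)
```

Each resulting list corresponds to one job that a sweep would launch with those overrides applied on top of the base YAML config.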

NeMo Models

TTS Zh Fastpitch HifiGan SFSpeech

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog

[Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
[ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
Add Silence Augmentation by @fayejf :: PR: #5476
add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
[ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
[STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639
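
The WER detail entry above (#5557) refers to splitting word error rate into insertion, deletion, and substitution rates. The sketch below shows one common way to compute such a breakdown with a word-level edit-distance table. It is an illustrative implementation, not the code from that PR, and the function name `wer_breakdown` and its return keys are assumptions made here.

```python
def wer_breakdown(reference: str, hypothesis: str) -> dict:
    """Return WER plus substitution, deletion, and insertion counts and rates."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = (total edits, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                diagonal = dp[i - 1][j - 1]  # words match, no edit
            else:
                t, s, d, n = dp[i - 1][j - 1]
                diagonal = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Keep the path with the fewest total edits; ties go to the first option.
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    total, subs, dels, ins = dp[-1][-1]
    n_ref = max(len(ref), 1)  # avoid division by zero for an empty reference
    return {
        "wer": total / n_ref,
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "substitution_rate": subs / n_ref,
        "deletion_rate": dels / n_ref,
        "insertion_rate": ins / n_ref,
    }


print(wer_breakdown("the cat sat on the mat", "the cat sit on mat today"))
```

When several alignments share the same total edit count, the split between the three error types depends on the tie-breaking rule, so only the overall WER is uniquely defined in those cases.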

TTS

Changelog

[TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
[TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
[TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
[TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
[TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
[TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
[TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
[TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
[TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
[TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
[TTS] Add Spanish model documentation by @rlangman :: PR: #5390
[TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
[TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
[TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
[TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
[TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
[TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
[TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
[TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
[TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
[TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
[TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
[TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog

Option to pad the last validation input sequence if it's smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
Fixes bugs with loss averaging for Megatron GPT by @shanmugamr1992 :: PR: #5329
Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
support to disable sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
[TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
Bug fix/gpt by @shanmugamr1992 :: PR: #5493
prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
NLP docs fixes by @vsl9 :: PR: #5528
Switch order of args in optimizer_step override by @ericharper :: PR: #5549
Upgrade to 22.11 by @ericharper :: PR: #5550
Merge r1.13.0 main by @ericharper :: PR: #5570
some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
Remove cell output from tutorial by @ericharper :: PR: #5689

Text Normalization / Inverse Text Normalization

Changelog

[ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
[TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog

Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
Fixes for Conformer-xl export by @borisfom :: PR: #5309
Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
add exportable mel spec by @1-800-BAD-CODE :: PR: #5512

General Improvements

Changelog

bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
Better patch hydra by @titu1994 :: PR: #5591
[TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
Update perturb.py by @stevehuang52 :: PR: #5231
remove CV requirements. by @XuesongYang :: PR: #5233
checks for accepted adapter type at module level by @arendu :: PR: #5194
fix hypotheses return by @nithinraok :: PR: #5253
Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
created by @bmwshop :: PR: #5268
Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
Upperbound PTL by @titu1994 :: PR: #5302
Update Interface(s) phonetic entry by @blisc :: PR: #5212
add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
Add italian model checkpoints by @Kipok :: PR: #5315
Text Memmap Parsing Improvements by @michalivne :: PR: #5265
Update librosa signature in HF processing script by @titu1994 :: PR: #5321
Force wav file format for audio_filepath by @titu1994 :: PR: #5323
Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
[DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
typo fix by @arendu :: PR: #5328
add pre-commit hook to automatically sort entries in requirements. by @XuesongYang :: PR: #5333
Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
Fixing de-autocast by @borisfom :: PR: #5319
[Bugfix] Added rm -f / wget -nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
[DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
Enable mlflow logger by @whrichd :: PR: #4893
Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
SpeakerClustering: fix tensor dimensions in forward() by @virajkarandikar :: PR: #5387
add squad by @arendu :: PR: #5407
added python and c++ alignment code by @yzhang123 :: PR: #5346
Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
Fix for concat map dataset by @1-800-BAD-CODE :: PR: #5133
Support for finetuning and finetuning inference with .ckpt files & batch size refactoring by @MaximumEntropy :: PR: #5339
update doc in terms of get_label for lang id model by @fayejf :: PR: #5366
Debug support for interleaved pipeline parallelism with the distributed Adam optimizer by @timmoon10 :: PR: #5236
Create codeql.yml by @titu1994 :: PR: #5445
Update codeql.yml by @titu1994 :: PR: #5449
Fix support for legacy sentencepiece models by @Numeri :: PR: #5406
Update docs with Comparison tool info, and slightly change .sh for ea… by @Jorjeous :: PR: #5182
Add float32 type casting for get_samples function by @tango4j :: PR: #5399
Add missing import in transcribe_utils.py by @jonghwanhyeon :: PR: #5487
Add auto-labeler by @SeanNaren :: PR: #5498
Add more glob patterns for labeler by @SeanNaren :: PR: #5504
Fix issues with PL 1.8 by @SeanNaren :: PR: #5353
[BugFix] Removing tokens from decoding timestamp by @tango4j :: PR: #5481
Upperbound the torchmetrics version by @SeanNaren :: PR: #5537
Data parallel collect results by @michalivne :: PR: #5547
Fix log-rank-0-only logic by @mikolajblaz :: PR: #5555
Fixed Docker build by @borisfom :: PR: #5562
Patch hydra launch by @titu1994 :: PR: #5589
Fix race condition bug with hydra multirun by @titu1994 :: PR: #5594
Update Dockerfile to use numba==0.53.1 by @stevehuang52 :: PR: #5614
Fixed a missing import for gather_objects by @michalivne :: PR: #5622

NVIDIA Neural Modules 1.14.0
