NVIDIA Neural Modules 1.14.0

@ericharper released this 24 Dec 02:49

Highlights

NeMo ASR

  • Hybrid CTC + Transducer loss ASR #5364
  • Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
  • ASR Adapters hyperparameter search scripts #5159
  • RNNT export to ONNX and TorchScript, with GPU inference #5248
  • Exportable MelSpectrogram (TorchScript) #5512
  • Audio To Audio Dataset Processor #5196
  • Multi Channel Audio Transcription #5479
  • Silence Augmentation #5476

NeMo Megatron

  • Mixture of Experts support for T5
  • Fix PTL model size output for GPT-3 and BERT
  • BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

  • Hydra Multirun core support + NeMo hyperparameter optimization in YAML #5159

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog
  • [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
  • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
  • Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
  • Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
  • Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
  • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
  • Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
  • Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
  • Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
  • Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
  • [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
  • Add Silence Augmentation by @fayejf :: PR: #5476
  • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
  • add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
  • [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
  • Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
  • Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
  • Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
  • Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
  • Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
  • Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
  • [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639
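
To illustrate what the "wer details" entry above (#5557) refers to, here is a minimal sketch of how a word error rate can be broken down into insertion, deletion, and substitution rates. It is not NeMo's implementation: the function name `wer_details` and the returned keys are hypothetical, and the sketch assumes plain whitespace tokenization with each rate normalized by the number of reference words.

```python
def wer_details(reference: str, hypothesis: str) -> dict:
    """Hypothetical helper: WER plus insertion/deletion/substitution rates."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # table[i][j] = (total_edits, insertions, deletions, substitutions)
    # for the best alignment of ref[:i] against hyp[:j].
    table = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)  # only deletions align ref[:i] with nothing
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, j, 0, 0)  # only insertions align nothing with hyp[:j]

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]  # words match, no edit
                continue
            t, n_ins, n_del, n_sub = table[i - 1][j - 1]
            substitution = (t + 1, n_ins, n_del, n_sub + 1)
            t, n_ins, n_del, n_sub = table[i - 1][j]
            deletion = (t + 1, n_ins, n_del + 1, n_sub)
            t, n_ins, n_del, n_sub = table[i][j - 1]
            insertion = (t + 1, n_ins + 1, n_del, n_sub)
            # Keep the candidate with the fewest total edits.
            table[i][j] = min(substitution, deletion, insertion, key=lambda c: c[0])

    total, n_ins, n_del, n_sub = table[-1][-1]
    n = len(ref)
    return {
        "wer": total / n,
        "ins_rate": n_ins / n,
        "del_rate": n_del / n,
        "sub_rate": n_sub / n,
    }


if __name__ == "__main__":
    print(wer_details("turn the lights off", "turn a lights"))
```

When several alignments share the same minimal edit count, the split between the three error types depends on the tie-breaking rule, so the individual rates may differ between tools even when the overall WER agrees.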

TTS

Changelog
  • [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
  • [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
  • [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
  • [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
  • [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
  • [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
  • Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
  • [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
  • [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
  • [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
  • [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
  • [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
  • [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
  • [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
  • [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
  • [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
  • [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
  • JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
  • [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
  • TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
  • [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
  • [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
  • [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
  • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
  • [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
  • [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
  • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog

General Improvements

Changelog