Commit 88b7573

Merge branch 'main' into cherry-pick-main-68cd1a75a04cd34fdbb89b1c663ef10ff351eca6

Signed-off-by: Boris Fomitchev <[email protected]>

borisfom authored Nov 22, 2022
2 parents 5a8704e + ed87156 commit 88b7573

Showing 78 changed files with 2,774 additions and 3,999 deletions.
74 changes: 74 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,74 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ "main", "[rv][0-9]*", "gh-pages-src" ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ "main" ]
  schedule:
    - cron: '19 1 * * 4'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v2
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.

          # For details on CodeQL's query packs, refer to https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
          queries: security-and-quality # security-extended,

      # Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
      # If this step fails, remove it and run the build manually (see below).
      - name: Autobuild
        uses: github/codeql-action/autobuild@v2

      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

      # If Autobuild fails, remove it and uncomment the following three lines,
      # modifying them (or adding more) as needed to build your code.

      # - run: |
      #     echo "Run, Build Application using script"
      #     ./location_of_script_within_repo/buildscript.sh

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2
        with:
          category: "/language:${{matrix.language}}"
37 changes: 34 additions & 3 deletions Jenkinsfile
@@ -4206,7 +4206,9 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
validation_datasets=/home/TestData/an4_dataset/an4_val.json \
sup_data_path=/home/TestData/an4_dataset/beta_priors \
trainer.devices="[0]" \
+trainer.limit_train_batches=1 +trainer.limit_val_batches=1 trainer.max_epochs=1 \
+trainer.limit_train_batches=1 \
+trainer.limit_val_batches=1 \
trainer.max_epochs=1 \
trainer.strategy=null \
model.pitch_mean=212.35873413085938 \
model.pitch_std=68.52806091308594 \
@@ -4224,14 +4226,41 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
~model.text_normalizer_call_kwargs'
}
}
stage('RADTTS') {
steps {
sh 'python examples/tts/radtts.py \
train_dataset=/home/TestData/an4_dataset/an4_train.json \
validation_datasets=/home/TestData/an4_dataset/an4_val.json \
sup_data_path=/home/TestData/an4_dataset/radtts_beta_priors \
trainer.devices="[0]" \
+trainer.limit_train_batches=1 \
+trainer.limit_val_batches=1 \
trainer.max_epochs=1 \
trainer.strategy=null \
model.pitch_mean=212.35873413085938 \
model.pitch_std=68.52806091308594 \
model.train_ds.dataloader_params.batch_size=4 \
model.train_ds.dataloader_params.num_workers=0 \
model.validation_ds.dataloader_params.batch_size=4 \
model.validation_ds.dataloader_params.num_workers=0 \
export_dir=/home/TestData/radtts_test \
model.optim.lr=0.0001 \
model.modelConfig.decoder_use_partial_padding=True \
~trainer.check_val_every_n_epoch \
~model.text_normalizer \
~model.text_normalizer_call_kwargs'
}
}
stage('Mixer-TTS') {
steps {
sh 'python examples/tts/mixer_tts.py \
train_dataset=/home/TestData/an4_dataset/an4_train.json \
validation_datasets=/home/TestData/an4_dataset/an4_val.json \
sup_data_path=/home/TestData/an4_dataset/sup_data \
trainer.devices="[0]" \
+trainer.limit_train_batches=1 +trainer.limit_val_batches=1 trainer.max_epochs=1 \
+trainer.limit_train_batches=1 \
+trainer.limit_val_batches=1 \
trainer.max_epochs=1 \
trainer.strategy=null \
model.pitch_mean=212.35873413085938 \
model.pitch_std=68.52806091308594 \
@@ -4250,7 +4279,9 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
train_dataset=/home/TestData/an4_dataset/an4_train.json \
validation_datasets=/home/TestData/an4_dataset/an4_val.json \
trainer.devices="[0]" \
+trainer.limit_train_batches=1 +trainer.limit_val_batches=1 +trainer.max_epochs=1 \
+trainer.limit_train_batches=1 \
+trainer.limit_val_batches=1 \
+trainer.max_epochs=1 \
trainer.strategy=null \
model.train_ds.dataloader_params.batch_size=4 \
model.train_ds.dataloader_params.num_workers=0 \
4 changes: 2 additions & 2 deletions docs/source/asr/examples/kinyarwanda_asr.rst
@@ -483,7 +483,7 @@ The figure below shows the training dynamics when we train Kinyarwanda models **
.. image:: ../images/kinyarwanda_from_scratch.png
:align: center
:alt: Training dynamics of Kinyarwanda models trained from scratch
:scale: 50%
:width: 800px

Finetuning from another model
#############################
@@ -530,7 +530,7 @@ The figure below compares the training dynamics for three Conformer-Transducer m
.. image:: ../images/kinyarwanda_finetuning.png
:align: center
:alt: Training dynamics of Kinyarwanda models trained from scratch and finetuned from different pretrained checkpoints
:scale: 50%
:width: 800px

************************
Inference and evaluation
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -72,7 +72,7 @@ NVIDIA NeMo User Guide


.. toctree::
:maxdepth: 2
:maxdepth: 3
:caption: Tools
:name: Tools

1 change: 1 addition & 0 deletions docs/source/nlp/dialogue.rst
@@ -13,6 +13,7 @@ In particular, we wanted to decouple the task-dependent, model-independent compo

.. image:: dialogue_UML.png
:alt: Dialogue-UML
:width: 800px

**Supported Tasks**

1 change: 1 addition & 0 deletions docs/source/nlp/entity_linking.rst
@@ -15,6 +15,7 @@ be used to build a knowledge base embedding index.

.. image:: https://github.com/NVIDIA/NeMo/blob/entity-linking-documentation/docs/source/nlp/entity_linking_overview.jpg
:alt: Entity-Linking-Overview
:width: 800px

Our BERT-base + Self Alignment Pretraining implementation allows you to train an entity linking encoder. We also provide example code
on building an index with `Medical UMLS <https://www.nlm.nih.gov/research/umls/index.html>`_ concepts `NeMo/examples/nlp/entity_linking/build_index.py <https://github.com/NVIDIA/NeMo/tree/stable/examples/nlp/entity_linking/build_index.py>`__.
5 changes: 5 additions & 0 deletions docs/source/nlp/nemo_megatron/parallelisms.rst
@@ -10,6 +10,7 @@ Distributed Data parallelism

.. image:: images/ddp.gif
:align: center
:width: 800px
:alt: Distributed Data Parallel


@@ -18,20 +19,23 @@ Tensor Parallelism

.. image:: images/tp.gif
:align: center
:width: 800px
:alt: Tensor Parallel

Pipeline Parallelism
^^^^^^^^^^^^^^^^^^^^

.. image:: images/pp.gif
:align: center
:width: 800px
:alt: Pipeline Parallel

Sequence Parallelism
^^^^^^^^^^^^^^^^^^^^

.. image:: images/sp.gif
:align: center
:width: 800px
:alt: Sequence Parallel

Parallelism nomenclature
@@ -41,4 +45,5 @@ When reading and modifying NeMo Megatron code you will encounter the following t

.. image:: images/pnom.gif
:align: center
:width: 800px
:alt: Parallelism nomenclature
1 change: 1 addition & 0 deletions docs/source/nlp/question_answering.rst
@@ -72,6 +72,7 @@ Similarly, the BaseQAModel module handles common model tasks like creating datal

.. image:: question_answering_arch.png
:alt: Question-Answering-Architecture
:width: 800px

Configuration
=============
@@ -182,4 +182,4 @@ References
.. bibliography:: ../tn_itn_all.bib
:style: plain
:labelprefix: TEXTPROCESSING-NORM
:keyprefix: textprocessing-norm-
:keyprefix: textprocessing-norm-
@@ -96,4 +96,4 @@ References
.. bibliography:: ../tn_itn_all.bib
:style: plain
:labelprefix: TEXTPROCESSING-DEPLOYMENT
:keyprefix: textprocessing-deployment-
:keyprefix: textprocessing-deployment-
153 changes: 153 additions & 0 deletions docs/source/tools/comparison_tool.rst
@@ -0,0 +1,153 @@
Comparison tool for ASR Models
==============================

The Comparison Tool (CT) allows you to compare the predictions of different ASR models at the word-accuracy level.

+--------------------------------------------------------------------------------------------------------------------------+
| **Comparison tool features:** |
+--------------------------------------------------------------------------------------------------------------------------+
| navigation across dataset's vocabulary using an interactive datatable that supports sorting and filtering |
+--------------------------------------------------------------------------------------------------------------------------+
| interactive visualization of model's accuracy |
+--------------------------------------------------------------------------------------------------------------------------+
| visual comparison of predictions of different models |
+--------------------------------------------------------------------------------------------------------------------------+

Getting Started
---------------
The Comparison Tool is integrated into NeMo Speech Data Explorer (SDE), which can be found at `NeMo/tools/speech_data_explorer <https://github.com/NVIDIA/NeMo/tree/main/tools/speech_data_explorer>`__.

Please install the SDE requirements:

.. code-block:: bash

    pip install -r tools/speech_data_explorer/requirements.txt

Then run:

.. code-block:: bash

    python tools/speech_data_explorer/data_explorer.py -h
    usage: data_explorer.py [-h] [--vocab VOCAB] [--port PORT] [--disable-caching-metrics] [--estimate-audio-metrics] [--debug] manifest

    Speech Data Explorer

    positional arguments:
      manifest              path to JSON manifest file

    optional arguments:
      -h, --help            show this help message and exit
      --vocab VOCAB         optional vocabulary to highlight OOV words
      --port PORT           serving port for establishing connection
      --disable-caching-metrics
                            disable caching metrics for errors analysis
      --estimate-audio-metrics, -a
                            estimate frequency bandwidth and signal level of audio recordings
      --debug, -d           enable debug mode
      --audio-base-path     a base path for the relative paths in the manifest; defaults to the manifest path
      --names_compared, -nc
                            names of the two fields that will be compared, for example: pred_text_contextnet pred_text_conformer
      --show_statistics, -shst
                            field name for which to show statistics (optional), for example: pred_text_contextnet

CT takes as input a JSON manifest file (the format that describes speech datasets in NeMo). It should contain the following fields:

* `audio_filepath` (path to the audio file)
* `duration` (duration of the audio file in seconds)
* `text` (reference transcript)
* `pred_text_<model_1_name>`
* `pred_text_<model_2_name>`

SDE supports arbitrary extra custom fields in the JSON manifest. If a field is numeric, SDE can visualize its distribution across utterances.

If the JSON manifest has the attribute `pred_text`, SDE interprets it as a predicted ASR transcript and computes error analysis metrics. If you want SDE to analyze another prediction field, use the `--show_statistics` argument.
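The manifest format above can be produced with a few lines of Python. This is a minimal sketch; the file paths, durations, transcripts, and model names (`contextnet`, `conformer`) are invented placeholders, and NeMo manifests are JSON Lines files with one utterance object per line:

```python
import json

# Hypothetical utterance entry; every value here is a placeholder.
entry = {
    "audio_filepath": "audio/utt_0001.wav",
    "duration": 2.4,
    "text": "the quick brown fox",
    "pred_text_contextnet": "the quick brown fox",
    "pred_text_conformer": "a quick brown fox",
}

# Write one JSON object per line (JSON Lines), as NeMo manifests expect.
with open("compare_manifest.json", "w") as f:
    f.write(json.dumps(entry) + "\n")

# Round-trip check: the required comparison fields are present.
with open("compare_manifest.json") as f:
    loaded = [json.loads(line) for line in f]
print(sorted(loaded[0].keys()))
```

Passing this file to `data_explorer.py` with `--names_compared pred_text_contextnet pred_text_conformer` would then enable the comparison pages.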
User Interface
--------------

SDE has three pages if the `--names_compared` argument is not empty:

* `Statistics` (to display global statistics and aggregated error metrics)

.. image:: images/sde_base_stats.png
    :align: center
    :width: 800px
    :alt: SDE Statistics

* `Samples` (to allow navigation across the entire dataset and exploration of individual utterances)

.. image:: images/sde_player.png
    :align: center
    :width: 800px
    :alt: SDE Samples

* `Comparison tool` (to explore predictions at the word level)

.. image:: images/scrsh_2.png
    :align: center
    :width: 800px
    :alt: Comparison tool

CT has an interactive datatable for the dataset's vocabulary (that supports navigation, filtering, and sorting):

* `Data` (that visualizes all of the dataset's words and each one's accuracy)

.. image:: images/scrsh_3.png
    :align: center
    :width: 800px
    :alt: Data

CT supports all operations present in SDE and allows combining filtering expressions with "or" and "and" operations:

* filtering (by entering a filtering expression in a cell below the header's cell)

.. image:: images/scrsh_4.png
    :align: center
    :width: 800px
    :alt: Filtering
Analysis of Speech Datasets
---------------------------

If there is a pre-trained ASR model, the JSON manifest file can be extended with ASR predicted transcripts:

.. code-block:: bash

    python examples/asr/transcribe_speech.py pretrained_name=<ASR_MODEL_NAME> dataset_manifest=<JSON_FILENAME> append_pred=False pred_name_postfix=<model_name_1>

More information about the `transcribe_speech` parameters is available in the code: `NeMo/examples/asr/transcribe_speech.py <https://github.com/NVIDIA/NeMo/blob/main/examples/asr/transcribe_speech.py>`__.

.. image:: images/scrsh_2.png
    :align: center
    :width: 800px
    :alt: fields

Fields 1 and 2 control what is displayed on the horizontal and vertical axes.
Fields 3 and 4 let you map any available numeric parameter to point color and size, respectively.
Fields 5 and 6 control point spacing. Some data points may share the same coordinates on both axes, in which case they overlap; to keep every point explorable, an option for spreading them out was added.

.. image:: images/scrsh_5.png
    :align: center
    :width: 800px
    :alt: dot spacing

Point spacing works as follows: a small random value is added to every point's coordinates, its magnitude limited by the "radius" parameter, which can be set manually.
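The spacing behavior described above amounts to bounded random jitter. The sketch below illustrates the idea only; it is a hypothetical re-implementation, not SDE's actual code, and the function name `spread_points` is invented:

```python
import random

def spread_points(points, radius):
    """Jitter points so overlapping ones become distinguishable: add a
    random offset to each coordinate, bounded in magnitude by `radius`.
    (Illustrative sketch of the spacing idea, not SDE's implementation.)"""
    return [
        (x + random.uniform(-radius, radius),
         y + random.uniform(-radius, radius))
        for x, y in points
    ]

# Three points with identical coordinates would overlap on a scatter plot.
pts = [(0.9, 0.9)] * 3
jittered = spread_points(pts, radius=0.02)

# Every jittered coordinate stays within `radius` of the original.
for (x0, y0), (x1, y1) in zip(pts, jittered):
    assert abs(x1 - x0) <= 0.02 and abs(y1 - y0) <= 0.02
```

A larger radius spreads dense clusters further apart at the cost of positional accuracy, which is why the tool exposes it as a manual setting.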
.. image:: images/scrsh_9.png
    :align: center
    :width: 800px
    :alt: Example

In this case, all points lying above the diagonal have higher accuracy with the model displayed on the vertical axis, and all points below the diagonal were recognized better by the model displayed on the horizontal axis.
Points marked with circles should be explored first.
Words in the first quadrant were recognized well by both models; conversely, words in the third quadrant were recognized poorly by both models.
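Reading the diagonal can also be done numerically: for each word, whichever model has the higher accuracy "wins". The words and accuracy values below are invented purely for illustration, with the horizontal axis standing for a `contextnet` model and the vertical axis for a `conformer` model:

```python
# Invented per-word accuracies; none of these numbers come from a real run.
accuracy = {
    "kwibuka": {"contextnet": 0.95, "conformer": 0.80},  # below the diagonal
    "umwaka":  {"contextnet": 0.60, "conformer": 0.90},  # above the diagonal
    "ejo":     {"contextnet": 0.20, "conformer": 0.25},  # both models poor
}

# A point above the diagonal (conformer > contextnet) means the
# vertical-axis model recognized the word better, and vice versa.
better_model = {
    word: ("conformer" if a["conformer"] > a["contextnet"] else "contextnet")
    for word, a in accuracy.items()
}
print(better_model)
```

Words where both accuracies are low (the third quadrant) are the ones worth inspecting first, regardless of which model narrowly wins.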
Binary file added docs/source/tools/images/scrsh_2.png
Binary file added docs/source/tools/images/scrsh_3.png
Binary file added docs/source/tools/images/scrsh_4.png
Binary file added docs/source/tools/images/scrsh_5.png
Binary file added docs/source/tools/images/scrsh_9.png