Merge r1.7.0 main (#3773)
* Tn bug 1.7.0 (#3730)

* fix es and fr bug

Signed-off-by: Yang Zhang <[email protected]>

* add file

Signed-off-by: Yang Zhang <[email protected]>

* [TTS] Fix bugs in E2E TTS, Mixer-TTS and FastPitch (#3740)

* fix bugs

Signed-off-by: Oktai Tatanov <[email protected]>

* fix bug in e2e tts and mixer tts

Signed-off-by: Oktai Tatanov <[email protected]>

* Mirror AN4 data while servers are down (#3743)

Signed-off-by: smajumdar <[email protected]>

* Bugfix for GPT eval  (#3744)

* use tokens_cut not tokens

Signed-off-by: ericharper <[email protected]>

* remove precision conversion and comment jit for bias gelu

Signed-off-by: ericharper <[email protected]>

* revert comment update mbs in config

Signed-off-by: ericharper <[email protected]>

* calculate micro_batch_size during complete and compute_logprobs

Signed-off-by: ericharper <[email protected]>

* ASR SSL update (#3746)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* Fix SSL configs for 1.7 (#3748)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* punct process bug fix (#3747)

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>

* updated conformer models. (#3741)

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>

* Yuya/megatron t5 glue eval (#3751)

* Add megatron t5 glue eval-only script

Signed-off-by: Yu Yao <[email protected]>

* Update megatron t5 glue eval default configs

Signed-off-by: Yu Yao <[email protected]>

* Update megatron t5 glue eval configs

Signed-off-by: Yu Yao <[email protected]>

* Update config comments

Signed-off-by: Yu Yao <[email protected]>

Co-authored-by: Yu Yao <[email protected]>

* Specify gpus in SSL notebook (#3753)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* specify gpus

Signed-off-by: sam1373 <[email protected]>

* Duplex model inference fix, money encoder fix (#3754)

Signed-off-by: ekmb <[email protected]>

* Update docs for RNNT and overriding fused batch size (#3755)

Signed-off-by: smajumdar <[email protected]>

* fix consumed samples calculation + PTune Model bugs (#3738)

* fix the way computing consumed samples

Signed-off-by: Yi Dong <[email protected]>

* fixed ptune model

Signed-off-by: Yi Dong <[email protected]>

* make sure notebook is working

Signed-off-by: Yi Dong <[email protected]>

* added try-catch

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* fix directories in ssl notebook (#3758)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* specify gpus

Signed-off-by: sam1373 <[email protected]>

* update dirs

Signed-off-by: sam1373 <[email protected]>

* TN docs update (#3735)

* TN docs update: audio based docs added, quick start, ref fixed, etc

Signed-off-by: ekmb <[email protected]>

* add deployment script dir and Sp TN

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>

* Update Tacotron2_Training.ipynb (#3769)

Signed-off-by: Jason <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* update requirements and package info

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Yu Yao <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Jason <[email protected]>
12 people authored Mar 1, 2022
1 parent 222b513 commit 063d349
Showing 63 changed files with 819 additions and 317 deletions.
9 changes: 8 additions & 1 deletion docs/source/asr/configs.rst
@@ -671,9 +671,16 @@ The most important component at the top level is the ``strategy``. It can take o
decoding:
strategy: "greedy_batch"
# preserve decoding alignments
preserve_alignments: false
# Overrides the fused batch size after training.
# Setting it to -1 will process whole batch at once when combined with `greedy_batch` decoding strategy
fused_batch_size: Optional[int] = -1
# greedy strategy config
greedy:
max_symbols: 30
max_symbols: 10
# beam strategy config
beam:
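The comment on ``fused_batch_size`` above says that ``-1`` processes the whole batch at once when combined with ``greedy_batch`` decoding. A minimal sketch of that behaviour follows. It assumes that a positive value splits the batch into consecutive sub-batches of at most that size, which the diff does not state explicitly, and ``fused_chunks`` is a hypothetical helper written for illustration, not a NeMo function.

```python
from typing import List, Tuple


def fused_chunks(batch_size: int, fused_batch_size: int = -1) -> List[Tuple[int, int]]:
    """Return the (start, end) index ranges a batch would be processed in.

    Hypothetical helper for illustration only; it is not part of NeMo.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # -1 means "do not split": the whole batch is handled as one chunk.
    if fused_batch_size == -1:
        return [(0, batch_size)]
    if fused_batch_size <= 0:
        raise ValueError("fused_batch_size must be -1 or a positive integer")
    # Assumption: a positive value caps the size of each sub-batch.
    return [
        (start, min(start + fused_batch_size, batch_size))
        for start in range(0, batch_size, fused_batch_size)
    ]


print(fused_chunks(16))     # [(0, 16)]
print(fused_chunks(16, 6))  # [(0, 6), (6, 12), (12, 16)]
```

Under this reading, a smaller ``fused_batch_size`` trades speed for lower peak memory, while ``-1`` is the fastest option when the whole batch fits in memory.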
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -113,8 +113,8 @@
bibtex_bibfiles = [
'asr/asr_all.bib',
'nlp/nlp_all.bib',
'nlp/text_normalization/tn_itn_all.bib',
'tools/tools_all.bib',
'nemo_text_processing/textprocessing_all.bib',
'tts_all.bib',
]

13 changes: 4 additions & 9 deletions docs/source/index.rst
@@ -31,13 +31,14 @@ NVIDIA NeMo User Guide
asr/speaker_diarization/intro

.. toctree::
:maxdepth: 2
:maxdepth: 3
:caption: Natural Language Processing
:name: Natural Language Processing

nlp/megatron

nlp/models
nlp/megatron
nlp/api
nlp/text_normalization/intro

.. toctree::
:maxdepth: 2
@@ -55,12 +56,6 @@ NVIDIA NeMo User Guide

common/intro

.. toctree::
:maxdepth: 2
:caption: Text Processing
:name: Text Processing

nemo_text_processing/intro

.. toctree::
:maxdepth: 2
17 changes: 0 additions & 17 deletions docs/source/nemo_text_processing/intro.rst

This file was deleted.

78 changes: 0 additions & 78 deletions docs/source/nemo_text_processing/text_normalization.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/source/nlp/models.rst
@@ -21,4 +21,3 @@ NeMo's NLP collection supports provides the following task-specific models:
entity_linking
nlp_model
machine_translation
text_normalization
21 changes: 21 additions & 0 deletions docs/source/nlp/text_normalization/intro.rst
@@ -0,0 +1,21 @@
(Inverse) Text Normalization
============================

NeMo supports Text Normalization (TN) and Inverse Text Normalization (ITN) via the rule-based `nemo_text_processing` Python package and via neural TN/ITN models.

Rule-based (WFST) TN/ITN:

.. toctree::
:maxdepth: 1

wfst/intro


Neural TN/ITN:

.. toctree::
:maxdepth: 1

nn_text_normalization


@@ -1,7 +1,7 @@
.. _text_normalization:
.. _nn_text_normalization:

Text Normalization Models
==========================
Neural Text Normalization Models
================================
Text normalization is the task of converting a written text into its spoken form. For example,
``$123`` should be verbalized as ``one hundred twenty three dollars``, while ``123 King Ave``
should be verbalized as ``one twenty three King Avenue``. At the same time, the inverse problem
@@ -279,7 +279,7 @@ The argument ``data.train_ds.decoder_data_augmentation`` in the config file cont
References
----------

.. bibliography:: nlp_all.bib
.. bibliography:: tn_itn_all.bib
:style: plain
:labelprefix: NLP-TEXTNORM
:keyprefix: nlp-textnorm-
56 changes: 56 additions & 0 deletions docs/source/nlp/text_normalization/tn_itn_all.bib
@@ -0,0 +1,56 @@
@article{ebden2015kestrel,
title={The Kestrel TTS text normalization system},
author={Ebden, Peter and Sproat, Richard},
journal={Natural Language Engineering},
volume={21},
number={3},
pages={333},
year={2015},
publisher={Cambridge University Press}
}

@article{sproat2016rnn,
title={RNN approaches to text normalization: A challenge},
author={Sproat, Richard and Jaitly, Navdeep},
journal={arXiv preprint arXiv:1611.00068},
year={2016}
}

@book{taylor2009text,
title={Text-to-speech synthesis},
author={Taylor, Paul},
year={2009},
publisher={Cambridge university press}
}

@misc{zhang2021nemo,
title={NeMo Inverse Text Normalization: From Development To Production},
author={Yang Zhang and Evelina Bakhturina and Kyle Gorman and Boris Ginsburg},
year={2021},
eprint={2104.05055},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

@inproceedings{sparrowhawk,
title = {TTS for Low Resource Languages: A Bangla Synthesizer},
author = {Alexander Gutkin and Linne Ha and Martin Jansche and Knot Pipatsrisawat and Richard Sproat},
booktitle = {10th Language Resources and Evaluation Conference},
year = {2016},
}

@article{mohri2005weighted,
title={Weighted automata in text and speech processing},
author={Mohri, Mehryar and Pereira, Fernando and Riley, Michael},
journal={arXiv preprint cs/0503077},
year={2005}
}

@incollection{mohri2009weighted,
title={Weighted automata algorithms},
author={Mohri, Mehryar},
booktitle={Handbook of weighted automata},
pages={213--254},
year={2009},
publisher={Springer}
}
22 changes: 22 additions & 0 deletions docs/source/nlp/text_normalization/wfst/intro.rst
@@ -0,0 +1,22 @@
WFST-based (Inverse) Text Normalization
=======================================

NeMo supports Text Normalization (TN) and Inverse Text Normalization (ITN) via the rule-based `nemo_text_processing` Python package and via neural TN/ITN models.

`nemo_text_processing` is installed with the `nemo_toolkit`; see :doc:`NeMo Introduction <../starthere/intro>` for installation details.
Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/setup.sh>`_.

Tutorials on getting started with WFST-based NeMo text normalization can be found in `tutorials/text_processing <https://github.com/NVIDIA/NeMo/tree/stable/tutorials/text_processing>`_.

Rule-based (WFST) TN/ITN:

.. toctree::
:maxdepth: 2

wfst_text_normalization
wfst_inverse_text_normalization
wfst_text_processing_deployment
wfst_api



@@ -1,3 +1,5 @@
.. _wfst_api:

NeMo Text Processing API
========================

@@ -1,12 +1,29 @@
.. _wfst_itn:

Inverse Text Normalization
==========================

Inverse text normalization (ITN) is a part of the Automatic Speech Recognition (ASR) post-processing pipeline.
ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.

For example,
`"in nineteen seventy"` -> `"in 1970"`
and `"it costs one hundred and twenty three dollars"` -> `"it costs $123"`.
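To make the task concrete, here is a toy rule-based sketch of the second example in plain Python. It is not how NeMo implements ITN (NeMo uses WFST grammars); ``words_to_int`` and ``toy_itn`` are hypothetical names, and the sketch only covers cardinals below one thousand and a trailing ``dollars``. Dates such as ``nineteen seventy`` need their own rule and are deliberately not handled.

```python
# Toy illustration only: NeMo's ITN is built from WFST grammars, not this code.
UNITS = {w: n for n, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split()
)}
TENS = {w: 10 * n for n, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2
)}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred"}


def words_to_int(words):
    """Convert English cardinal words (below one thousand) to an int."""
    total = 0
    for word in words:
        if word == "and":
            continue  # "one hundred and twenty" == "one hundred twenty"
        if word == "hundred":
            total = max(total, 1) * 100
        elif word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total


def toy_itn(text: str) -> str:
    """Replace spoken cardinals with digits; fold a trailing 'dollars' into '$'."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] not in NUMBER_WORDS:
            out.append(tokens[i])
            i += 1
            continue
        # Collect a maximal run of number words; "and" may only join two of them.
        j = i
        while j < len(tokens) and (
            tokens[j] in NUMBER_WORDS
            or (tokens[j] == "and" and j + 1 < len(tokens) and tokens[j + 1] in NUMBER_WORDS)
        ):
            j += 1
        value = words_to_int(tokens[i:j])
        if j < len(tokens) and tokens[j] in ("dollar", "dollars"):
            out.append(f"${value}")
            j += 1  # consume the currency word
        else:
            out.append(str(value))
        i = j
    return " ".join(out)


print(toy_itn("it costs one hundred and twenty three dollars"))  # it costs $123
```

Each semiotic class (money, dates, times, and so on) needs its own rules, which is why a real system composes one grammar per class.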
Quick Start Guide
-----------------

.. code-block:: python

    # import WFST-based ITN module
    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

    # initialize inverse normalizer
    inverse_normalizer = InverseNormalizer(lang="en")

    # try normalizer on a few examples
    print(inverse_normalizer.normalize("it costs one hundred and twenty three dollars"))
    # >>> "it costs $123"
    print(inverse_normalizer.normalize("in nineteen seventy"))
    # >>> "in 1970"

NeMo ITN :cite:`textprocessing-itn-zhang2021nemo` is based on WFST-grammars :cite:`textprocessing-itn-Mohri2009`. We also provide a deployment route to C++ using `Sparrowhawk <https://github.com/google/sparrowhawk>`_ :cite:`textprocessing-itn-sparrowhawk` -- an open-source version of Google Kestrel :cite:`textprocessing-itn-ebden2015kestrel`.
See :doc:`Text Processing Deployment <../tools/text_processing_deployment>` for details.
@@ -17,11 +34,8 @@ See :doc:`Text Processing Deployment <../tools/text_processing_deployment>` for d






Classes
----------------------------------
--------


The base class for every grammar is :class:`GraphFst<nemo_text_processing.text_normalization.en.GraphFst>`.
@@ -75,13 +89,25 @@ Example evaluation run on (cleaned) `Google's text normalization dataset <https:
python run_evaluation.py --input=./en_with_types/output-00001-of-00100 <--language LANGUAGE> [--cat CLASS_CATEGORY] [--filter]
Supported Languages
-------------------

ITN supports English, Spanish, German, French, Vietnamese, and Russian.

Installation
------------

`nemo_text_processing` is installed with the `nemo_toolkit`.

See :doc:`NeMo Introduction <../starthere/intro>` for installation details.

Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/setup.sh>`_.


References
----------

.. bibliography:: textprocessing_all.bib
.. bibliography:: ../tn_itn_all.bib
:style: plain
:labelprefix: TEXTPROCESSING-ITN
:keyprefix: textprocessing-itn-