NeMo Forced Aligner (#5571)
* Merge r1.13.0 main (#5570)

* update branch

Signed-off-by: ericharper <[email protected]>

* Rename Speech Dataset Processor to Speech Data Processor (#5378)

Signed-off-by: Elena Rastorgueva <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Megatron Export Update (#5343)

* export update for Megatron + change ORT optimization

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated export_utils to use autocast instead of manually casting >:/

Signed-off-by: David Mosallanezhad <[email protected]>

* removed dtype from LayerNorm

Signed-off-by: David Mosallanezhad <[email protected]>

* added comment

Signed-off-by: David Mosallanezhad <[email protected]>

* reverting changes on FloatCast

Signed-off-by: David Mosallanezhad <[email protected]>

* Cherry-picked changes from megatron-norm

Signed-off-by: Boris Fomitchev <[email protected]>

* updated asr_model import to cast_utils

Signed-off-by: David Mosallanezhad <[email protected]>

* updated del onnx_model place

Signed-off-by: David Mosallanezhad <[email protected]>

* changed ort optimization to basic -> temp fix

Signed-off-by: David Mosallanezhad <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <[email protected]>

* Disable sync_batch_comm in validation_step for GPT (#5397)

* disable sync_batch_comm in validation_step

Signed-off-by: ericharper <[email protected]>

* Read sync_batch_comm from config or default to False

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Empty

Signed-off-by: MaximumEntropy <[email protected]>

* Comment out test

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Markel Sanz Ausin <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Radtts 1.13 (#5451)

* [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358)
* [TTS] add CI test for RADTTS training recipe.

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339) (#5478)

* Initial refactor

Signed-off-by: MaximumEntropy <[email protected]>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for eval

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <[email protected]>

* Refactor

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <[email protected]>

* Remove comments

Signed-off-by: MaximumEntropy <[email protected]>

* Minor

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <[email protected]>

* Remove old comment

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* export_utils bugfix (#5480)

* updated export_utils

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Export fixes for Riva (#5496)

* Export fixes for Riva

Signed-off-by: Boris Fomitchev <[email protected]>

* Cleaning up training_utils

Signed-off-by: Boris Fomitchev <[email protected]>

Signed-off-by: Boris Fomitchev <[email protected]>

* added set_start_method + function param bugfix (#5539)

* added set_start_method + function param bugfix

Signed-off-by: David Mosallanezhad <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* upper bound torchmetrics

Signed-off-by: ericharper <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <[email protected]>

* remove notebook (#5548)

Signed-off-by: ericharper <[email protected]>

Signed-off-by: ericharper <[email protected]>

* update readme

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Markel Sanz Ausin <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Optimized loop and bugfix in SDE (#5573)

- Fixed bug with loading custom data attributes from JSON in Speech Data Explorer

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Update torchmetrics (#5566)

* add task arg

Signed-off-by: nithinraok <[email protected]>

* update state

Signed-off-by: nithinraok <[email protected]>

Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* remove useless files. (#5580)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* add initial NFA code

Signed-off-by: Elena Rastorgueva <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Elena Rastorgueva <[email protected]>

* Make use of the specified device during viterbi decoding

Signed-off-by: Elena Rastorgueva <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix CodeQL notes

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix CodeQL warning

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add an option to defer data setup from ``__init__`` to ``setup`` (#5569)

* Add an option to defer dataloader setup from __init__ to setup

Signed-off-by: Ante Jukić <[email protected]>

* Updated doc

Signed-off-by: Ante Jukić <[email protected]>

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
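
The deferred data setup described in the entry above can be pictured with a minimal sketch. It is not the NeMo implementation: the class `ToyModel`, the config key `defer_setup`, and the helper `_build_data` are hypothetical names chosen for illustration. The only idea taken from the commit message is that data setup may be postponed from `__init__` to `setup`.

```python
class ToyModel:
    """Illustrative model whose data setup can be deferred (hypothetical names)."""

    def __init__(self, cfg: dict):
        self.cfg = cfg
        self.train_data = None
        # Default behaviour: build the data eagerly in __init__.
        # When the (assumed) "defer_setup" flag is set, skip it here.
        if not cfg.get("defer_setup", False):
            self._build_data()

    def _build_data(self) -> None:
        # Stand-in for constructing a real dataloader.
        self.train_data = list(range(self.cfg["num_samples"]))

    def setup(self) -> None:
        # Called later, e.g. by a trainer; builds the data only if still missing.
        if self.train_data is None:
            self._build_data()


eager = ToyModel({"num_samples": 3})
lazy = ToyModel({"num_samples": 3, "defer_setup": True})
lazy.setup()
```

Deferring the work this way keeps construction cheap, which can matter when the object is created in a process that never trains.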

* Derive utt_id from the number of audio_filepath parts the user wishes to use

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove audio_sr TODO - reduce risk of silent bugs

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add check that model is CTC

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove unused import

Signed-off-by: Elena Rastorgueva <[email protected]>

* Text generation improvement (UI client, data parallel support) (#5437)

* Squashed commit of the following:

commit a5e124f34be31bd6eafe5e5fdf5bedcd0d50915c
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Thu Oct 13 15:07:42 2022 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit 35b424044fe80c3081e7756ab21244f701716f7e
Author: Yi Dong <[email protected]>
Date:   Thu Oct 13 08:04:49 2022 -0700

    get rid of base

    Signed-off-by: Yi Dong <[email protected]>

commit 2955210e2311791543538cfbb5ad26b79414c954
Merge: d52edef8c eaf6757ca
Author: Yi Dong <[email protected]>
Date:   Thu Oct 13 13:17:02 2022 +0000

    Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt

commit d52edef8cd7b36593838fb270047e80f8ccb652e
Author: Yi Dong <[email protected]>
Date:   Thu Oct 13 13:16:24 2022 +0000

    align with main

    Signed-off-by: Yi Dong <[email protected]>

commit eaf6757ca5be8e099492f57c81d984429b0ad49c
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Thu Oct 13 13:12:11 2022 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit c4b86d97626ea0721bf8fb4c0a45dec5becc94c9
Author: Yi Dong <[email protected]>
Date:   Thu Oct 13 13:10:58 2022 +0000

    same as main

    Signed-off-by: Yi Dong <[email protected]>

commit e335de51bcc0d681c58b568c3d8c238bc5687c3b
Merge: c231086e0 4463a9fe9
Author: Yi Dong <[email protected]>
Date:   Thu Oct 13 13:08:09 2022 +0000

    Merge branch 'main' into universal_prompt

commit c231086e057f1efaa915f691d84664cb3d5aad85
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Wed Oct 12 19:59:12 2022 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit 6a821a4b49a23dd3408a706a2a3dd393149b0bb1
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 19:56:17 2022 +0000

    default to pad

    Signed-off-by: Yi Dong <[email protected]>

commit 9d908e39fef1beed9ba2da4d1a6806161eb7ef25
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 19:55:44 2022 +0000

    add the option to pad the tokens

    Signed-off-by: Yi Dong <[email protected]>

commit 876dc395b43fdeeaa2bcbbe13c76523633764c33
Merge: fbb0f4035 fe3c77ee9
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 19:20:47 2022 +0000

    Merge branch 'fix_global_init' into universal_prompt

commit fe3c77ee93ab6cf3ea152db68cb6beefcac2a392
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 18:59:49 2022 +0000

    fix import again

    Signed-off-by: Yi Dong <[email protected]>

commit fbb0f4035c6cd6bfefed50a20605503de8c1dccb
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Wed Oct 12 16:00:24 2022 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit 372ca8c0d7988f2339b15888dc72aa21f4fb6937
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 15:58:32 2022 +0000

    enable server

    Signed-off-by: Yi Dong <[email protected]>

commit cbe05d9fbc978f812cfbb671f45f147f300713c4
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 13:07:28 2022 +0000

    fix comment error

    Signed-off-by: Yi Dong <[email protected]>

commit 1948048922e726ec6131e44b1a745389f18d4ef2
Merge: 232c2cce3 984f5c09a
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 13:05:30 2022 +0000

    Merge branch 'fix_global_init' into universal_prompt

commit 232c2cce34d7a8b902da406706f3dd9b39475091
Merge: 34c8a68df 658243fb6
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 12:50:00 2022 +0000

    Merge branch 'fix_global_init' into universal_prompt

commit 984f5c09a6dbf1d1fb5aa30ed9b0df188e66a50f
Merge: 658243fb6 3fda5de46
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 08:42:11 2022 -0400

    Merge branch 'main' into fix_global_init

commit 658243fb6580191b5d60edd30cde16dcc23cbb85
Author: Yi Dong <[email protected]>
Date:   Wed Oct 12 12:40:57 2022 +0000

    fix import error

    Signed-off-by: Yi Dong <[email protected]>

commit 8e0fe1cad05ec288ec122b3cd0e139a96872e08c
Author: Yi Dong <[email protected]>
Date:   Tue Oct 11 22:44:12 2022 +0000

    update the fused kernel

    Signed-off-by: Yi Dong <[email protected]>

commit 536cf6bef9447b75843fad630729c47a2fba35f3
Author: Yi Dong <[email protected]>
Date:   Tue Oct 11 14:44:52 2022 -0700

    add the missing file

    Signed-off-by: Yi Dong <[email protected]>

commit 1b437ec41dc5e354453ce0a089bca0171cbcb6c2
Author: Yi Dong <[email protected]>
Date:   Tue Oct 11 14:43:14 2022 -0700

    fix fused softmax

    Signed-off-by: Yi Dong <[email protected]>

commit 7813f60e05f9783af61f8c14ec1cb0c6c4f1f263
Author: Yi Dong <[email protected]>
Date:   Tue Oct 11 14:16:48 2022 -0700

    move global step to base

    Signed-off-by: Yi Dong <[email protected]>

commit 34c8a68df084b18d377e84415d9f07b2cd6673dd
Author: Yi Dong <[email protected]>
Date:   Thu Oct 6 13:50:11 2022 +0000

    fix pipeline for eval

    Signed-off-by: Yi Dong <[email protected]>

commit eee5d38218f26660c3ffebe9f615c850c80a1f0d
Author: Yi Dong <[email protected]>
Date:   Thu Oct 6 13:48:22 2022 +0000

    fix for pipeline parallel

    Signed-off-by: Yi Dong <[email protected]>

commit 323bca73e7ef6099ee79c0a2fffac7b709ed6c5d
Merge: 125e49947 e3b4c4d1f
Author: Yi Dong <[email protected]>
Date:   Wed Oct 5 19:29:13 2022 +0000

    Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt

commit 125e4994760448ff75dd9328395813eda1c87547
Author: Yi Dong <[email protected]>
Date:   Wed Oct 5 19:29:04 2022 +0000

    add share option

    Signed-off-by: Yi Dong <[email protected]>

commit e3b4c4d1f7346c9fa596f3cca6d4df0a9e05c368
Author: Yi Dong <[email protected]>
Date:   Wed Oct 5 11:43:48 2022 -0700

    make sure consolidation works

    Signed-off-by: Yi Dong <[email protected]>

commit a5c833964ecf05dc460ca1da69275c4019742150
Merge: 2a07ab52d abcb74be2
Author: Yi Dong <[email protected]>
Date:   Wed Oct 5 18:40:29 2022 +0000

    Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt

commit 2a07ab52d95f15ba666823028c69e23825666c05
Author: Yi Dong <[email protected]>
Date:   Wed Oct 5 18:40:23 2022 +0000

    added requirement

    Signed-off-by: Yi Dong <[email protected]>

commit 3abecd9dd1611993a87c537636abe7f7e6a9b04c
Author: Yi Dong <[email protected]>
Date:   Wed Oct 5 18:39:42 2022 +0000

    added a simple web server

    Signed-off-by: Yi Dong <[email protected]>

commit abcb74be2caf1cdec40eb9ba2be4dde4d45a3b4b
Author: Yi Dong <[email protected]>
Date:   Wed Oct 5 06:54:12 2022 -0700

    fix empty val loss

    Signed-off-by: Yi Dong <[email protected]>

commit b8eb92ac4a0d665570af75e34c9ba3c2e2420c26
Author: Yi Dong <[email protected]>
Date:   Tue Oct 4 19:25:30 2022 -0700

    text gen working

    Signed-off-by: Yi Dong <[email protected]>

commit d59f3e3f3a6fd19736d1c5706fed65a3dd4049ba
Author: Yi Dong <[email protected]>
Date:   Tue Oct 4 16:08:40 2022 -0700

    first change

    Signed-off-by: Yi Dong <[email protected]>

commit 59d077585e6962a669b824af58f64e8a0bea6547
Author: Yi Dong <[email protected]>
Date:   Tue Oct 4 15:00:40 2022 -0700

    revert

    Signed-off-by: Yi Dong <[email protected]>

commit 12a0f3902d99e9179403644bd951c045df716ca7
Author: Yi Dong <[email protected]>
Date:   Tue Oct 4 21:26:23 2022 +0000

    init imp

    Signed-off-by: Yi Dong <[email protected]>

commit 62a15dfd943cc48be495ac61b9f2f00995775c5f
Merge: 82c90d2cd e0cc6b767
Author: Yi Dong <[email protected]>
Date:   Tue Oct 4 11:58:26 2022 -0700

    Merge branch 'main' into universal_prompt

commit 82c90d2cd0fd156f16a4b899f8c741d598f33990
Author: Yi Dong <[email protected]>
Date:   Tue Oct 4 11:17:13 2022 -0700

    add sync

    Signed-off-by: Yi Dong <[email protected]>

commit 9819b703eef877d90cd1257bf3610c69de9b4d7e
Author: Yi Dong <[email protected]>
Date:   Sun Oct 2 17:52:34 2022 -0700

    fix save model

    Signed-off-by: root <[email protected]>

commit e4937e2fc5fb7d70754c97668416e4a69c3079fe
Author: Yi Dong <[email protected]>
Date:   Sat Oct 1 18:56:09 2022 +0000

    working

    Signed-off-by: Yi Dong <[email protected]>

commit b73b06d1c7cf5417a6d87cb33d8ed83a57e38b7b
Author: Yi Dong <[email protected]>
Date:   Sat Oct 1 17:34:03 2022 +0000

    calculate the mask

    Signed-off-by: Yi Dong <[email protected]>

commit 9db3bc13eb65a94a475b837603351da68e3745bc
Author: Yi Dong <[email protected]>
Date:   Fri Sep 30 23:26:32 2022 +0000

    fix bug in datasets

    Signed-off-by: Yi Dong <[email protected]>

commit f289900375d4412f53f8110be00fec6587627550
Author: Yi Dong <[email protected]>
Date:   Fri Sep 30 22:29:40 2022 +0000

    update the code

    Signed-off-by: Yi Dong <[email protected]>

commit 8e28a1f208aabaab72dbe769e72756baada04d99
Author: Yi Dong <[email protected]>
Date:   Fri Sep 30 21:52:52 2022 +0000

    added new ds

    Signed-off-by: Yi Dong <[email protected]>

commit 8d41315bab7ce90e200a8a7d1023c34f8e046897
Author: Yi Dong <[email protected]>
Date:   Fri Sep 30 18:57:09 2022 +0000

    added new files

    Signed-off-by: Yi Dong <[email protected]>

commit 984e0e94e15e16323c1ba1ca2efeabd84f69463f
Merge: cbe8b7ab1 fa6cd8588
Author: Yi Dong <[email protected]>
Date:   Thu Sep 29 21:43:29 2022 +0000

    Merge branch 'llm-prompt-learning-improvements' into universal_prompt

commit fa6cd858839277939446afe7275976078d54c512
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Thu Sep 29 16:47:30 2022 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit 78ba46e5d6fde1be53c08e1e30a54cce59824be0
Merge: 7d6d46742 8d670bc77
Author: Virginia Adams <[email protected]>
Date:   Thu Sep 29 09:43:27 2022 -0700

    Merge branch 'main' into llm-prompt-learning-improvements

commit 7d6d46742170a66758287a207d67e1b1bfd15613
Author: Virginia Adams <[email protected]>
Date:   Thu Sep 29 16:42:43 2022 +0000

    Removed inference step and added sentence piece check to predict step

    Signed-off-by: Virginia Adams <[email protected]>

commit 20fd265acd6f7f9912cf52155fe66ccfa6b201a2
Author: Virginia Adams <[email protected]>
Date:   Thu Sep 29 15:26:32 2022 +0000

    fixed first stage check for pipeline parallel T5 pt

    Signed-off-by: Virginia Adams <[email protected]>

commit 3637be2b258c8d9028856f9971edb7da4a8121f0
Merge: a3ea722fd 986a76612
Author: Virginia Adams <[email protected]>
Date:   Wed Sep 28 10:23:30 2022 -0700

    Merge branch 'main' into llm-prompt-learning-improvements

commit a3ea722fdc12fbcc5989b76ef5643a574b763bc4
Merge: 770967a52 971485ce7
Author: Virginia Adams <[email protected]>
Date:   Mon Sep 26 13:35:52 2022 -0700

    Merge branch 'main' into llm-prompt-learning-improvements

commit 770967a5251a474b6dcc2d44bf9a2076adbcb604
Merge: d23bf6c30 e3ac280a8
Author: Virginia Adams <[email protected]>
Date:   Mon Sep 26 10:17:03 2022 -0700

    Merge branch 'main' into llm-prompt-learning-improvements

commit d23bf6c30acc0e3f6af9b4e24547669866a34d62
Merge: de6a31651 333d2b749
Author: Virginia Adams <[email protected]>
Date:   Mon Sep 26 10:05:16 2022 -0700

    Merge branch 'llm-prompt-learning-improvements' of https://github.com/NVIDIA/NeMo into llm-prompt-learning-improvements

commit de6a31651e63d88a42b971794d93f18ff5a3cdff
Author: Virginia Adams <[email protected]>
Date:   Mon Sep 26 17:00:53 2022 +0000

    Updated PP check to be on first stage pipeline only

    Signed-off-by: Virginia Adams <[email protected]>

commit 333d2b7498e6742ce66436f733c980a74616900c
Merge: 592c0986a a39fc925a
Author: Virginia Adams <[email protected]>
Date:   Fri Sep 23 16:11:21 2022 -0700

    Merge branch 'main' into llm-prompt-learning-improvements

commit 592c0986a476a91b57b8605d7b70830d7acfa021
Author: Virginia Adams <[email protected]>
Date:   Fri Sep 23 23:08:41 2022 +0000

    Fixed unused import and CI test bug

    Signed-off-by: Virginia Adams <[email protected]>

commit ea9cd82d85638bc60ae4ad7ef105db931c8e3455
Merge: ce4b72c8c b566c2d0e
Author: Virginia Adams <[email protected]>
Date:   Fri Sep 23 18:57:25 2022 +0000

    Merge branch 'llm-prompt-learning-improvements' of https://github.com/NVIDIA/NeMo into llm-prompt-learning-improvements

commit ce4b72c8c52f32be336e323dd78a38089edc3e7c
Author: Virginia Adams <[email protected]>
Date:   Fri Sep 23 18:57:16 2022 +0000

    Switch to import from base class

    Signed-off-by: Virginia Adams <[email protected]>

commit b566c2d0e35a068f758fd1310bc620a47be4590b
Merge: 6621f2854 e872061ac
Author: Virginia Adams <[email protected]>
Date:   Fri Sep 23 10:09:03 2022 -0700

    Merge branch 'main' into llm-prompt-learning-improvements

commit 6621f28543828a48484a5637f6c9f3ccb23a5b02
Author: Virginia Adams <[email protected]>
Date:   Wed Sep 14 20:47:35 2022 +0000

    python format fix

    Signed-off-by: Virginia Adams <[email protected]>

commit 8deafc8987b6af5f7b99a250310f57a40198c37f
Author: Virginia Adams <[email protected]>
Date:   Wed Sep 14 20:28:02 2022 +0000

    Save .nemo on new best val score

    Signed-off-by: Virginia Adams <[email protected]>

commit 761bd36969cb465d6a129e9eee6ce1f883d3cf41
Author: Virginia Adams <[email protected]>
Date:   Wed Sep 14 18:03:19 2022 +0000

    Added automatic checkpoint to nemo file method

    Signed-off-by: Virginia Adams <[email protected]>

commit 3be4ed57b6cd3ddfe4876d78650dfe8fe794598b
Author: Virginia Adams <[email protected]>
Date:   Wed Sep 14 02:11:56 2022 +0000

    Make GPT use base prompt learning model class:

    Signed-off-by: Virginia Adams <[email protected]>

Signed-off-by: Yi Dong <[email protected]>

* fix LGTM

Signed-off-by: Yi Dong <[email protected]>

* fix validation

Signed-off-by: Yi Dong <[email protected]>

* change for the lm eval

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make text generation work in data parallel environment

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implement the service with rest service

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* suppress log

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore function needed for NMT

Signed-off-by: MaximumEntropy <[email protected]>

* handles no answer only

Signed-off-by: Yi Dong <[email protected]>

* Fix config

Signed-off-by: MaximumEntropy <[email protected]>

* added knn to web

Signed-off-by: Yi Dong <[email protected]>

* fix lgtm.com comments

Signed-off-by: Yi Dong <[email protected]>

* output the retrieved context

Signed-off-by: Yi Dong <[email protected]>

* allow no neighbor query

Signed-off-by: Yi Dong <[email protected]>

* remove the imports

Signed-off-by: Yi Dong <[email protected]>

* warn only once

Signed-off-by: Yi Dong <[email protected]>

* Change output file format from JSON to JSONL

Signed-off-by: MaximumEntropy <[email protected]>

* new t0 dataset

Signed-off-by: Yi Dong <[email protected]>

* Add T0 data preproc scripts

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Merge and multiprocessing

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix for is_correct

Signed-off-by: MaximumEntropy <[email protected]>

* fix epoch > 2

Signed-off-by: Yi Dong <[email protected]>

* handles multiple dataloader

Signed-off-by: Yi Dong <[email protected]>

* remove template

Signed-off-by: Yi Dong <[email protected]>

* Refactor T0 dataset

Signed-off-by: MaximumEntropy <[email protected]>

* Add script to merge train folder into individual training files to minimize number of blends

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added on the fly service

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add combo instance

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added combo service

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* send weights back to server

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix index store

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor changes

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add reset button

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add add eos

Signed-off-by: Yi Dong <[email protected]>

* use a separate bert service

Signed-off-by: Yi Dong <[email protected]>

* no loss of accuracy

Signed-off-by: Yi Dong <[email protected]>

* pin the gradio version

Signed-off-by: Yi Dong <[email protected]>

* Remove bin compat

Signed-off-by: MaximumEntropy <[email protected]>

* Fix header lines

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* evaluate based on text generation

Signed-off-by: Yi Dong <[email protected]>

* exact match result aggregation

Signed-off-by: Yi Dong <[email protected]>

* working SP and SA

Signed-off-by: Yi Dong <[email protected]>

* sync

Signed-off-by: Yi Dong <[email protected]>

* fix checkpoint

Signed-off-by: Yi Dong <[email protected]>

* fix eval

Signed-off-by: Yi Dong <[email protected]>

* backup states

Signed-off-by: Yi Dong <[email protected]>

* backup states reset

Signed-off-by: Yi Dong <[email protected]>

* fix the bug

Signed-off-by: Yi Dong <[email protected]>

* fix evaluation for sentence piece

Signed-off-by: Yi Dong <[email protected]>

* fix a bug

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* potential fix in the future

Signed-off-by: Yi Dong <[email protected]>

* remove the universal codes

Signed-off-by: Yi Dong <[email protected]>

* remove universal strategy

Signed-off-by: Yi Dong <[email protected]>

* address reviewer comment

Signed-off-by: Yi Dong <[email protected]>

Signed-off-by: Yi Dong <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: MaximumEntropy <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add align function docstrings and make most args optional

Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove redundant returns of viterbi and log probs matrices

Signed-off-by: Elena Rastorgueva <[email protected]>

* Rename h# to <initial_silence>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Update manifest format description in README

Signed-off-by: Elena Rastorgueva <[email protected]>

* always remove any spaces from utt_id

Signed-off-by: Elena Rastorgueva <[email protected]>
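
The utt_id entries in this log (building the ID from a chosen number of `audio_filepath` parts, and always removing spaces) can be sketched roughly as follows. This is an illustration under assumptions, not the aligner's actual code: the function name `make_utt_id`, the parameter `n_parts`, and the `_` joining character are hypothetical.

```python
from pathlib import Path


def make_utt_id(audio_filepath: str, n_parts: int = 1) -> str:
    """Build an utterance ID from the last n_parts components of a path.

    Hypothetical helper: the name, signature, and "_" separator are assumptions.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Keep only the trailing path components the caller asked for.
    parts = Path(audio_filepath).parts[-n_parts:]
    # Join the kept components, then drop the extension of the final one.
    utt_id = Path("_".join(parts)).stem
    # Remove every space from the resulting ID.
    return utt_id.replace(" ", "")


one_part = make_utt_id("/data/spk 1/utt 01.wav")
two_parts = make_utt_id("/data/spk 1/utt 01.wav", n_parts=2)
```

Using more than one path part helps when file names repeat across speaker or session directories, since the parent directory name then disambiguates the ID.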

* Patch the hanging of threads on very large stderr (#5589) (#5590)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* O2 style amp for gpt3 ptuning (#5246)

* enable amp o2 plugin

Signed-off-by: Jimmy Zhang <[email protected]>

* only create master param if param requires gradient

Signed-off-by: Jimmy Zhang <[email protected]>

* remove pytorch autocast

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jimmy Zhang <[email protected]>

* Update optimizer_with_main_params.py

Signed-off-by: JimmyZhang12 <[email protected]>

* create master grad only if param group requires grad

Signed-off-by: Jimmy Zhang <[email protected]>

* fix grad scaler for pp > 1

Signed-off-by: Jimmy Zhang <[email protected]>

Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Better patch hydra (#5591) (#5592)

* Readd buffering and thread drain to Hydra Launcher

Signed-off-by: smajumdar <[email protected]>

* Readd buffering and thread drain to Hydra Launcher

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Yet another fix with hydra multirun (#5594) (#5595)

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add RETRO model documentation (#5578)

* added retro doc

Signed-off-by: Yi Dong <[email protected]>

* finish data part

Signed-off-by: Yi Dong <[email protected]>

* added the data format

Signed-off-by: Yi Dong <[email protected]>

* added training script

Signed-off-by: Yi Dong <[email protected]>

* added training and evaluation steps

Signed-off-by: Yi Dong <[email protected]>

* edit the text

Signed-off-by: Yi Dong <[email protected]>

* added the images

Signed-off-by: Yi Dong <[email protected]>

* fix beginning

Signed-off-by: Yi Dong <[email protected]>

* fix the grammar

Signed-off-by: Yi Dong <[email protected]>

* trim it down

Signed-off-by: Yi Dong <[email protected]>

* add wandb option

Signed-off-by: Yi Dong <[email protected]>

* add reference

Signed-off-by: Yi Dong <[email protected]>

* fix path

Signed-off-by: Yi Dong <[email protected]>

* added the parameters table

Signed-off-by: Yi Dong <[email protected]>

* fix section

Signed-off-by: Yi Dong <[email protected]>

Signed-off-by: Yi Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix: setup_multiple validation/test data (#5585)

Fix: setup_multiple validation/test data (#5585)

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Move to optimizer based EMA implementation (#5169)

* Move to optimizer

Signed-off-by: SeanNaren <[email protected]>

* Fix replacing weights

Signed-off-by: SeanNaren <[email protected]>

* Allow swapping of weights be optional

Signed-off-by: SeanNaren <[email protected]>

* Save 2 models

Signed-off-by: SeanNaren <[email protected]>

* Use different hook

Signed-off-by: SeanNaren <[email protected]>

* Expose cpu device

Signed-off-by: SeanNaren <[email protected]>

* Add clause to see if this fixes issue with O2 optimizer

Signed-off-by: SeanNaren <[email protected]>

* Try to get O2 working

Signed-off-by: SeanNaren <[email protected]>

* WIP

Signed-off-by: SeanNaren <[email protected]>

* Fixes

Signed-off-by: SeanNaren <[email protected]>

* Fixes to tests

Signed-off-by: SeanNaren <[email protected]>

* Add guard

Signed-off-by: SeanNaren <[email protected]>

* Remove import

Signed-off-by: SeanNaren <[email protected]>

* Add guard

Signed-off-by: SeanNaren <[email protected]>

* Add comment

Signed-off-by: SeanNaren <[email protected]>

* Remove overwrite

Signed-off-by: SeanNaren <[email protected]>

* Add BatchNorm, currently tests fail

Signed-off-by: SeanNaren <[email protected]>

* Fix tests/functionality for batch norm

Signed-off-by: SeanNaren <[email protected]>

* Get rid of NLP changes

Signed-off-by: SeanNaren <[email protected]>

Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* AIStore for ASR datasets (#5462)

AIStore for ASR datasets

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add support for MHA adapters to ASR (#5396)

* Convert AbstractAdapterModule to AbstractAdapterMixin

Signed-off-by: smajumdar <[email protected]>

* Temporary fixes to new signature of mixin

Signed-off-by: smajumdar <[email protected]>

* Add adapter util for constants, add all mha adapters.

Signed-off-by: smajumdar <[email protected]>

* Update name of function

Signed-off-by: smajumdar <[email protected]>

* Roll back changes to convASR

Signed-off-by: smajumdar <[email protected]>

* Convert AbstractAdapterModule to AbstractAdapterMixin

Signed-off-by: smajumdar <[email protected]>

* First draft of Conformer support for MHA attention

Signed-off-by: smajumdar <[email protected]>

* Add some preliminary tests

Signed-off-by: smajumdar <[email protected]>

* Add support for projection of the hidden dimension for attention

Signed-off-by: smajumdar <[email protected]>

* Add support for squeezeformer

Signed-off-by: smajumdar <[email protected]>

* Update train adapter config

Signed-off-by: smajumdar <[email protected]>

* Add tests for squeezeformer and unit tests for new modules

Signed-off-by: smajumdar <[email protected]>

* Update config for hp search,set limits on modules for conformer and squeezeformer, update adapter mixin, add cache to import_from_class_path

Signed-off-by: smajumdar <[email protected]>

* Update location of adapters

Signed-off-by: smajumdar <[email protected]>

* Add pre_norm for proper attention learning, Fix the issue with nan/inf in pos_bias_u and pos_bias_v

Signed-off-by: smajumdar <[email protected]>

* Update expmanager to clean up checkpoints

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Add docstrings and update tests

Signed-off-by: smajumdar <[email protected]>

* Add docstrings and update tests

Signed-off-by: smajumdar <[email protected]>

* Add docstrings and update tests

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update training scripts

Signed-off-by: smajumdar <[email protected]>

* Update config and docs

Signed-off-by: smajumdar <[email protected]>

* Expose nemo delete function

Signed-off-by: smajumdar <[email protected]>

* Correct adapter partial state saving

Signed-off-by: smajumdar <[email protected]>

* Correct a bug with state management of adapter tokens

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Pull down EMA test

Signed-off-by: smajumdar <[email protected]>

* Correct name of adapter module utility class

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove unused TTS eval functions w/ pesq and pystoi dependencies (#5605) (#5606)

Signed-off-by: Jocelyn Huang <[email protected]>

Signed-off-by: Jocelyn Huang <[email protected]>

Signed-off-by: Jocelyn Huang <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Create separator parameter

Signed-off-by: Elena Rastorgueva <[email protected]>

* Call align function with hydra config

Signed-off-by: Elena Rastorgueva <[email protected]>

* update usage example

Signed-off-by: Elena Rastorgueva <[email protected]>

* Update Dockerfile (#5614) (#5616)

Pinned to use `numba==0.53.1` to avoid crashing in training with `num_workers > 0`. This is just a temporary workaround, still need to fix it in the future.

Signed-off-by: He Huang (Steve) <[email protected]>

Signed-off-by: He Huang (Steve) <[email protected]>

Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Make separate pretrained_name and model_path parameters

Signed-off-by: Elena Rastorgueva <[email protected]>

* make "optional" tags bold in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Move non-main functions to utils dir

Signed-off-by: Elena Rastorgueva <[email protected]>

* Temp workaround: Disable test with cache_audio=True since it is failing in CI (#5607) (#5615)

Signed-off-by: Ante Jukić <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] fix ranges of char set for accented letters. (#5607)

* [TTS] fix ranges of char set for accented letters.
* remove digits pattern and added unit tests for math operators.

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Change success message to reduce confusion (#5621)

Signed-off-by: SeanNaren <[email protected]>

Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Update documentation and tutorials for Adapters  (#5610)

* Improve docs for adapter and tests

Signed-off-by: smajumdar <[email protected]>

* Improve docs for adapter and tests

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Rename test file

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] add type hints and change variable names for tokenizers and g2p (#5602)

* [TTS] add type hints and change variable names for tokenizers and g2p

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* 1. Added missing import for gather_objects. (#5627)

Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. (#5596) (#5625)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fixed RadTTS unit test (#5572)

Signed-off-by: Boris Fomitchev <[email protected]>

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* remove tests (#5633)

Signed-off-by: ericharper <[email protected]>

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS][DOC] add notes about automatic conversion to target sampling rates. (#5624) (#5634)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Conformer local attention (#5525)

* local attn and merge

Signed-off-by: sam1373 <[email protected]>

* optional

Signed-off-by: sam1373 <[email protected]>

* override

Signed-off-by: sam1373 <[email protected]>

* incorporate comments

Signed-off-by: sam1373 <[email protected]>

* update

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* comment

Signed-off-by: sam1373 <[email protected]>

* changes, test

Signed-off-by: sam1373 <[email protected]>

* changes

Signed-off-by: sam1373 <[email protected]>

* check att context

Signed-off-by: sam1373 <[email protected]>

* readme link

Signed-off-by: sam1373 <[email protected]>

* utils

Signed-off-by: sam1373 <[email protected]>

* update

Signed-off-by: sam1373 <[email protected]>

Signed-off-by: sam1373 <[email protected]>
Signed-off-by: Samuel Kriman <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add core classes and functions for online clustering diarizer part 1 (#5526)

* Add core classes and functions for online clustering diarizer

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add audio to labels code

Signed-off-by: Taejin Park <[email protected]>

* resolve type errors

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added unit=tests for very short audio

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Filled all missing docstrings

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved conflict and added missing docstrings

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed unit-test errors

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the wrongly added file - megatron_gpt_model.py

Signed-off-by: Taejin Park <[email protected]>

* Fix wrongly included file - megatron_gpt_model.py

Signed-off-by: Taejin Park <[email protected]>

* resolve code quality issue

Signed-off-by: Taejin Park <[email protected]>

* Fixed unit-test errors and bugs

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changed total_sec for offline_clustering toy_data in unit-tests

Signed-off-by: Taejin Park <[email protected]>

* fixed merging index offset bug

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* only including part 1 files

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused function

Signed-off-by: Taejin Park <[email protected]>

* fixed unused imports

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* divided nmesc_clustering.py into two and reflected first-pass comments

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding offline/online_clustering.py

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code QL autocomment

Signed-off-by: Taejin Park <[email protected]>

* Removed unused imports

Signed-off-by: Taejin Park <[email protected]>

* Update nemo/collections/asr/parts/utils/online_clustering.py

Co-authored-by: Sean Naren <[email protected]>
Signed-off-by: Taejin Park <[email protected]>

* Reflected comments

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved code scanning issue

Signed-off-by: Taejin Park <[email protected]>

* Update nemo/collections/asr/parts/utils/offline_clustering.py

Co-authored-by: Sean Naren <[email protected]>
Signed-off-by: Taejin Park <[email protected]>

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Sean Naren <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models (#5639) (#5641)

* add stt_eo_conformer_ctc_large model

* stt_eo_conformer_transducer_large

Co-authored-by: Andrei Andrusenko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Removed unused import

Signed-off-by: Elena Rastorgueva <[email protected]>

* Specify that filepaths need to be absolute

Signed-off-by: Elena Rastorgueva <[email protected]>

* replaces any spaces in utt_id with dashes

Signed-off-by: Elena Rastorgueva <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Elena Rastorgueva <[email protected]>

* Make hydra script callable by another script

Signed-off-by: Elena Rastorgueva <[email protected]>

* do not specify default model or model_downsample_factor

Signed-off-by: Elena Rastorgueva <[email protected]>

* [Dockerfile] Remove AIS archive from docker image (#5629)

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Measure audio_sr from audio instead of needing to specify

Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch (#5541)

* Chinese TTS replaces default pypinyin dict
* Add jieba word segmenter as an option

Signed-off-by: Yuekai Zhang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Make separate parameters for device of transcription and viterbi steps

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add mention of gecko

Signed-off-by: Elena Rastorgueva <[email protected]>

* [workflow] add exclude labels option to ignore cherry-picks in release changelog. (#5645)

Signed-off-by: Xuesong Yang <[email protected]>

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. (#5643) (#5647)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [Add] ASR+VAD Inference Pipeline (#5575)

Added offline ASR+VAD inference pipeline that matches with what's in RIVA, along with some feature-based ASR and classification datasets.

Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: fayejf <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* rename separator to ctm_grouping_separator and refactor

Signed-off-by: Elena Rastorgueva <[email protected]>

* Bert interleaved (#5556)

* Adding SP and SAR support Bert

* Adding Sequence parallel support to Bert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adding Sequence parallel support to Bert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adding SP and SAR support Bert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adding SP and SAR support Bert

* Adding SP and SAR support Bert

* Adding Sequence parallel support to Bert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adding Sequence parallel support to Bert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adding Sequence parallel support to Bert

* Update bert_model.py

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Adding tests

* Adding interleaved pipeline parallelism

* Adding interleaved pipeline parallelism

* Adding interleaved pipeline parallelism

* Adding interleaved pipeline parallelism

* Adding interleaved pipeline parallelism

* Adding interleaved pipeline parallelism

* Adding interleaved pipeline parallelism

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Addressing Eric's comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Addressing Eric's comments

* Fix bug fix sequence parallel and Interleaved

* Fix bug fix sequence parallel and Interleaved

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add duration padding support for RADTTS inference (#5650)

* Added duration padding support for RADTTS inference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Kevin Shih <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add remove_blank_tokens_from_ctm parameter

Signed-off-by: Elena Rastorgueva <[email protected]>

* Dont save initial_silence line in CTM

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add DLLogger support to exp_manager (#5658)

* Add DLLogger support to exp_manager

Signed-off-by: Alexandre Milesi <[email protected]>

* Move dllogger to separate file and check import

Signed-off-by: Alexandre Milesi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Alexandre Milesi <[email protected]>

Signed-off-by: Alexandre Milesi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* add minimum_timestamp_duration parameter

Signed-off-by: Elena Rastorgueva <[email protected]>

* add suggestion about removing blanks to README

Signed-off-by: Elena Rastorgueva <[email protected]>

* reorder args

Signed-off-by: Elena Rastorgueva <[email protected]>

* clarify description of ctm_grouping_separator in README

Signed-off-by: Elena Rastorgueva <[email protected]>

* update docstring

Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS][ZH] bugfix for ngc cli installation. (#5652) (#5664)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Port stateless timer to exp manager (#5584)

* Port stateless timer to exp manager

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes and remove from all megatron code

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change message

Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix EMA restart by allowing device to be set by the class init (#5668)

Signed-off-by: SeanNaren <[email protected]>

Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Remove SDP (moved to separate repo) - merge to main (#5630)

* Remove sdp files from tools folder

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add page to docs with new SDP location

Signed-off-by: Elena Rastorgueva <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add interface for making amax reduction optional for FP8 (#5447)

* add TE interface for making amax reduction optional

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] add tts dict cust notebook (#5662)

* add tts dict cust notebook

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* fixed audio links

Signed-off-by: ekmb <[email protected]>

* remove old notebook

Signed-off-by: ekmb <[email protected]>

* fix typo

Signed-off-by: ekmb <[email protected]>

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [ASR] Audio processing base, multi-channel enhancement models (#5356)

* Audio processing base model, enc-mask-dec enhancement, tests and modules

Signed-off-by: Ante Jukić <[email protected]>

* Addressed review comments

Signed-off-by: Ante Jukić <[email protected]>

* Fixed CodeQL warnings

Signed-off-by: Ante Jukić <[email protected]>

* Addressed PR comments

Signed-off-by: Ante Jukić <[email protected]>

* Addressed PR comments:
- renamed AudioProcessingModel to AudioToAudioModel
- various small modifications
- updated unit tests

Signed-off-by: Ante Jukić <[email protected]>

* Addressed comments
- Moved spectrogram to audio_preprocessing
- Renamed MultichannelFeatures
- Updated config and unit tests

Signed-off-by: Ante Jukić <[email protected]>

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Expose ClusteringDiarizer device (#5681)

* Expose device for users to set

Signed-off-by: SeanNaren <[email protected]>

* Expose device for users to set

Signed-off-by: SeanNaren <[email protected]>

Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add Beam Search support to ASR transcribe() (#5443)

* Add support for beam decoding via high level API.

Signed-off-by: smajumdar <[email protected]>

* Add ctc decoding section

Signed-off-by: smajumdar <[email protected]>

* Update ctc transcribe API to return results from beam search

Signed-off-by: smajumdar <[email protected]>

* Add argument to preserve arpa file

Signed-off-by: smajumdar <[email protected]>

* Update script to use hydra config, add some support for future compute timesteps, add doc for ctc decoding

Signed-off-by: smajumdar <[email protected]>

* Update eval script and doc to use new API

Signed-off-by: smajumdar <[email protected]>

* Add tests for ctc greedy decoding

Signed-off-by: smajumdar <[email protected]>

* Address reviewer comments and add docstrings

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix changes and address comments

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Propagate attention_dropout flag for GPT-3 (#5669)

* Propagate attention_dropout flag for GPT-3

Signed-off-by: Mikołaj Błaż <[email protected]>

* Add default to megatron_gpt_config

Signed-off-by: Mikołaj Błaż <[email protected]>

Signed-off-by: Mikołaj Błaż <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Eric Harper <complex451@gmail…
Showing 9 changed files with 1,402 additions and 0 deletions.
84 changes: 84 additions & 0 deletions tools/nemo_forced_aligner/README.md
@@ -0,0 +1,84 @@
# NeMo Forced Aligner (NFA)

A tool for doing Forced Alignment using Viterbi decoding of NeMo CTC-based models.

## Usage example

``` bash
python <path_to_NeMo>/tools/nemo_forced_aligner/align.py \
pretrained_name="stt_en_citrinet_1024_gamma_0_25" \
model_downsample_factor=8 \
manifest_filepath=<path to manifest of utterances you want to align> \
output_dir=<path to where your ctm files will be saved>
```

## How do I use NeMo Forced Aligner?
To use NFA, all you need to provide is a correct NeMo manifest (with `"audio_filepath"` and `"text"` fields).
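As an illustration of the manifest format, here is a minimal Python sketch that writes one JSON object per line, each with the `"audio_filepath"` and `"text"` fields. The helper name `write_manifest` and the example paths are hypothetical and not part of NFA; the sketch resolves each path to an absolute one, since this README asks for absolute audio filepaths.

```python
import json
from pathlib import Path


def write_manifest(entries, manifest_path):
    """Write a NeMo-style manifest: one JSON object per line.

    `entries` is an iterable of (audio_path, text) pairs.
    NOTE: `write_manifest` is a hypothetical helper for illustration only.
    """
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in entries:
            record = {
                # Resolve to an absolute path, as the README requires.
                "audio_filepath": str(Path(audio_path).resolve()),
                "text": text,
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, `write_manifest([("audio/utt1.wav", "hello world")], "manifest.json")` would produce a one-line manifest whose path could then be passed as `manifest_filepath`.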

Call the `align.py` script, specifying the parameters as follows:

* `pretrained_name`: string specifying the name of a CTC NeMo ASR model, which will be automatically downloaded from NGC and used to generate the log-probs that the alignment is computed from. Any QuartzNet, Citrinet, or Conformer CTC model should work, in any language (only English has been tested so far). If `model_path` is specified, `pretrained_name` must not be specified.
> Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use a Conformer CTC model, as that will likely give Out Of Memory errors.
* `model_path`: string specifying the local filepath to a CTC NeMo ASR model, which will be used to generate the log-probs that the alignment is computed from. If `pretrained_name` is specified, `model_path` must not be specified.
> Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use a Conformer CTC model, as that will likely give Out Of Memory errors.
* `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet.

* `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths.

* `output_dir`: The folder in which to save the CTM files containing the generated alignments, as well as a new JSON manifest containing the paths to those CTM files. There will be one CTM file per utterance (i.e. one CTM file per line in the manifest). The files will be called `<output_dir>/{tokens,words,additional_segments}/<utt_id>.ctm` and each line in each file will start with `<utt_id>`. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `audio_filepath_parts_in_utt_id`. The new JSON manifest will be at `<output_dir>/<original manifest file name>_with_ctm_paths.json`.

* **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the ASR model (specified by `pretrained_name` or `model_path`) and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `<output_dir>/<original manifest file name>_with_ctm_paths.json`. To avoid overwriting other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False).

* **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). If None, NFA will set it to 'cuda' if it is available (otherwise will set it to 'cpu'). If specified, `transcribe_device` needs to be a string that can be passed to `torch.device()`. (Default: `None`).

* **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. If None, NFA will set it to 'cuda' if it is available (otherwise will set it to 'cpu'). If specified, `viterbi_device` needs to be a string that can be passed to `torch.device()`. (Default: `None`).

* **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1).

* **[OPTIONAL]** `additional_ctm_grouping_separator`: the string used to separate CTM segments if you want to obtain CTM files at a level other than the token level or the word level. NFA will always produce token-level and word-level CTM files in `<output_dir>/tokens/<utt_id>.ctm` and `<output_dir>/words/<utt_id>.ctm`. If `additional_ctm_grouping_separator` is specified, additional CTM files will be created at `<output_dir>/additional_segments/<utt_id>.ctm`, containing CTMs for `additional_ctm_grouping_separator`-separated segments. (Default: `None`. Cannot be an empty string or a space (" "), as space-separated word-level CTMs will always be saved in `<output_dir>/words/<utt_id>.ctm`.)
> Note: the `additional_ctm_grouping_separator` will be removed from the ground truth text and all the output CTMs, i.e. it is treated as a marker which is not part of the ground truth. The separator will essentially be treated as a space, and any additional spaces around it will be merged into one, i.e. if `additional_ctm_grouping_separator="|"`, the following texts will be treated equivalently: `"abc|def"`, `"abc |def"`, `"abc| def"`, `"abc | def"`.
* **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove `<blank>` tokens from token-level output CTMs. (Default: False).

* **[OPTIONAL]** `audio_filepath_parts_in_utt_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files.

* **[OPTIONAL]** `minimum_timestamp_duration`: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio file. Note that this may cause timestamps to overlap. (Default: 0, i.e. no modifications to predicted duration).
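
The behaviour described for `minimum_timestamp_duration` can be illustrated with a minimal sketch. This is not NFA's own code: the function name `enforce_min_duration` and its arguments are hypothetical, and it assumes all values are in seconds.

```python
def enforce_min_duration(start, duration, min_duration, audio_duration):
    """Widen a (start, duration) span from the middle outwards.

    Hypothetical helper for illustration only; all values are assumed
    to be in seconds.
    """
    if duration >= min_duration:
        # Already long enough: leave the timestamp untouched.
        return start, duration

    # Split the missing time equally between the two sides of the span.
    pad = (min_duration - duration) / 2
    # Clamp to the beginning and end of the audio file.
    new_start = max(0.0, start - pad)
    new_end = min(audio_duration, start + duration + pad)
    return new_start, new_end - new_start
```

Because every short span is widened independently, two neighbouring spans can end up overlapping, which is the caveat mentioned above. A span at the very start or end of the audio is clamped and may remain shorter than the requested minimum.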

# Input manifest file format
By default, NFA needs to be provided with a 'manifest' file where each line specifies the absolute "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below:
```json
{"audio_filepath": "/absolute/path/to/audio.wav", "text": "the transcription of the utterance"}
```
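
As a rough sketch of how such a manifest could be produced, the snippet below writes one JSON object per line and converts each audio path to an absolute path. The helper name `write_manifest` and the `(audio_path, text)` input shape are assumptions made for illustration; they are not part of NFA.

```python
import json
from pathlib import Path


def write_manifest(utterances, manifest_path):
    """Write (audio_path, text) pairs as a JSON-lines manifest.

    Hypothetical helper: each output line has the "audio_filepath" and
    "text" fields described above, with the path made absolute.
    """
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in utterances:
            entry = {
                "audio_filepath": str(Path(audio_path).resolve()),
                "text": text,
            }
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

For example, `write_manifest([("audio/utt1.wav", "hello world")], "manifest.json")` would write a single-line manifest.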

You can omit the `"text"` field from the manifest if you specify `align_using_pred_text=true`. In that case, any `"text"` fields in the manifest will be ignored: the ASR model at `pretrained_name` or `model_path` will be used to transcribe the audio and obtain `"pred_text"`, which will be used as the 'ground truth' for the forced alignment process. The `"pred_text"` will also be saved in the output manifest JSON file at `<output_dir>/<original manifest file name>_with_ctm_paths.json`. To remove the possibility of overwriting `"pred_text"`, NFA will raise an error if `align_using_pred_text=true` and there are existing `"pred_text"` fields in the original manifest.

> Note: NFA does not require `"duration"` fields in the manifest, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth `"text"` is.

# Output CTM file format
For each utterance specified in a line of `manifest_filepath`, several CTM files will be generated:
* a CTM file containing token-level alignments at `<output_dir>/tokens/<utt_id>.ctm`,
* a CTM file containing word-level alignments at `<output_dir>/words/<utt_id>.ctm`,
* if `additional_ctm_grouping_separator` is specified, there will also be a CTM file containing those segments at `<output_dir>/additional_segments/<utt_id>.ctm`.

Each CTM file will contain lines of the format:
`<utt_id> 1 <start time in samples> <duration in samples> <text, i.e. token/word/segment>`.
Note the second item in the line (the 'channel ID', which is required by the CTM file format) is always 1, as NFA operates on single channel audio.
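
A line in this format could be read back with a small parser like the sketch below. The names `CtmEntry` and `parse_ctm_line` are hypothetical. The sketch assumes the first four fields never contain spaces and treats everything after the fourth field as the text; the start and duration are parsed as plain numbers without any unit conversion.

```python
from typing import NamedTuple


class CtmEntry(NamedTuple):
    utt_id: str
    channel: str
    start: float
    duration: float
    text: str


def parse_ctm_line(line):
    """Split one CTM line into its five fields (illustrative helper)."""
    # maxsplit=4 keeps any spaces inside the final text field intact.
    utt_id, channel, start, duration, text = line.strip().split(maxsplit=4)
    return CtmEntry(utt_id, channel, float(start), float(duration), text)


def read_ctm(path):
    """Parse every non-empty line of a CTM file."""
    with open(path, encoding="utf-8") as f:
        return [parse_ctm_line(line) for line in f if line.strip()]
```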

# Output JSON manifest file format
A new manifest file will be saved at `<output_dir>/<original manifest file name>_with_ctm_paths.json`. It will contain the same fields as the original manifest, and additionally:
* `"token_level_ctm_filepath"`
* `"word_level_ctm_filepath"`
* `"additional_segment_level_ctm_filepath"` (if `additional_ctm_grouping_separator` is specified)
* `"pred_text"` (if `align_using_pred_text=true`)
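
One possible way to consume the output manifest is sketched below: it loads each line as JSON and maps every audio file to its word-level CTM path. The function names are hypothetical; only the field names come from the list above.

```python
import json


def read_manifest(manifest_path):
    """Load a JSON-lines manifest into a list of dicts (illustrative helper)."""
    with open(manifest_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def word_ctm_paths(entries):
    """Map each audio_filepath to its word-level CTM file."""
    return {e["audio_filepath"]: e["word_level_ctm_filepath"] for e in entries}
```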


# How do I evaluate the alignment accuracy?
Ideally you would have some 'true' CTM files to compare with your generated CTM files. With these you could obtain metrics such as the mean (absolute) errors between predicted starts/ends and the 'true' starts/ends of the segments.
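
As an illustration of such a metric, the sketch below computes the mean absolute error of segment starts and ends. It is not part of NFA: the function name is hypothetical, and it assumes the predicted and 'true' segments are given as `(start, duration)` pairs in the same order, in the same units, with one reference segment per predicted segment.

```python
def mean_absolute_boundary_errors(predicted, reference):
    """Return (mean abs start error, mean abs end error).

    Illustrative helper: `predicted` and `reference` are equal-length
    lists of (start, duration) pairs describing the same segments.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty lists of equal length")

    n = len(predicted)
    start_err = sum(abs(p[0] - r[0]) for p, r in zip(predicted, reference)) / n
    # A segment's end is its start plus its duration.
    end_err = sum(
        abs((p[0] + p[1]) - (r[0] + r[1])) for p, r in zip(predicted, reference)
    ) / n
    return start_err, end_err
```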

Alternatively (or additionally), you can visualize the quality of alignments using tools such as Gecko, which can play your audio file and display the predicted alignments at the same time. Gecko requires you to upload an audio file and at least one CTM file. It can be accessed here: https://gong-io.github.io/gecko/. More information about Gecko can be found on its GitHub page: https://github.com/gong-io/gecko.

**Note**: the following may help improve your experience viewing the CTMs in Gecko:
* setting `minimum_timestamp_duration` to a larger number, as Gecko may not display some tokens/words/segments properly if their timestamps are too short.
* setting `remove_blank_tokens_from_ctm=true` if you are analyzing token-level CTMs, as it will make the Gecko visualization less cluttered.