small bugfix for r1.13.0 (#5310)

* typo fix
* update transcribe

Signed-off-by: fayejf <[email protected]>
NVIDIA · Nov 4, 2022 · 42f6ac9
1 parent 26e3e1d

Showing 3 changed files with 7 additions and 6 deletions.
diff --git a/examples/asr/transcribe_speech.py b/examples/asr/transcribe_speech.py
@@ -244,6 +244,7 @@ def autocast():
                         path2manifest=cfg.dataset_manifest,
                         batch_size=cfg.batch_size,
                         num_workers=cfg.num_workers,
+                        return_hypotheses=return_hypotheses,
                     )
                 else:
                     logging.warning(

diff --git a/nemo/collections/asr/parts/utils/transcribe_utils.py b/nemo/collections/asr/parts/utils/transcribe_utils.py
@@ -74,16 +74,16 @@ def transcribe_partial_audio(
                     lg = logits[idx][: logits_len[idx]]
                     hypotheses.append(lg.cpu().numpy())
             else:
-                current_hypotheses, _ = asr_model._wer.decoding.ctc_decoder_predictions_tensor(
-                    decoder_outputs=greedy_predictions,
-                    decoder_lengths=logits_len,
-                    return_hypotheses=return_hypotheses,
+                current_hypotheses, all_hyp = asr_model.decoding.ctc_decoder_predictions_tensor(
+                    logits, decoder_lengths=logits_len, return_hypotheses=return_hypotheses,
                 )
 
                 if return_hypotheses:
                     # dump log probs per file
                     for idx in range(logits.shape[0]):
                         current_hypotheses[idx].y_sequence = logits[idx][: logits_len[idx]]
+                        if current_hypotheses[idx].alignments is None:
+                            current_hypotheses[idx].alignments = current_hypotheses[idx].y_sequence
 
                 hypotheses += current_hypotheses
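
The fallback added in this hunk can be illustrated in isolation. The sketch below is not NeMo code: `Hypothesis` is a hypothetical stand-in with only the two fields the hunk touches, `attach_log_probs` is an illustrative helper name, and plain Python lists stand in for the logits tensor. It only shows the pattern of trimming each row to its valid length and reusing that as `alignments` when the decoder did not populate them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """Hypothetical stand-in for a decoding hypothesis (not the NeMo class)."""

    text: str
    y_sequence: Optional[List[float]] = None
    alignments: Optional[List[float]] = None


def attach_log_probs(hyps, logits, logits_len):
    """Store per-file log probs and fall back to them for missing alignments."""
    for idx, hyp in enumerate(hyps):
        # Keep only the valid (unpadded) frames for this file.
        hyp.y_sequence = logits[idx][: logits_len[idx]]
        # Mirror the hunk: only fill alignments when the decoder left them unset.
        if hyp.alignments is None:
            hyp.alignments = hyp.y_sequence
    return hyps


# Toy data: the second hypothesis already carries alignments and keeps them.
hyps = [Hypothesis("hello"), Hypothesis("world", alignments=[9.0])]
logits = [[0.1, 0.2, 0.0], [0.3, 0.4, 0.5]]
attach_log_probs(hyps, logits, logits_len=[2, 3])
```

Under these assumptions, a hypothesis with existing alignments is left untouched, while one without them ends up with `alignments` pointing at the same trimmed sequence as `y_sequence`.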
 

diff --git a/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb b/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb
@@ -197,7 +197,7 @@
     "\n",
     "- Please skip this section and go directly to [Prepare Training data for MSDD](#Prepare-Training-data-for-MSDD) section if you have your own speaker diarization dataset. \n",
     "\n",
-    "In this tutorial, we use [NeMo Multispeaker Simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb) and the Librispeech corpus to generate a toy training dataset for demonstration purpose. You can replace the simulated dataset with your own datasets if you have proper speaker annotations (RTTM files) for the dataset. If you do not have access to any speaker diarization datasets, you can use NeMo [NeMo Multispeaker Simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb) by generating a good amount of data samples to meet your needs. \n",
+    "In this tutorial, we use [NeMo Multispeaker Simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb) and the Librispeech corpus to generate a toy training dataset for demonstration purpose. You can replace the simulated dataset with your own datasets if you have proper speaker annotations (RTTM files) for the dataset. If you do not have access to any speaker diarization datasets, you can use [NeMo Multispeaker Simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb) by generating a good amount of data samples to meet your needs. \n",
     "\n",
     "For more details regarding data simulator, please follow the descriptions in [NeMo Multispeaker Simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb) and we will not cover configurations and detailed process of data simulation in this tutorial. \n"
    ]
@@ -599,7 +599,7 @@
     "\n",
     "Before we generate a manifest file and RTTM files for training MSDD, you have to determine:\n",
     "\n",
-    "- `window`: the windowl length of the base scale (the shortest scale)\n",
+    "- `window`: the window length of the base scale (the shortest scale)\n",
     "- `shift`: the hop-length of the base scale (the shortest scale)\n",
     "- `step_count`: how many decision steps in one data sample\n",
     "\n",