
Commit

Pcla tutorial fixes (NVIDIA#5271) (NVIDIA#5273)
* Fixed typos

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed cell type and tatoeba reference

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed typo

Signed-off-by: Matvei Novikov <[email protected]>

* Fixed branch variable

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>

Signed-off-by: Matvei Novikov <[email protected]>
Co-authored-by: Matvei Novikov <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>
2 people authored and Hainan Xu committed Nov 29, 2022
1 parent 37f36da commit 7ea98b0
Showing 1 changed file with 12 additions and 11 deletions: tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb
@@ -99,7 +99,7 @@
"- whether the word should be capitalized\n",
"\n",
"\n",
-    "In some cases lexical only model can't predict punctutation correctly without audio. It is especially hard for conversational speech.\n",
+    "In some cases lexical only model can't predict punctuation correctly without audio. It is especially hard for conversational speech.\n",
"\n",
"For example:\n",
"\n",
@@ -119,7 +119,7 @@
"## Architecture\n",
"Punctuation and capitalization lexical audio model is based on [Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech](https://arxiv.org/pdf/2008.00702.pdf). Model consists of lexical encoder (BERT-like model), acoustic encoder (i.e. Conformer's audio encoder), fusion of lexical and audio features (attention based fusion) and prediction layers.\n",
"\n",
-    "Fusion is needed because encoded text and audio might have different length therfore can't be alligned one-to-one. As model predicts punctuation and capitalization per text token we use cross-attention between encoded lexical and encoded audio input."
+    "Fusion is needed because encoded text and audio might have different length therefore can't be aligned one-to-one. As model predicts punctuation and capitalization per text token we use cross-attention between encoded lexical and encoded audio input."
]
},
{
@@ -279,22 +279,23 @@
]
},
{
-      "cell_type": "markdown",
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "pycharm": {
+          "name": "#%% md\n"
+        }
+      },
+      "outputs": [],
       "source": [
-        "## download get_tatoeba_data.py script to download and preprocess the Tatoeba data\n",
+        "## download get_libritts_data.py script to download and preprocess the LibriTTS data\n",
         "os.makedirs(WORK_DIR, exist_ok=True)\n",
         "if not os.path.exists(WORK_DIR + '/get_libritts_data.py'):\n",
         "    print('Downloading get_libritts_data.py...')\n",
         "    wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/token_classification/data/get_libritts_data.py', WORK_DIR)\n",
         "else:\n",
         "    print ('get_libritts_data.py already exists')"
-      ],
-      "metadata": {
-        "collapsed": false,
-        "pycharm": {
-          "name": "#%% md\n"
-        }
-      }
+      ]
},
{
"cell_type": "code",
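The attention-based fusion described in the tutorial's architecture hunk above (text tokens attending to encoded audio frames, so the model still predicts punctuation and capitalization per text token) can be sketched roughly as follows. This is an illustrative single-head NumPy sketch, not NeMo's actual implementation; all function names, shapes, and dimensions here are made up for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(text_feats, audio_feats):
    """Fuse audio features into each text token via cross-attention.

    text_feats:  (T_text, d)  encoded subword tokens (queries)
    audio_feats: (T_audio, d) encoded audio frames (keys/values)
    Returns:     (T_text, 2*d) per-token concatenation of lexical
                 features and attention-pooled audio features.
    """
    d = text_feats.shape[-1]
    # Scaled dot-product scores: one row of audio weights per text token.
    scores = text_feats @ audio_feats.T / np.sqrt(d)   # (T_text, T_audio)
    weights = softmax(scores, axis=-1)
    # Audio summary aligned one-to-one with text tokens, despite T_text != T_audio.
    fused_audio = weights @ audio_feats                # (T_text, d)
    return np.concatenate([text_feats, fused_audio], axis=-1)

rng = np.random.default_rng(0)
text = rng.normal(size=(7, 16))    # 7 subword tokens, hidden size 16
audio = rng.normal(size=(50, 16))  # 50 audio frames
fused = cross_attention_fusion(text, audio)
print(fused.shape)  # (7, 32)
```

The key point the tutorial makes is visible in the shapes: however many audio frames there are, the fused output has one row per text token, so token-level punctuation and capitalization heads can be applied directly on top of it.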
