Commit
Fixing Docs for Text Classification task (including sentiment analysis) (#675)

* git history clean up

Signed-off-by: Evelina Bakhturina <[email protected]>

* nlp references to the tutorials

Signed-off-by: Evelina Bakhturina <[email protected]>

* sphinx fix

Signed-off-by: Evelina Bakhturina <[email protected]>

* review feedback

Signed-off-by: Evelina Bakhturina <[email protected]>
ekmb authored Jun 4, 2020
1 parent 7efa348 commit 1273aff
Showing 21 changed files with 2,157 additions and 938 deletions.
2 changes: 2 additions & 0 deletions docs/sources/source/nlp/asr-improvement.rst
@@ -1,3 +1,5 @@
.. _asr_improvement:

Tutorial
========

6 changes: 3 additions & 3 deletions docs/sources/source/nlp/bert_pretraining.rst
@@ -1,7 +1,7 @@
.. _bert_pretraining:


Tutorial
========
BERT Pre-training Tutorial
==========================

In this tutorial, we will build and train a masked language model, either from scratch or from a pretrained BERT model, using the BERT architecture :cite:`nlp-bert-devlin2018bert`.
Make sure you have ``nemo`` and ``nemo_nlp`` installed before starting this tutorial. See the :ref:`installation` section for more details.
3 changes: 3 additions & 0 deletions docs/sources/source/nlp/dialogue_state_tracking.rst
@@ -1,3 +1,5 @@
.. _trade_tutorial:

TRADE Tutorial
==============

@@ -265,6 +267,7 @@ References
:keyprefix: nlp-dst-


.. _sgd_tutorial:

SGD Tutorial
============
3 changes: 2 additions & 1 deletion docs/sources/source/nlp/glue.rst
@@ -1,3 +1,4 @@
.. _glue:

Tutorial
========
@@ -86,7 +87,7 @@ To use multi-gpu training on MNLI task, run:
export NUM_GPUS=4
python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS glue_benchmark_with_bert.py \
--data_dir=/path_to_data/MNLI \
--data_dir=/path_to_data_dir/MNLI \
--task_name mnli \
--work_dir /path_to_output_folder \
--num_gpus=$NUM_GPUS \
66 changes: 42 additions & 24 deletions docs/sources/source/nlp/intro.rst
@@ -5,17 +5,32 @@ Natural Language Processing

Supported Tasks and Models:

* Neural Machine Translation
* :ref:`nmt`
* Language Modelling:
* :ref:`bert_pretraining`
* :ref:`transformer_lm`
* :ref:`megatron_finetuning`
* GLUE Benchmark
* :ref:`glue`
* Intent Detection and Slot Filling
* :ref:`joint_intent_slot_filling`
* Text Classification
* State Tracking for Task-oriented Dialogue Systems
* Language Modelling
* Neural Machine Translation
* Question Answering
* :ref:`text_classification`
* :ref:`sentiment_analysis`
* Named Entity Recognition (NER)
* :ref:`ner`
* Punctuation and Capitalization
* GLUE Benchmark
* :ref:`punctuation`
* Question Answering
* :ref:`squad_model_links`
* State Tracking for Goal-oriented Dialogue Systems:
* :ref:`trade_tutorial`
* :ref:`sgd_tutorial`
* ASR Postprocessing with BERT
* :ref:`asr_improvement`


All examples from NLP collection can be found `here <https://github.com/NVIDIA/NeMo/tree/master/examples/nlp>`__.

Neural Machine Translation (NMT)
@@ -32,19 +47,19 @@ Pretraining BERT

bert_pretraining

Megatron-LM for Downstream tasks
--------------------------------
Transformer Language Model
--------------------------
.. toctree::
:maxdepth: 8

megatron_finetuning
transformer_language_model

Transformer Language Model
--------------------------
Megatron-LM for Downstream tasks
--------------------------------
.. toctree::
:maxdepth: 8

transformer_language_model
megatron_finetuning

GLUE Benchmark
--------------------------
@@ -53,14 +68,19 @@ GLUE Benchmark

glue

Dialogue State Tracking
Intent and Slot filling
-----------------------

.. toctree::
:maxdepth: 8

dialogue_state_tracking.rst
joint_intent_slot_filling

Text Classification
-------------------
.. toctree::
:maxdepth: 8

text_classification

Named Entity Recognition
------------------------
@@ -78,25 +98,23 @@ Punctuation and Word Capitalization

punctuation


Intent and Slot filling
-----------------------
Question Answering
------------------
.. toctree::
:maxdepth: 8

joint_intent_slot_filling

question_answering

Dialogue State Tracking
-----------------------

Question Answering
------------------
.. toctree::
:maxdepth: 8

question_answering
dialogue_state_tracking

Improving Speech Recognition with BERTx2 Post-processing Model
--------------------------------------------------------------
ASR Postprocessing with BERT
----------------------------
.. toctree::
:maxdepth: 8

2 changes: 2 additions & 0 deletions docs/sources/source/nlp/joint_intent_slot_filling.rst
@@ -1,3 +1,5 @@
.. _joint_intent_slot_filling:

Tutorial
========

2 changes: 2 additions & 0 deletions docs/sources/source/nlp/megatron_finetuning.rst
@@ -1,3 +1,5 @@
.. _megatron_finetuning:

Megatron-LM for Downstream Tasks
================================

2 changes: 2 additions & 0 deletions docs/sources/source/nlp/ner.rst
@@ -1,3 +1,5 @@
.. _ner:

Tutorial
========

2 changes: 2 additions & 0 deletions docs/sources/source/nlp/neural_machine_translation.rst
@@ -1,3 +1,5 @@
.. _nmt:

Tutorial
========

2 changes: 2 additions & 0 deletions docs/sources/source/nlp/punctuation.rst
@@ -1,3 +1,5 @@
.. _punctuation:

Tutorial
========

112 changes: 112 additions & 0 deletions docs/sources/source/nlp/text_classification.rst
@@ -0,0 +1,112 @@
.. _text_classification:

Tutorial
========

In this tutorial, we describe how to finetune a BERT-like model,
based on `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ :cite:`nlp-tc-devlin2018bert`,
on a text classification task.

Task Description
----------------

Text classification is the task of assigning a predefined label to a given text based on its content.
The text classification task applies to a broad range of problems: sentiment analysis, spam detection, intent detection, and many others.


Data Format
-----------

For the text classification task, NeMo requires the following format:

- the first line of each data file should contain a header with the columns ``sentence`` and ``label``
- all subsequent lines in the file should contain some text in the first column and a numerical label in the second column
- the columns are separated by a tab

.. code-block::

    sentence [TAB] label
    text [TAB] label_id
    text [TAB] label_id
    text [TAB] label_id

For example, the final data file could look like this:

.. code-block::

    sentence label
    the first sentence 0
    the second sentence 1
    the third sentence 2

By default, the training script assumes that the training data is located under the specified
``--data_dir PATH_TO_DATA`` directory in a file named ``train.tsv``, and the evaluation data in ``dev.tsv``.
Use ``--train_file_prefix`` and ``--eval_file_prefix`` to change these default names.

NeMo provides a conversion script from the original data format to the NeMo format
for some well-known datasets, including SST-2 and IMDB; see
`examples/nlp/text_classification/data/import_datasets.py <https://github.com/NVIDIA/NeMo/blob/master/examples/nlp/text_classification/data/import_datasets.py>`_ for details.
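
If your dataset is not covered by that script, the following is a minimal sketch (not part of NeMo; the ``write_nemo_tsv`` helper and the toy data are hypothetical) of how files in the ``train.tsv``/``dev.tsv`` format described above could be produced:

.. code-block:: python

    import os

    def write_nemo_tsv(examples, output_file):
        """Write (text, label_id) pairs in the tab-separated format described above."""
        with open(output_file, "w", encoding="utf-8") as f:
            f.write("sentence\tlabel\n")  # required header line
            for text, label_id in examples:
                # collapse stray tabs/newlines so each example stays on a single line
                clean_text = " ".join(str(text).split())
                f.write(f"{clean_text}\t{label_id}\n")

    # Hypothetical toy data; replace with your own loading logic.
    train_examples = [("the first sentence", 0), ("the second sentence", 1)]
    dev_examples = [("the third sentence", 2)]

    data_dir = "/path_to_data_dir"
    os.makedirs(data_dir, exist_ok=True)
    write_nemo_tsv(train_examples, os.path.join(data_dir, "train.tsv"))
    write_nemo_tsv(dev_examples, os.path.join(data_dir, "dev.tsv"))

The resulting files can then be passed to the training script below via ``--data_dir /path_to_data_dir``.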

Model training
--------------

The code used in this tutorial is based on `examples/nlp/text_classification/text_classification_with_bert.py <https://github.com/NVIDIA/NeMo/blob/master/examples/nlp/text_classification/text_classification_with_bert.py>`_.

.. note::

    The script supports multi-class tasks.

To run the script on a single GPU, run:

.. code-block:: bash

    python text_classification_with_bert.py \
        --data_dir /path_to_data_dir \
        --work_dir /path_to_output_folder

To use multi-gpu training on this task, run:

.. code-block:: bash

    export NUM_GPUS=4
    python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS text_classification_with_bert.py \
        --data_dir=/path_to_data_dir \
        --work_dir /path_to_output_folder \
        --num_gpus=$NUM_GPUS

More details about multi-gpu training can be found in the `Fast Training <https://nvidia.github.io/NeMo/training.html>`_ section.

For additional model training parameters, please see ``examples/nlp/text_classification/text_classification_with_bert.py``.

Evaluating Checkpoints
----------------------

During training, the model is evaluated after every epoch, and by default a folder named ``checkpoints`` is created under the working folder specified by ``--work_dir``;
checkpoints are stored there. To evaluate a pre-trained checkpoint on a dev set,
run the same training script, passing ``--checkpoint_dir`` and setting ``--num_epochs`` to zero to skip training.

.. code-block:: bash

    python text_classification_with_bert.py \
        --data_dir /path_to_data_dir/ \
        --work_dir /path_to_output_folder \
        --checkpoint_dir /path_to_output_folder/checkpoints \
        --num_epochs 0

.. _sentiment_analysis:

Sentiment Analysis with BERT
============================

A tutorial on how to finetune a BERT model on a sentiment analysis task can be found at
`examples/nlp/text_classification/sentiment_analysis_with_bert.ipynb <https://github.com/NVIDIA/NeMo/blob/master/examples/nlp/text_classification/sentiment_analysis_with_bert.ipynb>`_.


References
----------

.. bibliography:: nlp_all_refs.bib
:style: plain
:labelprefix: NLP-TC
:keyprefix: nlp-tc-
6 changes: 4 additions & 2 deletions docs/sources/source/nlp/transformer_language_model.rst
@@ -1,5 +1,7 @@
Tutorial
========
.. _transformer_lm:

Transformer Language Model Tutorial
===================================

In this tutorial, we will build and train a language model using the Transformer architecture :cite:`nlp-lm-vaswani2017attention`.
Make sure you have ``nemo`` and ``nemo_nlp`` installed before starting this tutorial. See the :ref:`installation` section for more details.
8 changes: 4 additions & 4 deletions examples/nlp/glue_benchmark/glue_benchmark_with_bert.py
@@ -120,7 +120,7 @@
choices=["nemobert", "sentencepiece"],
help="tokenizer to use, only relevant when using custom pretrained checkpoint.",
)
parser.add_argument("--vocab_file", default=None, help="Path to the vocab file.")
parser.add_argument("--vocab_file", default=None, type=str, help="Path to the vocab file.")
parser.add_argument(
"--do_lower_case",
action='store_true',
@@ -136,10 +136,10 @@
truncated, sequences shorter will be padded.",
)
parser.add_argument("--optimizer_kind", default="adam", type=str, help="Optimizer kind")
parser.add_argument("--lr_policy", default="WarmupAnnealing", type=str)
parser.add_argument("--lr_policy", default="WarmupAnnealing", type=str, help="Learning rate policy")
parser.add_argument("--lr", default=5e-5, type=float, help="The initial learning rate.")
parser.add_argument("--lr_warmup_proportion", default=0.1, type=float)
parser.add_argument("--weight_decay", default=0.0, type=float, help="Weight deay if we apply some.")
parser.add_argument("--lr_warmup_proportion", default=0.1, type=float, help="Learning rate warm up proportion")
parser.add_argument("--weight_decay", default=0.0, type=float, help="Weight decay if we apply some.")
parser.add_argument("--num_epochs", default=3, type=int, help="Total number of training epochs to perform.")
parser.add_argument("--batch_size", default=8, type=int, help="Batch size per GPU/CPU for training/evaluation.")
parser.add_argument("--num_gpus", default=1, type=int, help="Number of GPUs")
@@ -140,7 +140,7 @@ def parse_args():
help="tokenizer to use, only relevant when using custom pretrained checkpoint.",
)
parser.add_argument("--optimizer", default="adam_w", type=str, help="Optimizer kind")
parser.add_argument("--vocab_file", default=None, help="Path to the vocab file.")
parser.add_argument("--vocab_file", default=None, type=str, help="Path to the vocab file.")
parser.add_argument("--lr_policy", default="WarmupAnnealing", type=str)
parser.add_argument("--lr", default=3e-5, type=float, help="The initial learning rate.")
parser.add_argument("--lr_warmup_proportion", default=0.0, type=float)