Merge branch 'master' of https://github.com/NVIDIA/NeMo into jenkins_test
ekmb committed Feb 13, 2020
2 parents f8bd75a + f072029 commit 880c730
Showing 120 changed files with 3,029 additions and 3,700 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -70,12 +70,16 @@ To release a new version, please update the changelog as follows:
## [Unreleased]

### Added
- New Neural Type System and its tests.
([PR #307](https://github.com/NVIDIA/NeMo/pull/307)) - @okuchaiev
- Named tensors tuple module's output for graph construction.
([PR #268](https://github.com/NVIDIA/NeMo/pull/268)) - @stasbel
- Introduced the `deprecated` decorator.
([PR #298](https://github.com/NVIDIA/NeMo/pull/298)) - @tkornuta-nvidia

### Changed
- All collections changed to use New Neural Type System.
([PR #307](https://github.com/NVIDIA/NeMo/pull/307)) - @okuchaiev
- Additional Collections Repositories merged into core `nemo_toolkit` package.
([PR #289](https://github.com/NVIDIA/NeMo/pull/289)) - @DEKHTIARJonathan
- Refactored manifest file parsing and processing for reuse.
188 changes: 131 additions & 57 deletions Jenkinsfile

Large diffs are not rendered by default.

127 changes: 80 additions & 47 deletions docs/docs_zh/sources/source/nlp/bert_pretraining.rst
@@ -5,7 +5,7 @@ Pretraining BERT

For some applications, it is more advantageous to create a domain-specific BERT model, for example a BERT specialized for the biomedical domain, such as BioBERT :cite:`nlp-bert-lee2019biobert` and SciBERT :cite:`nlp-bert-beltagy2019scibert`.

The code used in this tutorial can be found at ``examples/nlp/bert_pretraining.py``.
The code used in this tutorial can be found at ``examples/nlp/language_modeling/bert_pretraining.py``.

Download Corpus
---------------
@@ -51,10 +51,20 @@ Pretraining BERT
# If you're using a custom vocabulary, create your tokenizer like this
tokenizer = SentencePieceTokenizer(model_path="tokenizer.model")
tokenizer.add_special_tokens(["[MASK]", "[CLS]", "[SEP]"])
special_tokens = {
"sep_token": "[SEP]",
"pad_token": "[PAD]",
"bos_token": "[CLS]",
"mask_token": "[MASK]",
"eos_token": "[SEP]",
"cls_token": "[CLS]",
}
tokenizer.add_special_tokens(special_tokens)
# Otherwise, create your tokenizer like this
tokenizer = NemoBertTokenizer(vocab_file="vocab.txt")
# or
tokenizer = NemoBertTokenizer(pretrained_model="bert-base-uncased")
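Whichever of the tokenizers above you create, it is used the same way in the rest of the tutorial. A minimal sanity-check sketch, assuming the common ``text_to_ids`` / ``ids_to_text`` tokenizer interface (these method names are an assumption and may differ between NeMo versions):

.. code-block:: python

    # hypothetical round trip: encode a sentence to vocabulary ids and decode it back
    ids = tokenizer.text_to_ids("hello world")
    restored = tokenizer.ids_to_text(ids)
    print(len(ids), restored)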
Create the model
----------------
@@ -78,76 +88,99 @@ Pretraining BERT

.. code-block:: python
bert_model = nemo_nlp.huggingface.BERT(
vocab_size=tokenizer.vocab_size,
num_layers=args.num_layers,
d_model=args.d_model,
num_heads=args.num_heads,
d_inner=args.d_inner,
max_seq_length=args.max_seq_length,
hidden_act="gelu")
bert_model = nemo_nlp.nm.trainables.huggingface.BERT(
vocab_size=args.vocab_size,
num_hidden_layers=args.num_hidden_layers,
hidden_size=args.hidden_size,
num_attention_heads=args.num_attention_heads,
intermediate_size=args.intermediate_size,
max_position_embeddings=args.max_seq_length,
hidden_act=args.hidden_act)
If you want to continue training from an existing BERT model file, simply set the model name. For the full list of pretrained BERT models, use `nemo_nlp.huggingface.BERT.list_pretrained_models()`.

.. code-block:: python
bert_model = nemo_nlp.huggingface.BERT(pretrained_model_name="bert-base-cased")
bert_model = nemo_nlp.nm.trainables.huggingface.BERT(pretrained_model_name="bert-base-cased")
Next, we need to define the classifiers and the loss functions. In this tutorial we use both the masked language model (MLM) and next sentence prediction (NSP) losses; if you pretrain with only the MLM loss, you may observe higher downstream accuracy.

.. code-block:: python
mlm_classifier = nemo_nlp.TokenClassifier(args.d_model,
mlm_classifier = nemo_nlp.nm.trainables.TokenClassifier(args.d_model,
num_classes=tokenizer.vocab_size,
num_layers=1,
log_softmax=True)
mlm_loss_fn = nemo_nlp.MaskedLanguageModelingLossNM()
mlm_loss_fn = nemo_nlp.nm.losses.MaskedLanguageModelingLossNM()
nsp_classifier = nemo_nlp.SequenceClassifier(args.d_model,
nsp_classifier = nemo_nlp.nm.trainables.SequenceClassifier(args.d_model,
num_classes=2,
num_layers=2,
log_softmax=True)
nsp_loss_fn = nemo.backends.pytorch.common.CrossEntropyLoss()
bert_loss = nemo_nlp.LossAggregatorNM(num_inputs=2)
bert_loss = nemo_nlp.nm.losses.LossAggregatorNM(num_inputs=2)
Then, we wrap the whole computation pipeline from input to output into a single function, which lets us conveniently create separate training and evaluation pipelines:

.. code-block:: python
def create_pipeline(**args):
dataset = nemo_nlp.BertPretrainingDataset(**params)
data_layer = nemo_nlp.BertPretrainingDataLayer(dataset)
steps_per_epoch = len(data_layer) // (batch_size * args.num_gpus)
input_ids, input_type_ids, input_mask, \
output_ids, output_mask, nsp_labels = data_layer()
hidden_states = bert_model(input_ids=input_ids,
token_type_ids=input_type_ids,
attention_mask=input_mask)
mlm_logits = mlm_classifier(hidden_states=hidden_states)
mlm_loss = mlm_loss_fn(logits=mlm_logits,
output_ids=output_ids,
output_mask=output_mask)
nsp_logits = nsp_classifier(hidden_states=hidden_states)
nsp_loss = nsp_loss_fn(logits=nsp_logits, labels=nsp_labels)
loss = bert_loss(loss_1=mlm_loss, loss_2=nsp_loss)
return loss, [mlm_loss, nsp_loss], steps_per_epoch
train_loss, _, steps_per_epoch = create_pipeline(data_desc.train_file,
args.max_seq_length,
args.mask_probability,
args.batch_size)
eval_loss, eval_tensors, _ = create_pipeline(data_desc.eval_file,
args.max_seq_length,
args.mask_probability,
args.eval_batch_size)
data_layer = nemo_nlp.nm.data_layers.BertPretrainingDataLayer(
tokenizer,
data_file,
max_seq_length,
mask_probability,
short_seq_prob,
batch_size)
# for preprocessed data
# data_layer = nemo_nlp.BertPretrainingPreprocessedDataLayer(
# data_file,
# max_predictions_per_seq,
# batch_size, is_training)
steps_per_epoch = len(data_layer) // (batch_size * args.num_gpus * args.batches_per_step)
input_data = data_layer()
hidden_states = bert_model(input_ids=input_data.input_ids,
token_type_ids=input_data.input_type_ids,
attention_mask=input_data.input_mask)
mlm_logits = mlm_classifier(hidden_states=hidden_states)
mlm_loss = mlm_loss_fn(logits=mlm_logits,
output_ids=input_data.output_ids,
output_mask=input_data.output_mask)
nsp_logits = nsp_classifier(hidden_states=hidden_states)
nsp_loss = nsp_loss_fn(logits=nsp_logits, labels=input_data.labels)
loss = bert_loss(loss_1=mlm_loss, loss_2=nsp_loss)
return loss, mlm_loss, nsp_loss, steps_per_epoch
train_loss, _, _, steps_per_epoch = create_pipeline(
data_file=data_desc.train_file,
preprocessed_data=False,
max_seq_length=args.max_seq_length,
mask_probability=args.mask_probability,
short_seq_prob=args.short_seq_prob,
batch_size=args.batch_size,
batches_per_step=args.batches_per_step)
# for preprocessed data
# train_loss, _, _, steps_per_epoch = create_pipeline(
# data_file=args.data_dir,
# preprocessed_data=True,
# max_predictions_per_seq=args.max_predictions_per_seq,
# training=True,
# batch_size=args.batch_size,
# batches_per_step=args.batches_per_step)
eval_loss, eval_tensors, _ = create_pipeline(data_desc.eval_file,
args.max_seq_length,
13 changes: 6 additions & 7 deletions docs/sources/source/nlp/asr-improvement.rst
@@ -66,18 +66,17 @@ Then we define a tokenizer to convert tokens into indices. We will use ``bert-base

.. code-block:: python
tokenizer = NemoBertTokenizer(pretrained_model="bert-base-uncased")
tokenizer = nemo_nlp.data.NemoBertTokenizer(pretrained_model="bert-base-uncased")
The encoder block is a neural module corresponding to BERT language model from
``nemo_nlp.huggingface`` collection:
``nemo_nlp.nm.trainables.huggingface`` collection:

.. code-block:: python
zeros_transform = nemo.backends.pytorch.common.ZerosLikeNM()
encoder = nemo_nlp.huggingface.BERT(
pretrained_model_name=args.pretrained_model,
local_rank=args.local_rank)
encoder = nemo_nlp.nm.trainables.huggingface.BERT(
pretrained_model_name=args.pretrained_model)
.. tip::
Making embedding size (as well as all other tensor dimensions) divisible
@@ -100,7 +99,7 @@ learn positional encodings ``"learn_positional_encodings": True``:

.. code-block:: python
decoder = nemo_nlp.TransformerDecoderNM(
decoder = nemo_nlp.nm.trainables.TransformerDecoderNM(
d_model=args.d_model,
d_inner=args.d_inner,
num_layers=args.num_layers,
@@ -123,7 +122,7 @@ To load the pretrained parameters into the decoder, we use the ``restore_from`` attribute
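A minimal illustration of that ``restore_from`` call; the checkpoint path below is a hypothetical placeholder, not the name used by the example script:

.. code-block:: python

    # load previously saved decoder weights; replace the path with your own checkpoint
    decoder.restore_from("decoder_checkpoint.pt")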
Model training
--------------

To train the model run ``asr_postprocessor.py`` located in the ``examples/nlp`` directory. We train with the novograd optimizer :cite:`asr-imps-ginsburg2019stochastic`,
To train the model run ``asr_postprocessor.py`` located in the ``examples/nlp/asr_postprocessor`` directory. We train with the novograd optimizer :cite:`asr-imps-ginsburg2019stochastic`,
learning rate ``lr=0.001``, polynomial learning rate decay policy, ``1000`` warmup steps, per-gpu batch size of ``4096*8`` tokens, and ``0.25`` dropout probability.
We trained on 8 GPUs. To launch the training in multi-GPU mode, run the following command:

48 changes: 28 additions & 20 deletions docs/sources/source/nlp/bert_pretraining.rst
@@ -4,7 +4,13 @@ Pretraining BERT
In this tutorial, we will build and train a masked language model, either from scratch or from a pretrained BERT model, using the BERT architecture :cite:`nlp-bert-devlin2018bert`.
Make sure you have ``nemo`` and ``nemo_nlp`` installed before starting this tutorial. See the :ref:`installation` section for more details.

The code used in this tutorial can be found at ``examples/nlp/bert_pretraining.py``.
The code used in this tutorial can be found at ``examples/nlp/language_modeling/bert_pretraining.py``.

.. tip::
Pretrained BERT models can be found at
`https://ngc.nvidia.com/catalog/models/nvidia:bertlargeuncasedfornemo <https://ngc.nvidia.com/catalog/models/nvidia:bertlargeuncasedfornemo>`__
`https://ngc.nvidia.com/catalog/models/nvidia:bertbaseuncasedfornemo <https://ngc.nvidia.com/catalog/models/nvidia:bertbaseuncasedfornemo>`__
`https://ngc.nvidia.com/catalog/models/nvidia:bertbasecasedfornemo <https://ngc.nvidia.com/catalog/models/nvidia:bertbasecasedfornemo>`__

Introduction
------------
@@ -61,7 +67,7 @@ If you have an available vocab, say the ``vocab.txt`` file from any `pretrained BERT

.. code-block:: python
data_desc = BERTPretrainingDataDesc(args.dataset_name,
data_desc = nemo_nlp.data.BERTPretrainingDataDesc(args.dataset_name,
args.data_dir,
args.vocab_size,
args.sample_size,
@@ -76,11 +82,14 @@ To train on a Chinese dataset, you should use `NemoBertTokenizer`.
.. code-block:: python
# If you're using a custom vocabulary, create your tokenizer like this
tokenizer = SentencePieceTokenizer(model_path="tokenizer.model")
tokenizer.add_special_tokens(["[MASK]", "[CLS]", "[SEP]"])
tokenizer = nemo_nlp.data.SentencePieceTokenizer(model_path="tokenizer.model")
special_tokens = nemo_nlp.utils.MODEL_SPECIAL_TOKENS['bert']
tokenizer.add_special_tokens(special_tokens)
# Otherwise, create your tokenizer like this
tokenizer = NemoBertTokenizer(vocab_file="vocab.txt")
tokenizer = nemo_nlp.data.NemoBertTokenizer(vocab_file="vocab.txt")
# or
tokenizer = nemo_nlp.data.NemoBertTokenizer(pretrained_model="bert-base-uncased")
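For reference, the ``MODEL_SPECIAL_TOKENS['bert']`` mapping used above is expected to look roughly like the dictionary spelled out in the Chinese version of this tutorial; the sketch below is illustrative and the actual constant may contain additional entries (for example an unknown token):

.. code-block:: python

    # approximate contents of nemo_nlp.utils.MODEL_SPECIAL_TOKENS['bert']
    bert_special_tokens = {
        "sep_token": "[SEP]",
        "pad_token": "[PAD]",
        "bos_token": "[CLS]",
        "mask_token": "[MASK]",
        "eos_token": "[SEP]",
        "cls_token": "[CLS]",
    }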
Create the model
----------------
@@ -105,7 +114,7 @@ We also need to define the BERT model that we will be pre-training. Here, you ca

.. code-block:: python
bert_model = nemo_nlp.huggingface.BERT(
bert_model = nemo_nlp.nm.trainables.huggingface.BERT(
vocab_size=args.vocab_size,
num_hidden_layers=args.num_hidden_layers,
hidden_size=args.hidden_size,
@@ -126,22 +135,22 @@ For the full list of BERT model names, check out `nemo_nlp.huggingface.BERT.list

.. code-block:: python
bert_model = nemo_nlp.huggingface.BERT(pretrained_model_name="bert-base-cased")
bert_model = nemo_nlp.nm.trainables.huggingface.BERT(pretrained_model_name="bert-base-cased")
Next, we will define our classifier and loss functions. We will demonstrate how to pre-train with both MLM (masked language model) and NSP (next sentence prediction) losses,
but you may observe higher downstream accuracy by only pre-training with MLM loss.

.. code-block:: python
mlm_classifier = nemo_nlp.BertTokenClassifier(
mlm_classifier = nemo_nlp.nm.trainables.BertTokenClassifier(
args.hidden_size,
num_classes=args.vocab_size,
activation=ACT2FN[args.hidden_act],
log_softmax=True)
mlm_loss_fn = nemo_nlp.MaskedLanguageModelingLossNM()
mlm_loss_fn = nemo_nlp.nm.losses.MaskedLanguageModelingLossNM()
nsp_classifier = nemo_nlp.SequenceClassifier(
nsp_classifier = nemo_nlp.nm.trainables.SequenceClassifier(
args.hidden_size,
num_classes=2,
num_layers=2,
Expand All @@ -150,7 +159,7 @@ but you may observe higher downstream accuracy by only pre-training with MLM los
nsp_loss_fn = nemo.backends.pytorch.common.CrossEntropyLoss()
bert_loss = nemo_nlp.LossAggregatorNM(num_inputs=2)
bert_loss = nemo_nlp.nm.losses.LossAggregatorNM(num_inputs=2)
Then, we create the pipeline from input to output that can be used for both training and evaluation:

@@ -159,7 +168,7 @@ For training from raw text use nemo_nlp.BertPretrainingDataLayer, for preprocess
.. code-block:: python
def create_pipeline(**args):
data_layer = nemo_nlp.BertPretrainingDataLayer(
data_layer = nemo_nlp.nm.data_layers.BertPretrainingDataLayer(
tokenizer,
data_file,
max_seq_length,
Expand All @@ -174,20 +183,19 @@ For training from raw text use nemo_nlp.BertPretrainingDataLayer, for preprocess
steps_per_epoch = len(data_layer) // (batch_size * args.num_gpus * args.batches_per_step)
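# For example (hypothetical numbers): with 1,000,000 training pairs, batch_size=64,
# num_gpus=8 and batches_per_step=1, each optimizer step consumes 64 * 8 * 1 = 512 pairs,
# so steps_per_epoch = 1,000,000 // 512 = 1953.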
input_ids, input_type_ids, input_mask, \
output_ids, output_mask, nsp_labels = data_layer()
input_data = data_layer()
hidden_states = bert_model(input_ids=input_ids,
token_type_ids=input_type_ids,
attention_mask=input_mask)
hidden_states = bert_model(input_ids=input_data.input_ids,
token_type_ids=input_data.input_type_ids,
attention_mask=input_data.input_mask)
mlm_logits = mlm_classifier(hidden_states=hidden_states)
mlm_loss = mlm_loss_fn(logits=mlm_logits,
output_ids=output_ids,
output_mask=output_mask)
output_ids=input_data.output_ids,
output_mask=input_data.output_mask)
nsp_logits = nsp_classifier(hidden_states=hidden_states)
nsp_loss = nsp_loss_fn(logits=nsp_logits, labels=nsp_labels)
nsp_loss = nsp_loss_fn(logits=nsp_logits, labels=input_data.labels)
loss = bert_loss(loss_1=mlm_loss, loss_2=nsp_loss)
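The diff of this file is truncated here. For orientation only, a minimal sketch of how the loss tensors produced by ``create_pipeline`` are typically handed to the NeMo trainer; the factory object ``nf``, the callback arguments, and the optimizer settings below are illustrative assumptions rather than the exact code in ``bert_pretraining.py``:

.. code-block:: python

    # assumes `nf = nemo.core.NeuralModuleFactory(...)` was created earlier in the script
    train_callback = nemo.core.SimpleLossLoggerCallback(
        tensors=[train_loss],
        print_func=lambda x: print("Loss: {:.3f}".format(x[0].item())))

    nf.train(tensors_to_optimize=[train_loss],
             callbacks=[train_callback],
             optimizer="novograd",  # illustrative choice; see the example script for the real flags
             optimization_params={"lr": 0.01, "num_epochs": 2})  # hypothetical values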
4 changes: 4 additions & 0 deletions docs/sources/source/nlp/joint_intent_slot_filling.rst
@@ -9,6 +9,10 @@ There are four pre-trained BERT models that we can select from using the argumen
using the script for loading pre-trained models from `pytorch_transformers`. See the list of available pre-trained models
`here <https://huggingface.co/pytorch-transformers/pretrained_models.html>`__.

.. tip::

For pretraining BERT in NeMo and pretrained model checkpoints go to `BERT pretraining <https://nvidia.github.io/NeMo/nlp/bert_pretraining.html>`__.


Preliminaries
-------------
6 changes: 6 additions & 0 deletions docs/sources/source/nlp/ner.rst
@@ -4,6 +4,12 @@ Tutorial
Make sure you have ``nemo`` and ``nemo_nlp`` installed before starting this
tutorial. See the :ref:`installation` section for more details.

.. tip::

For pretraining BERT in NeMo and pretrained model checkpoints go to `BERT pretraining <https://nvidia.github.io/NeMo/nlp/bert_pretraining.html>`__.



Introduction
------------
