
nlp refactoring #316

Merged
merged 58 commits into from
Feb 4, 2020
Changes from 1 commit
Commits
58 commits
dfdacaa
init commit of nlp refactoring
yzhang123 Jan 25, 2020
889a2eb
fixed import errors
yzhang123 Jan 27, 2020
a14da52
make absolute imports
yzhang123 Jan 27, 2020
ffcd7fb
fix import error
yzhang123 Jan 27, 2020
52d8a70
fix imports
yzhang123 Jan 28, 2020
ff7774b
rebase master
yzhang123 Jan 28, 2020
fdc421b
add all changed nlp files
VahidooX Jan 31, 2020
4f81260
Updated thw whole test folder.
VahidooX Jan 31, 2020
0864e65
Changed nemo.logging to logging
VahidooX Jan 31, 2020
6df0778
Added transformer to the init
VahidooX Jan 31, 2020
44bd381
Fixed lgtm warnings.
VahidooX Jan 31, 2020
b3c1513
Fixed transformer package.
VahidooX Jan 31, 2020
8c58796
Fixed unused local variables.
VahidooX Jan 31, 2020
c5127a1
Fixed lgtm.
VahidooX Jan 31, 2020
ae83cff
Fixed lgtm.
VahidooX Jan 31, 2020
b851473
Fixed logging in examples.
VahidooX Jan 31, 2020
2137704
Merge remote-tracking branch 'remote/master' into nlp_refactoring_tmp
VahidooX Jan 31, 2020
b8f57bf
Moved __all__ after imports. Added more __all__:)
VahidooX Jan 31, 2020
58715fe
Added license to all the files except examples.
VahidooX Feb 1, 2020
a92ea9c
Added license to all examples.
VahidooX Feb 1, 2020
22de233
Fixed style.
VahidooX Feb 1, 2020
8590949
Fixed style.
VahidooX Feb 1, 2020
3bb2035
Updated examples names.
VahidooX Feb 1, 2020
30b7d44
Added licenses to init files.
VahidooX Feb 1, 2020
7a66bec
tested examples
yzhang123 Feb 3, 2020
ba24d5a
fix black style
yzhang123 Feb 3, 2020
8857fc5
updating jenkins after script renaming
yzhang123 Feb 3, 2020
c5a441f
updating changelog
yzhang123 Feb 3, 2020
b2e2475
merged dev-config-nm with nlp_refactor_tmp, all unit tests passed
tkornuta-nvidia Feb 3, 2020
03e02cd
import fixed
ekmb Feb 3, 2020
0d204d6
Moved scripts.
VahidooX Feb 3, 2020
b4349e5
Merge remote-tracking branch 'remote/nlp_refactoring_merged_config' i…
VahidooX Feb 3, 2020
9356e59
Fixed import.
VahidooX Feb 3, 2020
2cf27b8
tested examples scripts
yzhang123 Feb 3, 2020
1afe049
update jenkins
yzhang123 Feb 3, 2020
0363044
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
VahidooX Feb 3, 2020
4446f80
merge conflict on CHANGELOG resolved
tkornuta-nvidia Feb 4, 2020
efdb87b
Merge branch 'master' of github.com:NVIDIA/NeMo into merge-nlp-refact…
tkornuta-nvidia Feb 4, 2020
4e81846
LGTM fixes
tkornuta-nvidia Feb 4, 2020
cb11882
removed invalid argument in ipynb
tkornuta-nvidia Feb 4, 2020
4a9343d
removed unused import
tkornuta-nvidia Feb 4, 2020
4a159b4
removed empty nontrainables, as agreed during the meeting, if there a…
tkornuta-nvidia Feb 4, 2020
ecfd2cd
removed nontrainables reference
tkornuta-nvidia Feb 4, 2020
face974
nemo_core - lgtm import fix
tkornuta-nvidia Feb 4, 2020
e59fcdb
removed references to local_params - n-th time:]
tkornuta-nvidia Feb 4, 2020
720f5c5
Cleanups related to removing factory from BERT init calls
tkornuta-nvidia Feb 4, 2020
04a083b
import fixes, nm default shuffleFalse
ekmb Feb 4, 2020
e8e9770
shuffle fixed
ekmb Feb 4, 2020
f555326
fix shuffle
yzhang123 Feb 4, 2020
17dbaf6
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
yzhang123 Feb 4, 2020
1852d59
default shFalse in text_dl
ekmb Feb 4, 2020
42b4ecc
Shuffle bug fixed.
VahidooX Feb 4, 2020
46087d1
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
VahidooX Feb 4, 2020
5b4a75e
nmt fix
ekmb Feb 4, 2020
21f353b
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
ekmb Feb 4, 2020
e7853bf
fix squad unittest
yzhang123 Feb 4, 2020
cc674c9
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
yzhang123 Feb 4, 2020
4cbcfd4
Set shuffles to False in some data layers.
VahidooX Feb 4, 2020
Moved __all__ after imports. Added more __all__:)
Signed-off-by: VahidooX <[email protected]>
VahidooX committed Jan 31, 2020
commit b8f57bfb53e47243fe769d428c74a8fbec2ba0ea
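In most files below, this commit moves the `__all__` assignment from above the imports to below them. Python reads `__all__` only when a `from module import *` statement runs against the already-executed module, so the move changes layout, not behaviour. A minimal sketch that checks this. It assumes two hypothetical in-memory modules, `old_layout_demo` and `new_layout_demo`, built with `types.ModuleType` and registered in `sys.modules`. These are stand-ins, not the real NeMo files:

```python
import sys
import types

# Two hypothetical module bodies that differ only in where __all__ is assigned.
OLD_LAYOUT = """
__all__ = ['eval_iter_callback']
import random

def eval_iter_callback(tensors, global_vars):
    return random.random()
"""

NEW_LAYOUT = """
import random

__all__ = ['eval_iter_callback']

def eval_iter_callback(tensors, global_vars):
    return random.random()
"""


def star_import_names(module_name, source):
    """Install `source` as module `module_name` and return what `import *` binds."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module  # make it importable by name
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # inserted by exec itself, not by the import
    return sorted(namespace)


old_names = star_import_names("old_layout_demo", OLD_LAYOUT)
new_names = star_import_names("new_layout_demo", NEW_LAYOUT)
print(old_names)  # ['eval_iter_callback']
print(new_names)  # ['eval_iter_callback']
```

In both layouts, `random` is not re-exported because `__all__` leaves it out.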
1 change: 0 additions & 1 deletion nemo/collections/nlp/__init__.py
@@ -17,6 +17,5 @@
import nemo.collections.nlp.data
import nemo.collections.nlp.nm
import nemo.collections.nlp.utils
from nemo import logging

backend = nemo.core.Backend.PyTorch
4 changes: 2 additions & 2 deletions nemo/collections/nlp/callbacks/glue_benchmark_callback.py
@@ -19,8 +19,6 @@
 Some transformer of this code were adapted from the HuggingFace library at
 https://github.com/huggingface/transformers
 """
-__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
-
 import os
 import random

@@ -31,6 +29,8 @@
 from nemo import logging
 from nemo.collections.nlp.utils.callback_utils import list2str, tensor2list

+__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
+

 def eval_iter_callback(tensors, global_vars):
     if "all_preds" not in global_vars.keys():
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
# Copyright (c) 2019 NVIDIA Corporation

import random

import numpy as np
4 changes: 2 additions & 2 deletions nemo/collections/nlp/callbacks/lm_bert_callback.py
@@ -1,10 +1,10 @@
 # Copyright (c) 2019 NVIDIA Corporation
-__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
-
 import numpy as np

 from nemo import logging

+__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
+

 def eval_iter_callback(tensors, global_vars):
     if "dev_mlm_loss" not in global_vars.keys():
3 changes: 1 addition & 2 deletions nemo/collections/nlp/callbacks/lm_transformer_callback.py
@@ -1,10 +1,9 @@
 # Copyright (c) 2019 NVIDIA Corporation
-__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
-
 import numpy as np

 from nemo import logging

+__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
 GLOBAL_KEYS = ["eval_loss", "sys"]


Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
 # Copyright (c) 2019 NVIDIA Corporation
-__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
-
 import numpy as np

 from nemo import logging
 from nemo.collections.asr.metrics import word_error_rate
 from nemo.collections.nlp.metrics.sacrebleu import corpus_bleu

+__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
+
 GLOBAL_KEYS = ["eval_loss", "ref", "sys", "sent_ids", "nonpad_tokens"]


Original file line number Diff line number Diff line change
@@ -1,6 +1,4 @@
 # Copyright (c) 2019 NVIDIA Corporation
-__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
-
 import random

 import numpy as np
@@ -9,6 +7,8 @@
 from nemo import logging
 from nemo.collections.nlp.utils.callback_utils import list2str, plot_confusion_matrix, tensor2list

+__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
+

 def eval_iter_callback(tensors, global_vars):
     if "punct_all_preds" not in global_vars.keys():
4 changes: 2 additions & 2 deletions nemo/collections/nlp/callbacks/qa_squad_callback.py
@@ -14,10 +14,10 @@
 limitations under the License.
 """

-__all__ = ['eval_epochs_done_callback', 'eval_iter_callback']
-
 from nemo import logging

+__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
+

 def eval_iter_callback(tensors, global_vars):
     if "eval_start_logits" not in global_vars.keys():
Original file line number Diff line number Diff line change
@@ -1,6 +1,4 @@
 # Copyright (c) 2019 NVIDIA Corporation
-__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
-
 import random

 import numpy as np
Original file line number Diff line number Diff line change
@@ -1,6 +1,4 @@
 # Copyright (c) 2019 NVIDIA Corporation
-__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
-
 import random

 import numpy as np
@@ -9,6 +7,8 @@
 from nemo import logging
 from nemo.collections.nlp.utils.callback_utils import list2str, plot_confusion_matrix, tensor2list

+__all__ = ['eval_iter_callback', 'eval_epochs_done_callback']
+

 def eval_iter_callback(tensors, global_vars):
     if "all_preds" not in global_vars.keys():
36 changes: 36 additions & 0 deletions nemo/collections/nlp/data/datasets/datasets_utils.py
@@ -22,6 +22,42 @@
     write_vocab_in_order,
 )

+__all__ = [
+    'get_label_stats',
+    'process_sst_2',
+    'process_imdb',
+    'process_thucnews',
+    'process_nlu',
+    'process_twitter_airline',
+    'process_atis',
+    'process_jarvis_datasets',
+    'process_mturk',
+    'process_intent_slot_mturk',
+    'get_intents_mturk',
+    'get_slot_labels',
+    'merge',
+    'get_intent_query_files_dialogflow',
+    'get_intents_slots_dialogflow',
+    'get_slots_dialogflow',
+    'partition_data',
+    'write_files',
+    'process_dialogflow',
+    'write_data',
+    'create_dataset',
+    'read_csv',
+    'process_snips',
+    'get_dataset',
+    'partition',
+    'map_entities',
+    'get_entities',
+    'get_data',
+    'reverse_dict',
+    'get_intent_labels',
+    'download_wkt2',
+    'normalize_answer',
+    'get_tokens',
+]
+
 DATABASE_EXISTS_TMP = '{} dataset has already been processed and stored at {}'
 MODE_EXISTS_TMP = '{} mode of {} dataset has already been processed and stored at {}'

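`datasets_utils.py` gains a long, hand-maintained `__all__`. If a listed name is later renamed or removed, a `from ... import *` against that module raises `AttributeError` at import time. A small check along these lines can catch that drift early. `undefined_exports` and the `datasets_utils_demo` module are hypothetical illustrations, not part of NeMo:

```python
import sys
import types


def undefined_exports(module):
    """Return the names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Hypothetical module: 'get_tokens' is listed in __all__ but never defined.
demo = types.ModuleType("datasets_utils_demo")
exec(
    "__all__ = ['normalize_answer', 'get_tokens']\n"
    "def normalize_answer(s):\n"
    "    return s.lower().strip()\n",
    demo.__dict__,
)
sys.modules["datasets_utils_demo"] = demo

print(undefined_exports(demo))  # ['get_tokens']

# The same drift makes a star-import fail when it runs.
try:
    exec("from datasets_utils_demo import *", {})
except AttributeError as err:
    print("star-import failed:", err)
```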
1 change: 1 addition & 0 deletions nemo/collections/nlp/data/datasets/qa_squad_dataset.py
@@ -42,6 +42,7 @@
 from nemo.collections.nlp.utils.common_nlp_utils import _is_whitespace
 from nemo.collections.nlp.utils.loss_utils import _compute_softmax

+__all__ = ['SquadDataset']

 """
 Utility functions for Question Answering NLP tasks
Original file line number Diff line number Diff line change
@@ -13,7 +13,6 @@
 # limitations under the License.****

 import argparse
-import logging
 import os

 from nemo import logging
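The file above imported the standard-library `logging` module and then ran `from nemo import logging`. The second statement rebinds the same name, so the first import was dead code, and this commit deletes it. The following is a minimal sketch of that shadowing. It assumes a hypothetical stand-in module, `fake_nemo`, whose `logging` attribute is a stdlib `Logger`. That attribute is an assumption for illustration, not NeMo's actual object:

```python
import logging
import sys
import types

# Keep a handle on the standard-library module before the name is rebound.
stdlib_logging = logging

# Hypothetical stand-in for `nemo`: a module exposing its own `logging` attribute.
fake_nemo = types.ModuleType("fake_nemo")
fake_nemo.logging = stdlib_logging.getLogger("fake_nemo")
sys.modules["fake_nemo"] = fake_nemo

# Same pattern as the file above: this rebinds `logging`, shadowing the earlier import.
from fake_nemo import logging

print(logging is stdlib_logging)     # False
print(logging is fake_nemo.logging)  # True
```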
3 changes: 2 additions & 1 deletion nemo/collections/nlp/data/tokenizers/bert_tokenizer.py
@@ -1,10 +1,11 @@
-__all__ = ['NemoBertTokenizer']
 import re

 from transformers import BertTokenizer

 from nemo.collections.nlp.data.tokenizers.tokenizer_spec import TokenizerSpec

+__all__ = ['NemoBertTokenizer']
+

 def handle_quotes(text):
     text_ = ""
3 changes: 2 additions & 1 deletion nemo/collections/nlp/data/tokenizers/char_tokenizer.py
@@ -1,6 +1,7 @@
-__all__ = ['CharTokenizer']
 from nemo.collections.nlp.data.tokenizers.tokenizer_spec import TokenizerSpec

+__all__ = ['CharTokenizer']
+

 class CharTokenizer(TokenizerSpec):
     def __init__(self, vocab_path):
5 changes: 2 additions & 3 deletions nemo/collections/nlp/data/tokenizers/fairseq_tokenizer.py
@@ -2,14 +2,13 @@
 https://github.com/NVIDIA/DeepLearningExamples/blob/
 master/PyTorch/Translation/Transformer/fairseq/tokenizer.py
 """
-
-__all__ = ['get_unicode_categories', 'tokenize_en']
-
 import re
 import sys
 import unicodedata
 from collections import defaultdict

+__all__ = ['get_unicode_categories', 'tokenize_en']
+

 def get_unicode_categories():
     cats = defaultdict(list)
3 changes: 2 additions & 1 deletion nemo/collections/nlp/data/tokenizers/gpt2_tokenizer.py
@@ -1,8 +1,9 @@
-__all__ = ['NemoGPT2Tokenizer']
 from transformers import GPT2Tokenizer

 from nemo.collections.nlp.data.tokenizers.tokenizer_spec import TokenizerSpec

+__all__ = ['NemoGPT2Tokenizer']
+

 class NemoGPT2Tokenizer(TokenizerSpec):
     def __init__(
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['SentencePieceTokenizer']
 import sentencepiece as spm

 from nemo.collections.nlp.data.tokenizers.tokenizer_spec import TokenizerSpec

+__all__ = ['SentencePieceTokenizer']
+

 class SentencePieceTokenizer(TokenizerSpec):
     def __init__(self, model_path):
3 changes: 2 additions & 1 deletion nemo/collections/nlp/data/tokenizers/tokenizer_spec.py
@@ -1,7 +1,8 @@
-__all__ = ['TokenizerSpec']
 from abc import ABC, abstractmethod
 from typing import List

+__all__ = ['TokenizerSpec']
+

 class TokenizerSpec(ABC):
     @abstractmethod
3 changes: 2 additions & 1 deletion nemo/collections/nlp/data/tokenizers/word_tokenizer.py
@@ -1,6 +1,7 @@
-__all__ = ['WordTokenizer']
 from nemo.collections.nlp.data.tokenizers.tokenizer_spec import TokenizerSpec

+__all__ = ['WordTokenizer']
+

 class WordTokenizer(TokenizerSpec):
     def __init__(self, vocab_path):
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['YouTokenToMeTokenizer']
 import youtokentome as yttm

 from nemo.collections.nlp.data.tokenizers.tokenizer_spec import TokenizerSpec

+__all__ = ['YouTokenToMeTokenizer']
+

 class YouTokenToMeTokenizer(TokenizerSpec):
     def __init__(self, model_path):
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['GlueClassificationDataLayer', 'GlueRegressionDataLayer']
 from nemo.collections.nlp.data import GLUEDataset
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, CategoricalTag, NeuralType, RegressionTag, TimeTag

+__all__ = ['GlueClassificationDataLayer', 'GlueRegressionDataLayer']
+

 class GlueClassificationDataLayer(TextDataLayer):
     """
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['BertJointIntentSlotDataLayer', 'BertJointIntentSlotInferDataLayer']
 from nemo.collections.nlp.data import BertJointIntentSlotDataset, BertJointIntentSlotInferDataset
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['BertJointIntentSlotDataLayer', 'BertJointIntentSlotInferDataLayer']
+

 class BertJointIntentSlotDataLayer(TextDataLayer):
     """
3 changes: 2 additions & 1 deletion nemo/collections/nlp/nm/data_layers/lm_bert_datalayer.py
@@ -1,4 +1,3 @@
-__all__ = ['BertPretrainingDataLayer', 'BertPretrainingPreprocessedDataLayer']
 import os
 import random

@@ -12,6 +11,8 @@
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['BertPretrainingDataLayer', 'BertPretrainingPreprocessedDataLayer']
+

 class BertPretrainingDataLayer(TextDataLayer):
     """
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['LanguageModelingDataLayer']
 from nemo.collections.nlp.data import LanguageModelingDataset
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['LanguageModelingDataLayer']
+

 class LanguageModelingDataLayer(TextDataLayer):
     """
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
-__all__ = ['TranslationDataLayer']
 import torch
 from torch.utils import data as pt_data

@@ -7,6 +6,8 @@
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['TranslationDataLayer']
+

 class TranslationDataLayer(TextDataLayer):
     """
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['PunctuationCapitalizationDataLayer']
 from nemo.collections.nlp.data import BertPunctuationCapitalizationDataset
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['PunctuationCapitalizationDataLayer']
+

 class PunctuationCapitalizationDataLayer(TextDataLayer):
     @property
3 changes: 2 additions & 1 deletion nemo/collections/nlp/nm/data_layers/qa_squad_datalayer.py
@@ -1,8 +1,9 @@
-__all__ = ['BertQuestionAnsweringDataLayer']
 from nemo.collections.nlp.data import SquadDataset
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['BertQuestionAnsweringDataLayer']
+

 class BertQuestionAnsweringDataLayer(TextDataLayer):
     """
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['BertSentenceClassificationDataLayer']
 from nemo.collections.nlp.data import BertTextClassificationDataset
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['BertSentenceClassificationDataLayer']
+

 class BertSentenceClassificationDataLayer(TextDataLayer):
     """
4 changes: 2 additions & 2 deletions nemo/collections/nlp/nm/data_layers/text_datalayer.py
@@ -1,8 +1,8 @@
-__all__ = ['TextDataLayer']
-
 from nemo.backends.pytorch import DataLayerNM
 from nemo.collections.nlp.data.datasets import *

+__all__ = ['TextDataLayer']
+

 class TextDataLayer(DataLayerNM):
     """
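`TextDataLayer` pulls in its dataset classes with `from nemo.collections.nlp.data.datasets import *`. If the target defines `__all__`, a star-import binds exactly the names listed there. Otherwise it binds every module-level name that does not start with an underscore, including modules the target itself imported. The following is a minimal sketch of that difference. It uses a hypothetical in-memory module body (`BODY`) rather than the NeMo package:

```python
import sys
import types

# Hypothetical module body: `os` is an implementation detail, `_helper` is private.
BODY = """
import os

def load(path):
    return os.path.basename(path)

def _helper():
    return None
"""


def star_import_names(module_name, source):
    """Install `source` as module `module_name` and return what `import *` binds."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # inserted by exec itself
    return sorted(namespace)


print(star_import_names("no_all_demo", BODY))  # ['load', 'os']
print(star_import_names("with_all_demo", BODY + "\n__all__ = ['load']\n"))  # ['load']
```

Without `__all__`, the importer also receives `os`. With `__all__`, it receives only `load`, even though `__all__` is assigned after the definitions.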
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
-__all__ = ['BertTokenClassificationDataLayer', 'BertTokenClassificationInferDataLayer']
 from nemo.collections.nlp.data import BertTokenClassificationDataset, BertTokenClassificationInferDataset
 from nemo.collections.nlp.nm.data_layers.text_datalayer import TextDataLayer
 from nemo.core import AxisType, BatchTag, NeuralType, TimeTag

+__all__ = ['BertTokenClassificationDataLayer', 'BertTokenClassificationInferDataLayer']
+

 class BertTokenClassificationDataLayer(TextDataLayer):
     @property
3 changes: 2 additions & 1 deletion nemo/collections/nlp/nm/losses/aggregator_loss.py
@@ -1,7 +1,8 @@
-__all__ = ['LossAggregatorNM']
 from nemo.backends.pytorch import LossNM
 from nemo.core import NeuralType

+__all__ = ['LossAggregatorNM']
+

 class LossAggregatorNM(LossNM):
     """