nlp refactoring #316

Merged 58 commits on Feb 4, 2020
Changes from 1 commit

Commits (58)
dfdacaa
init commit of nlp refactoring
yzhang123 Jan 25, 2020
889a2eb
fixed import errors
yzhang123 Jan 27, 2020
a14da52
make absolute imports
yzhang123 Jan 27, 2020
ffcd7fb
fix import error
yzhang123 Jan 27, 2020
52d8a70
fix imports
yzhang123 Jan 28, 2020
ff7774b
rebase master
yzhang123 Jan 28, 2020
fdc421b
add all changed nlp files
VahidooX Jan 31, 2020
4f81260
Updated the whole test folder.
VahidooX Jan 31, 2020
0864e65
Changed nemo.logging to logging
VahidooX Jan 31, 2020
6df0778
Added transformer to the init
VahidooX Jan 31, 2020
44bd381
Fixed lgtm warnings.
VahidooX Jan 31, 2020
b3c1513
Fixed transformer package.
VahidooX Jan 31, 2020
8c58796
Fixed unused local variables.
VahidooX Jan 31, 2020
c5127a1
Fixed lgtm.
VahidooX Jan 31, 2020
ae83cff
Fixed lgtm.
VahidooX Jan 31, 2020
b851473
Fixed logging in examples.
VahidooX Jan 31, 2020
2137704
Merge remote-tracking branch 'remote/master' into nlp_refactoring_tmp
VahidooX Jan 31, 2020
b8f57bf
Moved __all__ after imports. Added more __all__:)
VahidooX Jan 31, 2020
58715fe
Added license to all the files except examples.
VahidooX Feb 1, 2020
a92ea9c
Added license to all examples.
VahidooX Feb 1, 2020
22de233
Fixed style.
VahidooX Feb 1, 2020
8590949
Fixed style.
VahidooX Feb 1, 2020
3bb2035
Updated examples names.
VahidooX Feb 1, 2020
30b7d44
Added licenses to init files.
VahidooX Feb 1, 2020
7a66bec
tested examples
yzhang123 Feb 3, 2020
ba24d5a
fix black style
yzhang123 Feb 3, 2020
8857fc5
updating jenkins after script renaming
yzhang123 Feb 3, 2020
c5a441f
updating changelog
yzhang123 Feb 3, 2020
b2e2475
merged dev-config-nm with nlp_refactor_tmp, all unit tests passed
tkornuta-nvidia Feb 3, 2020
03e02cd
import fixed
ekmb Feb 3, 2020
0d204d6
Moved scripts.
VahidooX Feb 3, 2020
b4349e5
Merge remote-tracking branch 'remote/nlp_refactoring_merged_config' i…
VahidooX Feb 3, 2020
9356e59
Fixed import.
VahidooX Feb 3, 2020
2cf27b8
tested examples scripts
yzhang123 Feb 3, 2020
1afe049
update jenkins
yzhang123 Feb 3, 2020
0363044
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
VahidooX Feb 3, 2020
4446f80
merge conflict on CHANGELOG resolved
tkornuta-nvidia Feb 4, 2020
efdb87b
Merge branch 'master' of github.com:NVIDIA/NeMo into merge-nlp-refact…
tkornuta-nvidia Feb 4, 2020
4e81846
LGTM fixes
tkornuta-nvidia Feb 4, 2020
cb11882
removed invalid argument in ipynb
tkornuta-nvidia Feb 4, 2020
4a9343d
removed unused import
tkornuta-nvidia Feb 4, 2020
4a159b4
removed empty nontrainables, as agreed during the meeting, if there a…
tkornuta-nvidia Feb 4, 2020
ecfd2cd
removed nontrainables reference
tkornuta-nvidia Feb 4, 2020
face974
nemo_core - lgtm import fix
tkornuta-nvidia Feb 4, 2020
e59fcdb
removed references to local_params - n-th time:]
tkornuta-nvidia Feb 4, 2020
720f5c5
Cleanups related to removing factory from BERT init calls
tkornuta-nvidia Feb 4, 2020
04a083b
import fixes, nm default shuffle=False
ekmb Feb 4, 2020
e8e9770
shuffle fixed
ekmb Feb 4, 2020
f555326
fix shuffle
yzhang123 Feb 4, 2020
17dbaf6
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
yzhang123 Feb 4, 2020
1852d59
default sh=False in text_dl
ekmb Feb 4, 2020
42b4ecc
Shuffle bug fixed.
VahidooX Feb 4, 2020
46087d1
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
VahidooX Feb 4, 2020
5b4a75e
nmt fix
ekmb Feb 4, 2020
21f353b
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
ekmb Feb 4, 2020
e7853bf
fix squad unittest
yzhang123 Feb 4, 2020
cc674c9
Merge branch 'nlp_refactoring_tmp' of https://github.com/NVIDIA/NeMo …
yzhang123 Feb 4, 2020
4cbcfd4
Set shuffles to False in some data layers.
VahidooX Feb 4, 2020
4 changes: 2 additions & 2 deletions examples/nlp/asr_postprocessor.py
@@ -6,8 +6,8 @@

import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp.callbacks.translation import eval_epochs_done_callback_wer, eval_iter_callback
from nemo.collections.nlp.data.tokenizers.bert_tokenizer import NemoBertTokenizer
from nemo.collections.nlp.utils.callbacks.translation import eval_epochs_done_callback_wer, eval_iter_callback
from nemo.core.callbacks import CheckpointCallback
from nemo.utils.lr_policies import SquareAnnealing

@@ -66,7 +66,7 @@
tokens_to_add = vocab_size - tokenizer.vocab_size

zeros_transform = nemo.backends.pytorch.common.ZerosLikeNM()
encoder = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_model, local_rank=args.local_rank)
encoder = nemo_nlp.BERT(pretrained_model_name=args.pretrained_model, local_rank=args.local_rank)
device = encoder.bert.embeddings.word_embeddings.weight.get_device()
zeros = torch.zeros((tokens_to_add, args.d_model)).to(device=device)
encoder.bert.embeddings.word_embeddings.weight.data = torch.cat(
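
The two hunks above are representative of the whole PR: callback helpers move from nemo.collections.nlp.utils.callbacks to nemo.collections.nlp.callbacks, and the BERT wrapper is re-exported at the top of the collection instead of living under a huggingface submodule. A minimal before/after sketch, assuming a NeMo checkout from this era and using 'bert-base-uncased' as a placeholder model name:

import nemo.collections.nlp as nemo_nlp

# Before this PR (old paths, kept as comments for contrast):
#   from nemo.collections.nlp.utils.callbacks.translation import eval_epochs_done_callback_wer
#   encoder = nemo_nlp.huggingface.BERT(pretrained_model_name='bert-base-uncased')

# After this PR (new paths):
from nemo.collections.nlp.callbacks.translation import eval_epochs_done_callback_wer
encoder = nemo_nlp.BERT(pretrained_model_name='bert-base-uncased')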
6 changes: 3 additions & 3 deletions examples/nlp/bert_pretraining.py
@@ -67,9 +67,9 @@

import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp.callbacks.bert_pretraining import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.data.datasets.utils import BERTPretrainingDataDesc
from nemo.collections.nlp.transformer.utils import gelu
from nemo.collections.nlp.utils.callbacks.bert_pretraining import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.modules.trainables.specific.transformer.utils import gelu
from nemo.utils.lr_policies import get_lr_policy

parser = argparse.ArgumentParser(description='BERT pretraining')
@@ -174,7 +174,7 @@
args.vocab_size = tokenizer.vocab_size

print(vars(args))
bert_model = nemo_nlp.huggingface.BERT(
bert_model = nemo_nlp.BERT(
vocab_size=args.vocab_size,
num_hidden_layers=args.num_hidden_layers,
hidden_size=args.hidden_size,
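
The same flattening applies to the from-scratch constructor used for pretraining; the hunk above is cut off mid-call, and the real script passes further architecture arguments from argparse. A hedged sketch with illustrative BERT-base values standing in for the script's args:

import nemo.collections.nlp as nemo_nlp

# Illustrative hyperparameters; bert_pretraining.py reads these from its CLI args,
# and the actual call continues beyond the lines shown in the hunk:
bert_model = nemo_nlp.BERT(
    vocab_size=30522,
    num_hidden_layers=12,
    hidden_size=768,
)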
10 changes: 5 additions & 5 deletions examples/nlp/glue_with_BERT.py
@@ -73,8 +73,8 @@
NemoBertTokenizer,
SentencePieceTokenizer,
)
from nemo.collections.nlp.callbacks.glue import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.data.datasets.utils import output_modes, processors
from nemo.collections.nlp.utils.callbacks.glue import eval_epochs_done_callback, eval_iter_callback
from nemo.utils.lr_policies import get_lr_policy

parser = argparse.ArgumentParser(description="GLUE_with_pretrained_BERT")
@@ -216,10 +216,10 @@
if args.bert_checkpoint is None:
""" Use this if you're using a standard BERT model.
To see the list of pretrained models, call:
nemo_nlp.huggingface.BERT.list_pretrained_models()
nemo_nlp.BERT.list_pretrained_models()
"""
tokenizer = NemoBertTokenizer(args.pretrained_bert_model)
model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)
else:
""" Use this if you're using a BERT model that you pre-trained yourself.
Replace BERT-STEP-150000.pt with the path to your checkpoint.
@@ -234,9 +234,9 @@
if args.bert_config is not None:
with open(args.bert_config) as json_file:
config = json.load(json_file)
model = nemo_nlp.huggingface.BERT(**config)
model = nemo_nlp.BERT(**config)
else:
model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)

model.restore_from(args.bert_checkpoint)

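The GLUE hunks exercise both load paths the example supports after the rename. Condensed into one sketch; the checkpoint and config paths are placeholders, and nemo_nlp.BERT.list_pretrained_models() enumerates the valid pretrained names:

import json

import nemo.collections.nlp as nemo_nlp

bert_checkpoint = None  # e.g. 'BERT-STEP-150000.pt' for a self-pretrained model
bert_config = None      # optional path to a BERT config JSON

if bert_checkpoint is None:
    # Standard pretrained BERT.
    model = nemo_nlp.BERT(pretrained_model_name='bert-base-uncased')
else:
    # Self-pretrained BERT, optionally rebuilt from an explicit config.
    if bert_config is not None:
        with open(bert_config) as json_file:
            model = nemo_nlp.BERT(**json.load(json_file))
    else:
        model = nemo_nlp.BERT(pretrained_model_name='bert-base-uncased')
    model.restore_from(bert_checkpoint)
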
2 changes: 1 addition & 1 deletion examples/nlp/joint_intent_slot_infer.py
@@ -35,7 +35,7 @@
See the list of pretrained models, call:
nemo_nlp.huggingface.BERT.list_pretrained_models()
"""
pretrained_bert_model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
pretrained_bert_model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)
hidden_size = pretrained_bert_model.local_parameters["hidden_size"]
tokenizer = BertTokenizer.from_pretrained(args.pretrained_bert_model)

4 changes: 2 additions & 2 deletions examples/nlp/joint_intent_slot_infer_b1.py
@@ -28,9 +28,9 @@

""" Load the pretrained BERT parameters
See the list of pretrained models, call:
nemo_nlp.huggingface.BERT.list_pretrained_models()
nemo_nlp.BERT.list_pretrained_models()
"""
pretrained_bert_model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model, factory=nf)
pretrained_bert_model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model, factory=nf)
tokenizer = BertTokenizer.from_pretrained(args.pretrained_bert_model)
hidden_size = pretrained_bert_model.local_parameters["hidden_size"]

6 changes: 3 additions & 3 deletions examples/nlp/joint_intent_slot_with_bert.py
@@ -7,8 +7,8 @@

import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp.callbacks.joint_intent_slot import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.data.datasets.utils import JointIntentSlotDataDesc
from nemo.collections.nlp.utils.callbacks.joint_intent_slot import eval_epochs_done_callback, eval_iter_callback
from nemo.utils.lr_policies import get_lr_policy

# Parsing arguments
@@ -71,10 +71,10 @@
nemo_nlp.huggingface.BERT.list_pretrained_models()
"""
if args.bert_checkpoint and args.bert_config:
pretrained_bert_model = nemo_nlp.huggingface.BERT(config_filename=args.bert_config, factory=nf)
pretrained_bert_model = nemo_nlp.BERT(config_filename=args.bert_config, factory=nf)
pretrained_bert_model.restore_from(args.bert_checkpoint)
else:
pretrained_bert_model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model, factory=nf)
pretrained_bert_model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model, factory=nf)

hidden_size = pretrained_bert_model.local_parameters["hidden_size"]

2 changes: 1 addition & 1 deletion examples/nlp/nmt_tutorial.py
@@ -7,7 +7,7 @@

import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp.utils.callbacks.translation import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.callbacks.translation import eval_epochs_done_callback, eval_iter_callback
from nemo.utils.lr_policies import get_lr_policy

parser = nemo.utils.NemoArgParser(description='Transformer for Neural Machine Translation')
11 changes: 4 additions & 7 deletions examples/nlp/punctuation_capitalization.py
@@ -8,11 +8,8 @@
import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp import NemoBertTokenizer, SentencePieceTokenizer, TokenClassificationLoss, TokenClassifier
from nemo.collections.nlp.callbacks.punctuation_capitalization import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.data.datasets import utils
from nemo.collections.nlp.utils.callbacks.punctuation_capitalization import (
eval_epochs_done_callback,
eval_iter_callback,
)
from nemo.utils.lr_policies import get_lr_policy

# Parsing arguments
@@ -119,7 +116,7 @@
nemo_nlp.huggingface.BERT.list_pretrained_models()
"""
tokenizer = NemoBertTokenizer(args.pretrained_bert_model)
model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)
else:
""" Use this if you're using a BERT model that you pre-trained yourself.
"""
@@ -133,9 +130,9 @@
if args.bert_config is not None:
with open(args.bert_config) as json_file:
config = json.load(json_file)
model = nemo_nlp.huggingface.BERT(**config)
model = nemo_nlp.BERT(**config)
else:
model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)

model.restore_from(args.bert_checkpoint)
nemo.logging.info(f"Model restored from {args.bert_checkpoint}")
4 changes: 2 additions & 2 deletions examples/nlp/punctuation_capitalization_infer.py
@@ -75,9 +75,9 @@

""" Load the pretrained BERT parameters
See the list of pretrained models, call:
nemo_nlp.huggingface.BERT.list_pretrained_models()
nemo_nlp.BERT.list_pretrained_models()
"""
pretrained_bert_model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
pretrained_bert_model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)
hidden_size = pretrained_bert_model.local_parameters["hidden_size"]
tokenizer = NemoBertTokenizer(args.pretrained_bert_model)

8 changes: 4 additions & 4 deletions examples/nlp/token_classification.py
@@ -8,8 +8,8 @@
import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp import NemoBertTokenizer, SentencePieceTokenizer, TokenClassificationLoss, TokenClassifier
from nemo.collections.nlp.callbacks.token_classification import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.data.datasets import utils
from nemo.collections.nlp.utils.callbacks.token_classification import eval_epochs_done_callback, eval_iter_callback
from nemo.utils.lr_policies import get_lr_policy

# Parsing arguments
@@ -116,7 +116,7 @@
nemo_nlp.huggingface.BERT.list_pretrained_models()
"""
tokenizer = NemoBertTokenizer(args.pretrained_bert_model)
model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)
else:
""" Use this if you're using a BERT model that you pre-trained yourself.
"""
@@ -130,9 +130,9 @@
if args.bert_config is not None:
with open(args.bert_config) as json_file:
config = json.load(json_file)
model = nemo_nlp.huggingface.BERT(**config)
model = nemo_nlp.BERT(**config)
else:
model = nemo_nlp.huggingface.BERT(pretrained_model_name=args.pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=args.pretrained_bert_model)

model.restore_from(args.bert_checkpoint)
nemo.logging.info(f"Model restored from {args.bert_checkpoint}")
2 changes: 1 addition & 1 deletion examples/nlp/transformer_lm.py
@@ -3,8 +3,8 @@

import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp.callbacks.language_modeling import eval_epochs_done_callback, eval_iter_callback
from nemo.collections.nlp.data.datasets.utils import LanguageModelDataDesc
from nemo.collections.nlp.utils.callbacks.language_modeling import eval_epochs_done_callback, eval_iter_callback
from nemo.utils.lr_policies import CosineAnnealing

parser = nemo.utils.NemoArgParser(description='LM Transformer')
3 changes: 3 additions & 0 deletions nemo/collections/nlp/data/datasets/joint_intent_slot.py
@@ -25,6 +25,9 @@
import numpy as np
from torch.utils.data import Dataset

import nemo
import nemo.collections.nlp.data.datasets.utils as utils


def get_features(
queries,
2 changes: 1 addition & 1 deletion nemo/collections/nlp/data/datasets/language_modeling.py
@@ -17,7 +17,7 @@
import numpy as np
from torch.utils.data import Dataset

import nemo.collections.nlp.utils as utils
import nemo.collections.nlp.data.utils as utils


class LanguageModelingDataset(Dataset):
Changes to an additional file (file name not captured):
@@ -28,6 +28,7 @@
from torch.utils.data import Dataset

import nemo
import nemo.collections.nlp.data.datasets.utils


def get_features(
Changes to an additional file (file name not captured):
@@ -12,6 +12,13 @@
import math

from nemo.backends.pytorch.nm import LossNM, TrainableNM
from nemo.collections.nlp.modules.trainables.specific.transformer.decoders import TransformerDecoder
from nemo.collections.nlp.modules.trainables.specific.transformer.encoders import TransformerEncoder
from nemo.collections.nlp.modules.trainables.specific.transformer.generators import (
BeamSearchSequenceGenerator,
GreedySequenceGenerator,
)
from nemo.collections.nlp.modules.trainables.specific.transformer.modules import TransformerEmbedding
from nemo.collections.nlp.modules.trainables.specific.transformer.utils import transformer_weights_init
from nemo.core.neural_types import *

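This hunk fixes the transformer building blocks at their new home under nemo.collections.nlp.modules.trainables.specific.transformer. As a standalone import sketch of that layout (constructor signatures are not shown in the diff, so none are assumed here):

from nemo.collections.nlp.modules.trainables.specific.transformer.decoders import TransformerDecoder
from nemo.collections.nlp.modules.trainables.specific.transformer.encoders import TransformerEncoder
from nemo.collections.nlp.modules.trainables.specific.transformer.generators import (
    BeamSearchSequenceGenerator,
    GreedySequenceGenerator,
)
from nemo.collections.nlp.modules.trainables.specific.transformer.modules import TransformerEmbedding
from nemo.collections.nlp.modules.trainables.specific.transformer.utils import transformer_weights_init
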
4 changes: 2 additions & 2 deletions tests/nlp/test_squad.py
@@ -92,7 +92,7 @@ def test_squad_v1(self):
neural_factory = nemo.core.NeuralModuleFactory(
backend=nemo.core.Backend.PyTorch, local_rank=None, create_tb_writer=False,
)
model = nemo_nlp.huggingface.BERT(pretrained_model_name=pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=pretrained_bert_model)
hidden_size = model.local_parameters["hidden_size"]
qa_head = nemo_nlp.TokenClassifier(hidden_size=hidden_size, num_classes=2, num_layers=1, log_softmax=False,)
squad_loss = nemo_nlp.QuestionAnsweringLoss()
@@ -199,7 +199,7 @@ def test_squad_v2(self):
neural_factory = nemo.core.NeuralModuleFactory(
backend=nemo.core.Backend.PyTorch, local_rank=None, create_tb_writer=False,
)
model = nemo_nlp.huggingface.BERT(pretrained_model_name=pretrained_bert_model)
model = nemo_nlp.BERT(pretrained_model_name=pretrained_bert_model)
hidden_size = model.local_parameters["hidden_size"]
qa_head = nemo_nlp.TokenClassifier(hidden_size=hidden_size, num_classes=2, num_layers=1, log_softmax=False,)
squad_loss = nemo_nlp.QuestionAnsweringLoss()
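
For context, the test around these two one-line renames wires up a full SQuAD pipeline. A condensed sketch grounded in the context lines above, with 'bert-base-uncased' standing in for the test's pretrained_bert_model variable:

import nemo
import nemo.collections.nlp as nemo_nlp

neural_factory = nemo.core.NeuralModuleFactory(
    backend=nemo.core.Backend.PyTorch, local_rank=None, create_tb_writer=False,
)
model = nemo_nlp.BERT(pretrained_model_name='bert-base-uncased')  # placeholder name
hidden_size = model.local_parameters["hidden_size"]
qa_head = nemo_nlp.TokenClassifier(hidden_size=hidden_size, num_classes=2, num_layers=1, log_softmax=False)
squad_loss = nemo_nlp.QuestionAnsweringLoss()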