SGD DST (#612)
* sgd baseline added

Signed-off-by: Evelina Bakhturina <[email protected]>

* sgd baseline added

Signed-off-by: Evelina Bakhturina <[email protected]>

* intent labels switch to ind, masking moved to model

Signed-off-by: Evelina Bakhturina <[email protected]>

* headers added, clean up, input_ex moved

Signed-off-by: Evelina Bakhturina <[email protected]>

* pin_memory added

Signed-off-by: Evelina Bakhturina <[email protected]>

* pin_memory

Signed-off-by: Evelina Bakhturina <[email protected]>

* multiwoz2.1 support and sgd data augmentation

Signed-off-by: Yang Zhang <[email protected]>

* changed loss to use sum instead of average, added docstrings and comments, clean up

Signed-off-by: Evelina Bakhturina <[email protected]>

* data aug renamed + header

Signed-off-by: Evelina Bakhturina <[email protected]>

* loss normalization for the batch_size added

Signed-off-by: Evelina Bakhturina <[email protected]>

* more docstring and loss reduction

Signed-off-by: Evelina Bakhturina <[email protected]>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>

* jenkins test for sgd

Signed-off-by: Evelina Bakhturina <[email protected]>

* rebase

Signed-off-by: Evelina Bakhturina <[email protected]>

* jenkins path

Signed-off-by: Evelina Bakhturina <[email protected]>

* jenkins path

Signed-off-by: Evelina Bakhturina <[email protected]>

* Fixed loss function to guard non-active losses.

Signed-off-by: Vahid Noroozi <[email protected]>

* Fixed crossentropy loss module to handle empty logits and labels.

Signed-off-by: Vahid Noroozi <[email protected]>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <[email protected]>

* removed textdatalayer dep

Signed-off-by: Evelina Bakhturina <[email protected]>

* fix num2string for augmentation

Signed-off-by: Yang Zhang <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>

* adding inflect to nlp requirements

Signed-off-by: Yang Zhang <[email protected]>

* added attention head transform

Signed-off-by: Yang Zhang <[email protected]>

* masks moved to the dataset

Signed-off-by: Evelina Bakhturina <[email protected]>

* merge

Signed-off-by: Evelina Bakhturina <[email protected]>

* headers updated

Signed-off-by: Evelina Bakhturina <[email protected]>

* remove unused import

Signed-off-by: Evelina Bakhturina <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
4 people authored May 22, 2020
1 parent 4951396 commit 09412e2
Showing 33 changed files with 6,133 additions and 6 deletions.
13 changes: 12 additions & 1 deletion Jenkinsfile
@@ -193,11 +193,22 @@ pipeline {
 }
 stage ('Punctuation and Classification Training/Inference Test') {
 steps {
-sh 'cd examples/nlp/token_classification && CUDA_VISIBLE_DEVICES=1 python punctuation_capitalization.py --data_dir /home/TestData/nlp/token_classification_punctuation/ --work_dir punctuation_output --save_epoch_freq 1 --num_epochs 1 --save_step_freq -1 --batch_size 2'
+sh 'cd examples/nlp/token_classification && CUDA_VISIBLE_DEVICES=1 python punctuation_capitalization.py \
+--data_dir /home/TestData/nlp/token_classification_punctuation/ --work_dir punctuation_output --save_epoch_freq 1 \
+--num_epochs 1 --save_step_freq -1 --batch_size 2'
 sh 'cd examples/nlp/token_classification && DATE_F=$(ls punctuation_output/) && DATA_DIR="/home/TestData/nlp/token_classification_punctuation" && CUDA_VISIBLE_DEVICES=1 python punctuation_capitalization_infer.py --checkpoint_dir punctuation_output/$DATE_F/checkpoints/ --punct_labels_dict $DATA_DIR/punct_label_ids.csv --capit_labels_dict $DATA_DIR/capit_label_ids.csv'
 sh 'rm -rf examples/nlp/token_classification/punctuation_output'
 }
 }
+stage('SGD Test') {
+steps {
+sh 'cd examples/nlp/dialogue_state_tracking && CUDA_VISIBLE_DEVICES=0 python dialogue_state_tracking_sgd.py \
+--data_dir /home/TestData/nlp/sgd/ --schema_embedding_dir /home/TestData/nlp/sgd/embeddings/ --eval_dataset dev \
+--dialogues_example_dir /home/TestData/nlp/sgd/dialogue_example_dir/ --work_dir sgd_output --task DEBUG \
+--num_epochs 1 --save_epoch_freq=0'
+sh 'rm -rf examples/nlp/dialogue_state_tracking/sgd_output'
+}
+}
 }
 }

16 changes: 16 additions & 0 deletions examples/nlp/dialogue_state_tracking/data/multiwoz/__init__.py
@@ -0,0 +1,16 @@
# =============================================================================
# Copyright 2020 NVIDIA. All Rights Reserved.
# Copyright 2019 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
@@ -0,0 +1,18 @@
alpha-milton alpha milton
any dontcare
bed and breakfast guesthouse
boating boat
cam cambridge
concert concerthall
concert hall concerthall
guest house guesthouse
guesthouses guesthouse
moderate|cheap cheap|moderate
museum kettles yard museum
mutiple sports multiple sports
nightclub night club
acorn guesthouse acorn guest house
swimmingpool swimming pool
sports multiple sports
pool swimming pool
theater theatre
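
The pairs in this hunk appear to map variant slot values to a canonical spelling. The sketch below shows one way such a table could be loaded and applied. It assumes each line is `variant<TAB>canonical`; the separator is not recoverable from this rendering, and the names `SAMPLE_PAIRS`, `load_pairs`, and `normalize_value` are hypothetical, not taken from the commit.

```python
from typing import Dict, Iterable

# Hypothetical sample in the assumed "variant<TAB>canonical" layout.
# The tab separator is an assumption; the rendered diff does not show it.
SAMPLE_PAIRS = [
    "guest house\tguesthouse",
    "concert hall\tconcerthall",
    "theater\ttheatre",
    "swimmingpool\tswimming pool",
]


def load_pairs(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'variant<TAB>canonical' lines into a lookup table."""
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        variant, canonical = line.split("\t", 1)
        mapping[variant.strip()] = canonical.strip()
    return mapping


def normalize_value(value: str, mapping: Dict[str, str]) -> str:
    """Return the canonical form of a slot value, or the lowercased value if unmapped."""
    key = value.strip().lower()
    return mapping.get(key, key)


mapping = load_pairs(SAMPLE_PAIRS)
print(normalize_value("Guest House", mapping))
print(normalize_value("hotel", mapping))
```

An exact-match lookup keeps the sketch simple; the repository's own preprocessing may instead replace substrings inside longer utterances, which this example does not attempt.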