How to reproduce the result on the poly-encoder model for dstc7 #2306

JiaQiSJTU · 2019-12-26T11:00:45Z

In the readme.md about "Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring", it just shows how to reproduce the result on ConvAI2. I changed the "-t convai2" into "-t dstc7" and finetuned on this task. The results are around 5%-6% lower than the results showed in this paper. Are there any other hyper-parameters I need to change to reproduce the results on DSTC7 ?

klshuster · 2019-12-26T22:57:35Z

Hi,

For the poly-encoder, please try with the following hyperparameters for dstc7:

--init-model zoo:pretrained_transformers/poly_model_huge_reddit/model 
--batchsize 256  
--model transformer/polyencoder 
--warmup_updates 100 
--lr-scheduler-patience 0 
--lr-scheduler-decay 0.4 
-lr 5e-05 
--data-parallel True 
--history-size 20 
--label-truncate 72 
--text-truncate 360 
-vp 5 
-veps 0.5 
--validation-metric accuracy 
--validation-metric-mode max 
--save-after-valid True 
--log_every_n_secs 20 
--candidates batch 
--dict-tokenizer bpe  
--dict-lower True 
--optimizer adamax 
--output-scaling 0.06 
--variant xlm 
--reduction_type mean 
--share-encoders False 
--learn-positional-embeddings True 
--n-layers 12 
--n-heads 12 
--ffn-size 3072 
--attention-dropout 0.1 
--relu-dropout 0.0 
--dropout 0.1 
--n-positions 1024 
--embedding-size 768 
--activation gelu 
--embeddings-scale False 
--n-segments 2 
--learn-embeddings True 
--share-word-embeddings False 
--dict-endtoken __start__ 
--fp16 True 
--polyencoder-type codes 
--codes-attention-type basic 
--poly-n-codes 64 
--poly-attention-type basic 
--polyencoder-attention-keys context

JiaQiSJTU · 2019-12-27T04:04:51Z

Hi,

For the poly-encoder, please try with the following hyperparameters for dstc7:

--init-model zoo:pretrained_transformers/poly_model_huge_reddit/model 
--batchsize 256  
--model transformer/polyencoder 
--warmup_updates 100 
--lr-scheduler-patience 0 
--lr-scheduler-decay 0.4 
-lr 5e-05 
--data-parallel True 
--history-size 20 
--label-truncate 72 
--text-truncate 360 
-vp 5 
-veps 0.5 
--validation-metric accuracy 
--validation-metric-mode max 
--save-after-valid True 
--log_every_n_secs 20 
--candidates batch 
--dict-tokenizer bpe  
--dict-lower True 
--optimizer adamax 
--output-scaling 0.06 
--variant xlm 
--reduction_type mean 
--share-encoders False 
--learn-positional-embeddings True 
--n-layers 12 
--n-heads 12 
--ffn-size 3072 
--attention-dropout 0.1 
--relu-dropout 0.0 
--dropout 0.1 
--n-positions 1024 
--embedding-size 768 
--activation gelu 
--embeddings-scale False 
--n-segments 2 
--learn-embeddings True 
--share-word-embeddings False 
--dict-endtoken __start__ 
--fp16 True 
--polyencoder-type codes 
--codes-attention-type basic 
--poly-n-codes 64 
--poly-attention-type basic 
--polyencoder-attention-keys context

Thanks a lot! What about the bi-encoder and cross encoder? Are their hyper-parameters the same as this one except the "--model, --init model" and without "--polyencoder-type, --codes-attention-type, --poly-n-codes, --poly-attention-type, --polyencoder-attention-keys"?

JiaQiSJTU · 2019-12-27T08:51:42Z

Hi,

For the poly-encoder, please try with the following hyperparameters for dstc7:

--init-model zoo:pretrained_transformers/poly_model_huge_reddit/model 
--batchsize 256  
--model transformer/polyencoder 
--warmup_updates 100 
--lr-scheduler-patience 0 
--lr-scheduler-decay 0.4 
-lr 5e-05 
--data-parallel True 
--history-size 20 
--label-truncate 72 
--text-truncate 360 
-vp 5 
-veps 0.5 
--validation-metric accuracy 
--validation-metric-mode max 
--save-after-valid True 
--log_every_n_secs 20 
--candidates batch 
--dict-tokenizer bpe  
--dict-lower True 
--optimizer adamax 
--output-scaling 0.06 
--variant xlm 
--reduction_type mean 
--share-encoders False 
--learn-positional-embeddings True 
--n-layers 12 
--n-heads 12 
--ffn-size 3072 
--attention-dropout 0.1 
--relu-dropout 0.0 
--dropout 0.1 
--n-positions 1024 
--embedding-size 768 
--activation gelu 
--embeddings-scale False 
--n-segments 2 
--learn-embeddings True 
--share-word-embeddings False 
--dict-endtoken __start__ 
--fp16 True 
--polyencoder-type codes 
--codes-attention-type basic 
--poly-n-codes 64 
--poly-attention-type basic 
--polyencoder-attention-keys context

I tried with this hyper-parameter setting (added with "-pyt dstc7 ----eval-batchsize 10 --mode-file ")

The result is still not good:
'hits@1': 0.641, 'hits@10': 0.889, 'mrr': 0.727.

While in paper, the results are
hits@1 = 70.9--0.6 ; hits@10 = 91.5--0.5; mrr = 78.0--0.3

Emmm, are there anything else i can do to rescue it?

klshuster · 2019-12-27T19:42:09Z

Would you mind providing your full train log, perhaps in a github gist?

JiaQiSJTU · 2019-12-28T04:15:20Z

Would you mind providing your full train log, perhaps in a github gist?

Here is the link to two training logs: https://github.com/JiaQiSJTU/poly-encoder

klshuster · 2020-01-03T15:19:23Z

So we actually used an augmented training set for our models; once #2314 is merged, you can specify -t dstc7:DSTC7TeacherAugmentedSampled to train on this augmented set.

JiaQiSJTU · 2020-01-04T01:25:36Z

So we actually used an augmented training set for our models; once #2314 is merged, you can specify -t dstc7:DSTC7TeacherAugmentedSampled to train on this augmented set.

Thanks a lot! I still have some problems:

How did u do the data augmentation?
Did the results in the poly paper use the augmented data or just the original data?
It seems that "hits@1" and "mrr" are greatly improved by the data augmentation. I'm doubt that whether the pre-trained model works or the data augmentation works.

klshuster · 2020-01-06T19:44:07Z

A better way of phrasing this is that the data was not "augmented" but rather presented to an agent on an episodic basis. I.e., similar to how the ConvAI2 dataset is presented, i.e. utterance by utterance, we "augmented" the DSTC7 data to include intermediate utterance predictions.

For example, suppose we have a dialogue between two speakers with utterances [A, B, C, D], and we are attempting to model speaker 2. In the original dataset, the conversation would be presented to a model as:

text: A,B,C
label: D

In the augmented version, we present the data as:

episode 1:
    text: A
    label: B

episode 2:
    text: A, B, C
    label: D

The results in the paper use this representation. All of our reported results use this dataset, so it is still clear that the pre-training helps.

JiaQiSJTU · 2020-01-10T10:31:51Z

Get it!

klshuster · 2020-01-10T15:40:45Z

I'll close for now, please re-open if you find you have further issues

sjkoo1989 · 2020-03-15T08:06:59Z

Hi. @klshuster Would you mind me asking a question?

We found that the interactive.py needs a user to input persona information sentences (for CONVAI2). Are they necessary in reproducing the training process? If so, which token did you use for separating each persona question? Are persona information sentences considered as context input?
In the previous replies, you mentioned that contexts are augmented as
episode N:
text: A1, B1, A2, B2,...
label: BN
However, in reproducing your result, we found that finding an adequate turn-separation scheme is important when we use base bert weights from google-bert(& hugging-face). How did you manage to separate sentences in given context?

klshuster · 2020-03-17T14:24:31Z

for training on dstc7, persona information sentences are not necessary; for training on convai2, persona information sentences are presented in the input to the model, so there is no additional work to be done there.
We simply used a new-line delimiter to separate sentences in the given context.

sjkoo1989 · 2020-03-20T07:24:47Z

@klshuster Thanks for your kind response!
Would you mind me asking another question?
How can we set google-BERT for the initial model?
(It seems that not giving --init-model would work, but we are not sure about it.)

klshuster · 2020-03-20T13:28:33Z

We offer the google-BERT model as an init model for the bi-encoder and cross-encoder - see https://github.com/facebookresearch/ParlAI/tree/master/parlai/agents/bert_ranker. We have no plans to release a poly-encoder-based agent in this paradigm.

However, if you are using the transformer/polyencoder and specify --init-model zoo:pretrained_transformers/poly_model_huge_wikito/model, you will initialize the model with weights that were pre-trained on the same data as google-BERT (i.e. wikipedia and toronto books) and which obtain similar results as google-BERT on the tasks we considered in the paper.

klshuster self-assigned this Dec 26, 2019

klshuster mentioned this issue Jan 3, 2020

[Poly-encoder] Fixes for DSTC7 Task #2314

Merged

klshuster closed this as completed Jan 10, 2020

luohongyin mentioned this issue Aug 12, 2020

Reproducing the results with cross encoder on DSTC7/Ubuntu V2/Reddit #2974

Closed

chijames mentioned this issue Aug 18, 2020

result for dstc 7 sfzhou5678/PolyEncoder#4

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to reproduce the result on the poly-encoder model for dstc7 #2306

How to reproduce the result on the poly-encoder model for dstc7 #2306

JiaQiSJTU commented Dec 26, 2019

klshuster commented Dec 26, 2019

JiaQiSJTU commented Dec 27, 2019

JiaQiSJTU commented Dec 27, 2019

klshuster commented Dec 27, 2019

JiaQiSJTU commented Dec 28, 2019

klshuster commented Jan 3, 2020

JiaQiSJTU commented Jan 4, 2020

klshuster commented Jan 6, 2020

JiaQiSJTU commented Jan 10, 2020

klshuster commented Jan 10, 2020

sjkoo1989 commented Mar 15, 2020 •

edited

Loading

klshuster commented Mar 17, 2020

sjkoo1989 commented Mar 20, 2020

klshuster commented Mar 20, 2020

How to reproduce the result on the poly-encoder model for dstc7 #2306

How to reproduce the result on the poly-encoder model for dstc7 #2306

Comments

JiaQiSJTU commented Dec 26, 2019

klshuster commented Dec 26, 2019

JiaQiSJTU commented Dec 27, 2019

JiaQiSJTU commented Dec 27, 2019

klshuster commented Dec 27, 2019

JiaQiSJTU commented Dec 28, 2019

klshuster commented Jan 3, 2020

JiaQiSJTU commented Jan 4, 2020

klshuster commented Jan 6, 2020

JiaQiSJTU commented Jan 10, 2020

klshuster commented Jan 10, 2020

sjkoo1989 commented Mar 15, 2020 • edited Loading

klshuster commented Mar 17, 2020

sjkoo1989 commented Mar 20, 2020

klshuster commented Mar 20, 2020

sjkoo1989 commented Mar 15, 2020 •

edited

Loading