How to reproduce the result on the poly-encoder model for dstc7 #2306
The readme.md for "Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring" only shows how to reproduce the results on ConvAI2. I changed `-t convai2` to `-t dstc7` and fine-tuned on this task, but the results are around 5-6% lower than those reported in the paper. Are there any other hyperparameters I need to change to reproduce the results on DSTC7?

Comments
Hi! For the poly-encoder, please try the following hyperparameters for DSTC7:
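A minimal sketch of the shape such a training command takes, assuming the pretrained Reddit poly-encoder from the model zoo as the init model; the specific values below are illustrative placeholders, not the tuned settings:

```bash
# Sketch of a poly-encoder fine-tuning run on DSTC7.
# Values are illustrative placeholders, not tuned hyperparameters.
python -m parlai.scripts.train_model \
  -t dstc7 \
  --model transformer/polyencoder \
  --init-model zoo:pretrained_transformers/poly_model_huge_reddit/model \
  --polyencoder-type codes --poly-n-codes 64 \
  --codes-attention-type basic --poly-attention-type basic \
  --polyencoder-attention-keys context \
  --batchsize 32 --eval-batchsize 10 \
  --model-file /tmp/polyencoder_dstc7
```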
Thanks a lot! What about the bi-encoder and cross-encoder? Are their hyperparameters the same as these, except for `--model` and `--init-model`, and without `--polyencoder-type`, `--codes-attention-type`, `--poly-n-codes`, `--poly-attention-type`, and `--polyencoder-attention-keys`?
I tried with this hyperparameter setting (adding `-pyt dstc7 --eval-batchsize 10 --model-file `). The result is still not good compared with the numbers reported in the paper. Is there anything else I can do to rescue it?
Would you mind providing your full train log, perhaps in a GitHub gist?
Here is the link to two training logs: https://github.com/JiaQiSJTU/poly-encoder
So we actually used an augmented training set for our models; once #2314 is merged, you can specify the augmented version of the task.
Thanks a lot! I still have some problems:
A better way of phrasing this is that the data was not "augmented" but rather presented to an agent on an episodic basis, i.e. utterance by utterance, similar to how the ConvAI2 dataset is presented; we "augmented" the DSTC7 data to include intermediate utterance predictions. For example, suppose we have a dialogue between two speakers with utterances [A, B, C, D], and we are attempting to model speaker 2. In the original dataset, the conversation would be presented to a model as a single example:

- context: [A, B, C], label: D

In the augmented version, we present the data as two examples:

- context: [A], label: B
- context: [A, B, C], label: D
The results in the paper use this representation. All of our reported results use this dataset, so it is still clear that the pre-training helps.
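To make the episodic presentation concrete, here is a minimal Python sketch (not ParlAI's actual teacher code) that expands a dialogue into one (context, label) example per utterance of the modeled speaker:

```python
def episodic_examples(utterances, model_speaker=2):
    """Yield one (context, label) pair per utterance of the modeled speaker.

    Assumes the two speakers strictly alternate, with speaker 1 talking
    first, so speaker 2's turns sit at 0-based indices 1, 3, 5, ...
    """
    first_turn = model_speaker - 1
    for i in range(first_turn, len(utterances), 2):
        if i == 0:
            continue  # an opening utterance has no context to condition on
        yield utterances[:i], utterances[i]

# Dialogue [A, B, C, D], modeling speaker 2:
#   original presentation  -> only ([A, B, C], D)
#   augmented presentation -> ([A], B) and ([A, B, C], D)
for context, label in episodic_examples(["A", "B", "C", "D"]):
    print(context, "->", label)
```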
Got it!
I'll close this for now; please re-open if you run into further issues.
Hi @klshuster, would you mind if I ask a question?
@klshuster Thanks for your kind response!
We offer the Google BERT model as an init model for the bi-encoder and cross-encoder - see https://github.com/facebookresearch/ParlAI/tree/master/parlai/agents/bert_ranker. We have no plans to release a poly-encoder-based agent in this paradigm. However, if you are using the
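For reference, a minimal sketch of fine-tuning the BERT-based bi-encoder from that directory on DSTC7; the agent name comes from parlai/agents/bert_ranker, while the batch size and model-file path are illustrative placeholders:

```bash
# Sketch: fine-tune the BERT-based bi-encoder ranker on DSTC7.
# Batch size and model-file path are illustrative placeholders.
python -m parlai.scripts.train_model \
  -t dstc7 \
  --model bert_ranker/bi_encoder_ranker \
  --model-file /tmp/bert_biencoder_dstc7 \
  --batchsize 16
```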