Reproducing the results with cross encoder on DSTC7/Ubuntu V2/Reddit #2974
I would first compare some of your hyperparams to those listed under the "Cross-encoder" section on that page. What I can tell immediately is that you'll want to specify …
Thanks for replying! I'm using the cross-encoder configuration from https://parl.ai/projects/polyencoder/, but the performance was not good (just 60.x%). Here are the training curves. My machine has 8 16GB GPUs, but I have to set the batch size to 2 if I use --candidates inline. Training has run for more than 300k steps. Here's my training script:
I have a few suggestions and also a few remarks:
Perhaps try with …
Thank you! Could you also let me know how to get the performance on the test split?
You should see test results at the end of training; however, you can always evaluate your model with …
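For reference, the hits@1 metric reported for DSTC7 can be computed from per-candidate scores as sketched below. This is a hypothetical standalone illustration of the metric, not ParlAI's implementation:

```python
def hits_at_k(scores, label_index, k=1):
    """Return 1.0 if the gold candidate is ranked in the top k, else 0.0.

    scores: one score per candidate response.
    label_index: position of the gold candidate in `scores`.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if label_index in ranked[:k] else 0.0

# Average over the evaluation set to get the reported hits@1.
examples = [
    ([0.9, 0.1, 0.3], 0),  # gold candidate scored highest -> hit
    ([0.2, 0.8, 0.5], 0),  # gold candidate ranked last -> miss
]
hits1 = sum(hits_at_k(s, y) for s, y in examples) / len(examples)
print(hits1)  # 0.5
```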
Thanks! It seems that 60%+ on the dev set is a reasonable performance, so I'll temporarily close this issue and continue hyper-parameter tuning. Thank you for your kind assistance!
In the cross-encoder section of the poly-encoder paper, it says: "We thus limit its batch size to 16 and provide negatives random samples from the training set. For DSTC7 and Ubuntu V2, we choose 15 such negatives; For ConvAI2, the dataset provides 19 negatives." It seems that the cross-encoder in the paper uses … Could you let me know …
Thanks!
This sentence is actually just saying that we use a batch size of 16 regardless of the dataset, but we still work with inline candidates. There should not be any performance gap between using batch negatives and inline negatives. Using …
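The distinction can be sketched as follows (a toy illustration, not ParlAI code): with batch negatives, each context in a batch treats every other response in the same batch as its negatives, whereas inline candidates are supplied explicitly per example.

```python
def batch_negatives(batch):
    """Expand (context, gold_response) pairs so each example's candidate
    list is its gold response plus every other response in the batch."""
    responses = [resp for _, resp in batch]
    expanded = []
    for i, (context, gold) in enumerate(batch):
        candidates = [gold] + [r for j, r in enumerate(responses) if j != i]
        # The gold response is always at index 0 of its candidate list.
        expanded.append({"text": context, "candidates": candidates, "label": 0})
    return expanded

batch = [("hi", "hello"), ("bye", "goodbye"), ("thanks", "welcome")]
expanded = batch_negatives(batch)
print(expanded[0]["candidates"])  # ['hello', 'goodbye', 'welcome']
```

With a batch size of 16, each example therefore sees 15 negatives, matching the 15 random negatives used for inline candidates.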
Sounds great, thank you!
Sorry for the frequent questions - I have successfully reproduced the experiments on DSTC7 and plan to move on to Ubuntu V2. However, the format of the Ubuntu V2 training data is a .csv file with headers … I wonder if the … It would also be very helpful if you could share your Ubuntu V2 training settings, if you can still find them. Many thanks!
The training settings should be similar, if not identical, to the ones listed above for DSTC7. For our experiments, we wrote an augmented teacher that aggregated all the labels in the training set and randomly sampled 15 negatives to put in the …
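A minimal sketch of that kind of preprocessing, assuming the Ubuntu V2 train split is a CSV of labelled (context, utterance) pairs. The column names ("Context", "Utterance", "Label") are assumptions for illustration; the real headers may differ, as may the details of the actual teacher:

```python
import csv
import io
import random

def build_examples(rows, n_negatives, seed=0):
    """Attach randomly sampled negative candidates to each positive pair.

    `rows` are dicts with hypothetical keys "Context", "Utterance", "Label";
    rows labelled "1" are the positive pairs.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r["Label"] == "1"]
    responses = [r["Utterance"] for r in positives]
    examples = []
    for row in positives:
        # Sample negatives from the other gold responses in the training set.
        pool = [u for u in responses if u != row["Utterance"]]
        negatives = rng.sample(pool, n_negatives)
        examples.append({
            "text": row["Context"],
            "labels": [row["Utterance"]],
            "label_candidates": [row["Utterance"]] + negatives,
        })
    return examples

# Demo with a tiny in-memory CSV (column names are hypothetical).
data = io.StringIO(
    "Context,Utterance,Label\n"
    "hi there,hello,1\n"
    "see you,goodbye,1\n"
    "thank you,welcome,1\n"
)
examples = build_examples(list(csv.DictReader(data)), n_negatives=2)
print(len(examples))  # 3
```

In the real setup one would sample 15 negatives per example, mirroring the paper's DSTC7/Ubuntu V2 configuration.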
Thank you!
Thanks for your help on DSTC7 and Ubuntu! Could you let me know if it's possible to train my own cross-encoder model on the "Reddit huge" data? If so, what's the best method and setting for doing that?
We don't distribute the Reddit data, but you can download it from pushshift.io and process it yourself following the instructions in https://arxiv.org/abs/1809.01984, then train as specified in the poly-encoder paper (and also in the linked paper).
This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
Closing for now; please reopen if there are further issues.
Hi, I'm trying to reproduce the performances of cross encoder reported in the paper "Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring."
I trained the model on 8 16GB GPUs with the following settings, taken from #2306 (without the poly-encoder settings).
The model was trained for 110k steps but did not converge; the accuracy (hits@1) was 1.5%. Is there any example script for training a cross-encoder on DSTC7? @klshuster