Unable to obtain the results written in the paper #6
Hi! The results in the paper are test set results (as it says in the caption), and several datasets have non-trivial differences between the dev and test data, so it's possible that you've already reproduced our results exactly. In any case, though, I'd urge you to use the newer jiant codebase. It's much better documented, and gets strictly better results than the baselines here. We don't have public dev set numbers from that codebase yet, but if you post an issue there, we should be able to assemble some. https://github.com/jsalt18-sentence-repl/jiant If you do need to use this codebase, reply here and @W4ngatang should be able to share the exact hyperparameters we used.
Thanks for your enthusiastic reply! I've submitted it to the GLUE platform, but there are still some gaps in CoLA, QNLI and WNLI.

(results table: average | CoLA | SST-2 | MRPC | STS-B | QQP | MNLI | QNLI | RTE | WNLI)

So it would be very kind of you to offer me the hyperparameters that produce the baselines on this codebase. Besides, I'm willing to move to jiant, but I'm not sure whether I can reproduce the GLUE baselines with it. Can I obtain the results on the leaderboard using the jiant codebase? Thanks again!
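For reference, a minimal sketch of producing a leaderboard submission, assuming the standard GLUE format of per-task TSV files with `index` and `prediction` columns, zipped for upload. The helper name and the toy labels here are illustrative, not part of this repo:

```python
import csv
import os
import zipfile

def write_glue_submission(task_name, predictions, out_dir="submission"):
    """Write one task's predictions as a GLUE-style TSV (index, prediction)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{task_name}.tsv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["index", "prediction"])
        for i, pred in enumerate(predictions):
            writer.writerow([i, pred])
    return path

# Toy example: CoLA predictions are 0/1 acceptability labels.
tsv_path = write_glue_submission("CoLA", [0, 1, 1, 0])

# The leaderboard expects a zip containing all per-task TSVs.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write(tsv_path, arcname=os.path.basename(tsv_path))
```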
@W4ngatang - Could you take this one? If you need to exactly match our baselines, jiant won't do that. This paper publishes numbers from the final_glue_runs script, though: https://openreview.net/pdf?id=Bkl87h09FX

Sam
Hey, after fixing lots of issues, I tried running the code. However, I still get the following error:

Traceback (most recent call last):

I find that 'MultiTaskTrainer' is not defined in the repository. I am sincerely asking for the 'MultiTaskTrainer' script. My great gratitude! @thxyutong @sleepinyourhat
@Bogerchen hey bro, have you fixed the problem?
Running into the same MultiTaskTrainer issue. Did anyone find a fix? Also, @sleepinyourhat, concerning jiant: I tried using it but found no options for running non-Transformers architectures (I want to rerun the LSTM baseline as described in the GLUE paper). Maybe I missed something? Would appreciate you pointing me the right way to do it :)
The reference to jiant above was to v1.3: https://github.com/nyu-mll/jiant-v1-legacy The new v2.0 is mostly a wrapper around Transformers, so it drops LSTM support. Start with v1.3.
You'll have a much easier time with jiant than with this repo, but if you need an exact reproduction for some reason, ping @W4ngatang again.
Can someone share the MultiTaskTrainer script? I really need this script to exactly reproduce the original GLUE benchmark. Thanks.
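In the meantime, here is a minimal sketch of the general technique such a trainer implements (uniform task sampling over a shared encoder with per-task heads), written in plain PyTorch. The class and attribute names are illustrative, not the repo's actual API:

```python
import random
import torch

class MultiTaskTrainer:
    """Minimal sketch: sample a task each step, then train a shared encoder
    and that task's classification head on one of its batches."""

    def __init__(self, encoder, task_heads, task_loaders, lr=1e-3):
        self.encoder = encoder            # shared sentence encoder (e.g. a BiLSTM)
        self.task_heads = task_heads      # dict: task name -> head nn.Module
        self.task_loaders = task_loaders  # dict: task name -> DataLoader
        self.task_iters = {t: iter(dl) for t, dl in task_loaders.items()}
        params = list(encoder.parameters())
        for head in task_heads.values():
            params += list(head.parameters())
        self.optimizer = torch.optim.Adam(params, lr=lr)
        self.loss_fn = torch.nn.CrossEntropyLoss()

    def _next_batch(self, task):
        # Cycle each task's loader independently, since datasets differ in size.
        try:
            return next(self.task_iters[task])
        except StopIteration:
            self.task_iters[task] = iter(self.task_loaders[task])
            return next(self.task_iters[task])

    def train(self, num_steps):
        for _ in range(num_steps):
            task = random.choice(list(self.task_heads))  # uniform task sampling
            inputs, labels = self._next_batch(task)
            logits = self.task_heads[task](self.encoder(inputs))
            loss = self.loss_fn(logits, labels)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
```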
I've tried your released baseline code, but there were some differences between the results I tested on the validation set and those reported in your paper.
| experiment | CoLA (mcc) | SST-2 | MRPC (acc/F1) | QQP (acc/F1) | STS-B (pear/spear) | MNLI (m/mm) | QNLI | RTE | WNLI |
|---|---|---|---|---|---|---|---|---|---|
| your paper | 24.0 | 85.8 | 71.9/82.1 | 80.2/59.1 | 68.8/67.0 | 65.8/66.0 | 71.1 | 46.8 | 63.7 |
| my result | 12.5 | 87.0 | 74.0/82.9 | 79.4/73.5 | 72.6/72.6 | 59.9/60.5 | 58.4 | 57.4 | 14.1 |
Both employ the basic BiLSTM model and follow the MTL setting. You can see there is a huge gap between the results for CoLA (24.0 vs. 12.5) and WNLI (63.7 vs. 14.1).
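For concreteness, the per-task metrics in the table above (mcc, acc/F1, Pearson/Spearman) can be computed with standard sklearn/scipy calls; the gold/predicted arrays below are placeholders, just to make the calls concrete:

```python
from sklearn.metrics import matthews_corrcoef, f1_score, accuracy_score
from scipy.stats import pearsonr, spearmanr

# Placeholder labels and predictions for illustration only.
cola_gold, cola_pred = [1, 0, 1, 1], [1, 0, 0, 1]
stsb_gold, stsb_pred = [4.5, 2.0, 3.2, 0.5], [4.1, 2.6, 3.0, 1.0]

# CoLA is scored with Matthews correlation (mcc).
print("mcc:", matthews_corrcoef(cola_gold, cola_pred))

# MRPC and QQP report accuracy and F1.
print("acc/F1:", accuracy_score(cola_gold, cola_pred),
      f1_score(cola_gold, cola_pred))

# STS-B reports Pearson and Spearman correlations of regression scores.
print("pear/spear:", pearsonr(stsb_gold, stsb_pred)[0],
      spearmanr(stsb_gold, stsb_pred)[0])
```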
Here are my hyperparameter settings; could you please help me check whether they match yours? This is based on `run_staff.sh`.
BTW, this is how I test on the validation set; the code is based on `eval_test.py` and `main.py` (see the sketch below).
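Roughly, the evaluation loop looks like the following sketch; the function and loader names are illustrative, not the exact `eval_test.py`/`main.py` code:

```python
import torch

def evaluate(model, dev_loader, metric_fn):
    """Run the model over the validation split and score it with a task
    metric (e.g. matthews_corrcoef for CoLA, accuracy for QNLI)."""
    model.eval()
    golds, preds = [], []
    with torch.no_grad():
        for inputs, labels in dev_loader:
            logits = model(inputs)
            preds.extend(logits.argmax(dim=-1).tolist())
            golds.extend(labels.tolist())
    return metric_fn(golds, preds)
```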
Thanks!