

gpt2-medium LAMA #17

Open
casually-PYlearner opened this issue May 11, 2021 · 2 comments



casually-PYlearner commented May 11, 2021

Hi, I have just used the default params to p-tune gpt2-medium on the LAMA task, and the results are as follows:
best dev_hit@1: 51.8, best test_hit@1: 44.5
I have a couple of questions about these results:
(1) There seems to be a gap between the dev and test results. Are the dev set and the test set drawn from the same distribution? Could you provide the scripts used to generate the train/dev/test splits, along with the original dataset?
(2) The result reported in the paper is 46.5, which is close to the best test_hit@1. Are the numbers in the paper based on the test set?
It would be very helpful if shell scripts were provided to reproduce the results in the paper.

@zhaochen0110

Hi, I also used the default params to p-tune on the LAMA task and ran into the same questions, in my case with bert-base-uncased.
My best dev_hit@1: 75.1, best test_hit@1: 85.2
However, the result reported in the paper is 52.3. Did you run into the same issue? Has your question been resolved?

@lancorrect


Hi,

The problem you mentioned may be caused by running the code on a single sub-dataset such as P1001. I suspect the author ran the code on the whole dataset and averaged all the results. Maybe you can give that a try and verify whether I'm right.
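To illustrate the point above, here is a minimal sketch of why a score on a single relation like P1001 can sit far from an average over all relations. All relation names besides P1001 and every count below are hypothetical, chosen only to show that micro- and macro-averaged hit@1 can both differ from any single sub-dataset's score; this is not the repo's actual evaluation code.

```python
# Hypothetical per-relation hit@1 counts (correct predictions / examples).
# The numbers are made up for illustration; only the averaging logic matters.
per_relation = {
    "P1001": {"correct": 426, "total": 500},   # 85.2% on this one relation
    "P19":   {"correct": 820, "total": 2000},  # 41.0%
    "P106":  {"correct": 301, "total": 1000},  # 30.1%
}

# Micro-average: pool every example across relations, then divide.
total_correct = sum(r["correct"] for r in per_relation.values())
total_count = sum(r["total"] for r in per_relation.values())
micro_hit_at_1 = 100.0 * total_correct / total_count

# Macro-average: mean of the per-relation accuracies, each relation
# weighted equally regardless of its size.
macro_hit_at_1 = sum(
    100.0 * r["correct"] / r["total"] for r in per_relation.values()
) / len(per_relation)

print(f"P1001 alone:  {100.0 * 426 / 500:.1f}")  # 85.2
print(f"micro hit@1:  {micro_hit_at_1:.1f}")     # 44.2
print(f"macro hit@1:  {macro_hit_at_1:.1f}")     # 52.1
```

So a run scoring 85.2 on one sub-dataset is perfectly consistent with a much lower averaged figure, which is one plausible explanation for the gap against the paper's reported number.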
