Hi, I have just used the default params to p-tune gpt2-medium on the LAMA task, and the results are as follows.
best dev_hit@1: 51.8 best test_hit@1: 44.5
For the results I got, I have some questions:
(1) There seems to be a gap between the dev results and the test results. Are the dev set and the test set from the same distribution? Would it be possible to provide the scripts for generating the train/dev/test splits, along with the original dataset?
(2) The result reported in the paper is 46.5, which is close to the best test_hit@1. Are the results in the paper based on the test set?
It would be very helpful if shell scripts were provided to reproduce the results in the paper.
Hi, I also used the default params to p-tune on the LAMA task and ran into the same questions when using bert-base-uncased.
My best dev_hit@1: 75.1, best test_hit@1: 85.2.
However, the result reported in the paper is 52.3. Did you run into the same issue? Has your question been resolved?
Hi,
The problem you mentioned may be caused by running the code on a single sub-dataset such as P1001. I suspect the author ran the code on the whole dataset and averaged the results across all relations. Maybe you can try that and verify whether I'm right; a rough sketch of what I mean by averaging is below.
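Just to illustrate, here is a minimal sketch (not from the P-tuning repo itself) of how per-relation results could be averaged into a single hit@1 number. It assumes each per-relation run writes a JSON file such as `out/P1001.json` containing `test_hit@1` and `n_test` fields; that file layout is purely hypothetical, so adapt it to however your runs actually save their metrics.

```python
# Hypothetical aggregation of per-relation LAMA results.
# Assumes files like out/P1001.json with {"test_hit@1": ..., "n_test": ...}.
import glob
import json

results = [json.load(open(path)) for path in glob.glob("out/P*.json")]

# Macro average: simple mean of per-relation hit@1 scores.
macro = sum(r["test_hit@1"] for r in results) / len(results)

# Micro average: weight each relation by its number of test examples.
total = sum(r["n_test"] for r in results)
micro = sum(r["test_hit@1"] * r["n_test"] for r in results) / total

print(f"macro test_hit@1: {macro:.1f}  micro test_hit@1: {micro:.1f}")
```

A single relation like P1001 can score much higher or lower than the overall number in the paper, which would explain the gap you are both seeing.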