
Are dev/test sets used for training? #187

Open
yulonglin opened this issue Mar 30, 2023 · 0 comments


yulonglin commented Mar 30, 2023

Several datasets are used for training: NUCLE, Lang-8, FCE, W&I and LOCNESS. Do you use only their training sets, or also their development and test sets?

[Screenshot 2023-03-30 at 20.17.03]

Notably, you evaluate on the BEA-2019 dev set, which consists of W&I and LOCNESS data, so I would assume you train only on the training splits of the datasets above?

My confusion comes from your reported dataset sizes and how they differ from those in the follow-up work: https://arxiv.org/pdf/2203.13064.pdf

It seems that you used the full FCE dataset for GECToR, but only the FCE training set for the ensembling paper.

[Screenshot 2023-03-30 at 20.24.34]
