Multilingual IWSLT tst2017 test set is broken #15

Open
stephanpeitz opened this issue Jan 24, 2020 · 3 comments

Comments

@stephanpeitz

Hi,

I just realised that the preprocessed data provided under https://github.com/quanpn90/NMTGMinor/tree/master/recipes/multilingual-translation is not correct.

In particular, the tst2017 test sets contain ~3k-4k lines, while they should contain ~1.1k.
Furthermore, the references are not correct, e.g. the first ~2k lines of tst2017.en-de.bpe.de are in English rather than German.

My guess is that you accidentally mixed in tst2010.

You might want to fix that; otherwise, people could assume you computed your BLEU scores on these incorrect test sets.

Cheers,
Stephan
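
For anyone who wants to reproduce the line-count check described above, here is a minimal sketch. The helper names (`count_lines`, `check_size`, `check_parallel`) and the 10% tolerance are assumptions made for illustration; they are not part of NMTGMinor. The expected size of ~1.1k lines is the figure quoted in this report.

```python
from pathlib import Path


def count_lines(path):
    """Return the number of lines in a UTF-8 text file."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_size(n_lines, expected, tolerance=0.1):
    """Return True if n_lines is within a relative tolerance of expected."""
    return abs(n_lines - expected) <= tolerance * expected


def check_parallel(src_path, ref_path, expected):
    """Report line counts for a source/reference pair and flag anomalies.

    `expected` is the approximate number of lines the test set should have
    (hypothetical parameter; ~1100 for tst2017 according to the report).
    """
    n_src = count_lines(src_path)
    n_ref = count_lines(ref_path)
    return {
        "src_lines": n_src,
        "ref_lines": n_ref,
        "aligned": n_src == n_ref,  # parallel files must have equal length
        "size_ok": check_size(n_src, expected) and check_size(n_ref, expected),
    }
```

Calling `check_parallel` on a source file and its reference (for example `tst2017.en-de.bpe.de`) with `expected=1100` would flag a file of ~3k-4k lines as `size_ok: False`. It does not check the language of the references; that still has to be inspected by eye or with a language-identification tool.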

@quanpn90
Owner

quanpn90 commented Jan 24, 2020 via email

@stephanpeitz
Author

Hi Quan,

have you been able to fix it?

Cheers,
Stephan

@quanpn90
Owner

quanpn90 commented Feb 5, 2020

Hi Stephan, thank you for reminding me.

I think I made a terrible mistake with these test sets during preprocessing. Basically, the 2017 test set was duplicated (twice), so the BLEU scores in the paper are possibly not correct.

If you don't mind, I will run the translation again, since the models are still available, and post the correct results here.

Sorry for this mistake.
Quan
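
If the broken files really are exact concatenations of the same test set, the duplication mentioned above can be detected mechanically. The sketch below is only an illustration under that assumption; `repetition_factor` and `first_block` are hypothetical helpers, not part of the repository, and they will report a factor of 1 if a different test set (such as tst2010) was mixed in instead.

```python
def repetition_factor(lines):
    """Return the largest k such that `lines` is k exact copies of one block.

    Returns 1 when the list is not an exact repetition (or is empty).
    """
    n = len(lines)
    # Try the largest candidate factor first so the smallest block wins.
    for k in range(n, 1, -1):
        if n % k != 0:
            continue
        block = n // k
        if all(lines[i] == lines[i % block] for i in range(n)):
            return k
    return 1


def first_block(lines):
    """Return a single copy of the repeated block (or all lines if none)."""
    k = repetition_factor(lines)
    return lines[: len(lines) // k]
```

For example, `repetition_factor(["a", "b", "a", "b"])` returns 2 and `first_block` returns `["a", "b"]`. Line-level deduplication is deliberately avoided here, because a test set can legitimately contain the same sentence more than once.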
