
Cannot reproduce the result of wmt_enfr32k #528

Closed
apeterswu opened this issue Jan 19, 2018 · 6 comments

@apeterswu

apeterswu commented Jan 19, 2018

Hi,

I have tried several times to run the transformer_big and transformer_enfr_big settings for the wmt_enfr32k problem, but I only get a BLEU score of 32.x, far below the 41.x reported in the paper.
I am very confused by this result and can't figure out what is wrong; the wmt_enfr32k result seems very hard to reproduce. I am using t2t v1.2.9.
For evaluation, I feed in the raw en-fr.en test file and score the output against the raw en-fr.fr reference with multi-bleu.perl, roughly following the pipeline sketched below. Is there anything wrong with that?
Could anyone give a detailed description of how to reproduce the wmt_enfr32k result: the hparams setting, the batch size, and so on? I am struggling to reproduce the enfr result.
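For reference, here is roughly the pipeline I am running. This is a minimal sketch, not my exact commands: the file names and directories are placeholders, the flags follow t2t ~v1.2.x, the problem name may differ across versions, and multi-bleu.perl is the Moses script.

```bash
# Decode the raw (untokenized) test file with the trained model.
t2t-decoder \
  --data_dir=$DATA_DIR \
  --problems=translate_enfr_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --output_dir=$TRAIN_DIR \
  --decode_from_file=newstest2014.en \
  --decode_to_file=newstest2014.fr.decoded

# Score the raw decoded output directly against the raw reference.
perl multi-bleu.perl newstest2014.fr < newstest2014.fr.decoded
```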
@lukaszkaiser Could you please help a little?

Appreciate a lot.

@prajdabre

Hi,

If you are using the latest version of the code, there is a chance that your low scores are due to a bug.

I also ran into a problem with the new version that I did not have with the old one.

Look here: #525

@xyc1207

xyc1207 commented Jan 19, 2018

@lukaszkaiser I ran into the same problem as @apeterswu.

BTW, I want to confirm how the hyperparameters should be set (e.g., hparams_set, learning rate, optimization algorithm, batch size, dropout); a concrete example of what I mean is sketched below. Also, do we need to filter out some of the training data? In the paper you mention using 36M sentence pairs, but WMT14 En-Fr seems to contain about 40M bilingual pairs.
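To make the question concrete, this is the kind of invocation I mean. The override values below are placeholders chosen only to show which knobs I am asking about, not settings confirmed by the paper:

```bash
# Which values here reproduce the paper's 41.x BLEU?
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=translate_enfr_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --hparams='batch_size=4096,learning_rate=0.1,layer_prepostprocess_dropout=0.1' \
  --train_steps=300000 \
  --output_dir=$TRAIN_DIR
```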

Thanks a lot in advance.

@apeterswu
Author

apeterswu commented Jan 19, 2018

@prajdabre Sorry, I forgot to mention the version. I am using v1.2.9, which trains fine on other datasets, but wmt_enfr32k is hard to reproduce.

@tobyyouup

@lukaszkaiser @rsepassi I also ran into the problem described by @apeterswu: wmt_enfr32k is hard to reproduce. I previously reproduced wmt_ende32k successfully, where the tips were to feed raw (untokenized) test data when decoding and to tokenize the output when calculating BLEU. Even following those tips here, I still get a BLEU far below the 41.x reported in the paper.
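Concretely, the scoring step from those tips looks like the sketch below, assuming the Moses tokenizer.perl and multi-bleu.perl scripts and placeholder file names:

```bash
# Tokenize the raw reference and the raw decoded output the same way,
# then score the tokenized pair.
perl tokenizer.perl -l fr < newstest2014.fr         > ref.tok
perl tokenizer.perl -l fr < newstest2014.fr.decoded > hyp.tok
perl multi-bleu.perl ref.tok < hyp.tok
```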

Could you help and have a check?

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

I'll try a run too and see how it does.

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Let's move the discussion to #518

@rsepassi rsepassi closed this as completed Feb 9, 2018