
Cannot reproduce the result of wmt_enfr32k #528

Closed
apeterswu opened this issue Jan 19, 2018 · 6 comments

@apeterswu

apeterswu commented Jan 19, 2018

Hi,

I have tried several times to run the transformer_big and transformer_enfr_big settings for the wmt_enfr32k problem, but I only get a BLEU score of 32.x, far below the 41.x reported in the paper.
I am very confused by this result and can't figure out what is wrong; the wmt_enfr32k result seems very hard to reproduce. I am using t2t v1.2.9.
For evaluation, I feed in the raw en-fr.en test file and score the output against the raw en-fr.fr reference with multi-bleu.perl, roughly following the pipeline sketched below. Is there anything wrong with that?
Could anyone give a detailed description of how to reproduce the wmt_enfr32k result: the hparams setting, the batch size, and so on? I am struggling to reproduce the enfr result.
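For reference, here is roughly the pipeline I am running. This is a minimal sketch, not my exact commands: the file names and directories are placeholders, the flags follow t2t ~v1.2.x, the problem name may differ across versions, and multi-bleu.perl is the Moses script.

```bash
# Decode the raw (untokenized) test file with the trained model.
t2t-decoder \
  --data_dir=$DATA_DIR \
  --problems=translate_enfr_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --output_dir=$TRAIN_DIR \
  --decode_from_file=newstest2014.en \
  --decode_to_file=newstest2014.fr.decoded

# Score the raw decoded output directly against the raw reference.
perl multi-bleu.perl newstest2014.fr < newstest2014.fr.decoded
```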
@lukaszkaiser Could you please help a little?

Appreciate a lot.

@prajdabre

Hi,

If you are using the latest version of the code, there is a chance that your low scores are due to a bug.

I also ran into a problem with the new version that I did not have with the old one.

Look here: #525

@xyc1207

xyc1207 commented Jan 19, 2018

@lukaszkaiser I ran into the same problem as @apeterswu.

BTW, I want to confirm how the hyperparameters should be set (e.g., hparams_set, learning rate, optimization algorithm, batch size, dropout); a concrete example of what I mean is sketched below. Also, do we need to filter out some of the training data? In the paper you mention using 36M sentence pairs, but WMT14 En-Fr seems to contain about 40M bilingual pairs.
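To make the question concrete, this is the kind of invocation I mean. The override values below are placeholders chosen only to show which knobs I am asking about, not settings confirmed by the paper:

```bash
# Which values here reproduce the paper's 41.x BLEU?
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=translate_enfr_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --hparams='batch_size=4096,learning_rate=0.1,layer_prepostprocess_dropout=0.1' \
  --train_steps=300000 \
  --output_dir=$TRAIN_DIR
```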

Thanks a lot in advance.

@apeterswu
Author

apeterswu commented Jan 19, 2018

@prajdabre Sorry, I forgot to mention the version. I am using v1.2.9, which trains fine on other datasets, but wmt_enfr32k is hard to reproduce.

@tobyyouup

@lukaszkaiser @rsepassi I also ran into the problem described by @apeterswu: wmt_enfr32k is hard to reproduce. I previously reproduced wmt_ende32k successfully, where the tips were to feed raw (untokenized) test data when decoding and to tokenize the output when calculating BLEU. Even following those tips here, I still get a BLEU far below the 41.x reported in the paper.
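Concretely, the scoring step from those tips looks like the sketch below, assuming the Moses tokenizer.perl and multi-bleu.perl scripts and placeholder file names:

```bash
# Tokenize the raw reference and the raw decoded output the same way,
# then score the tokenized pair.
perl tokenizer.perl -l fr < newstest2014.fr         > ref.tok
perl tokenizer.perl -l fr < newstest2014.fr.decoded > hyp.tok
perl multi-bleu.perl ref.tok < hyp.tok
```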

Could you help and have a check?

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

I'll try a run too and see how it does.

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Let's move the discussion to #518

@rsepassi rsepassi closed this as completed Feb 9, 2018