This repository has been archived by the owner on Jul 7, 2023. It is now read-only.
I see that only the approx_bleu_score is sent to TensorBoard. How can I evaluate the real BLEU?
What's the difference between approx_bleu_score and real BLEU?
> What's the difference between approx_bleu_score and real BLEU?
The main difference is that approx_bleu is computed on the internal subword units rather than on words, so it is not reproducible (not comparable with other models) and not suitable for reporting in publications.
Another problem is that the autoregressive evaluation conditions on the gold previous tokens (teacher forcing), which inflates the score.
See #407, #522 and #436 for more details.
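To make the subword-versus-word distinction concrete, here is a small self-contained sketch. The BLEU implementation, sentences, and subword segmentation below are invented for illustration; this is not tensor2tensor's actual approx_bleu code. The point is that the same candidate/reference pair can score zero at the word level while still earning partial credit on shared subword pieces:

```python
# Sketch: why BLEU on subword units differs from BLEU on words.
# The bleu() helper, sentences, and segmentation are illustrative only.
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty.
    Returns 0.0 if any n-gram precision is zero."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Counter intersection gives the clipped (min-count) overlap.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        if overlap == 0:
            return 0.0
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    bp = (1.0 if len(candidate) > len(reference)
          else math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Word-level tokens (what real BLEU, e.g. t2t-bleu, scores):
cand_words = "the researcher hypothesized incorrectly".split()
ref_words = "the researcher theorized incorrectly".split()

# A hypothetical subword segmentation of the same sentences
# (the level at which approx_bleu operates):
cand_sub = ["the", "research", "er_", "hypo", "thes", "ized_",
            "in", "correct", "ly_"]
ref_sub = ["the", "research", "er_", "theo", "r", "ized_",
           "in", "correct", "ly_"]

# Word level: no matching 3-gram exists, so BLEU collapses to 0.
# Subword level: shared pieces like "ized_" keep the score well above 0.
print(f"word-level BLEU:    {bleu(cand_words, ref_words):.3f}")
print(f"subword-level BLEU: {bleu(cand_sub, ref_sub):.3f}")
```

Since the subword inventory depends on the training-time vocabulary, two models with different vocabularies segment the same sentence differently, which is exactly why approx_bleu values are not comparable across models.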
But how can we use t2t-bleu on an already existing model/data directory (the one that has all the train files and one dev file), so that it uses the dev file for evaluation?
What script can we run on the command line to get the approx_bleu?
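One possible two-step flow, sketched below: decode the dev source file with the trained model, then score the output against the dev reference with t2t-bleu. All paths, the problem name, and the hparams set are placeholders for this example; check `t2t-decoder --help` and `t2t-bleu --help` for the exact flags in your installed version.

```shell
# Placeholder paths/names; adjust to your setup.
DATA_DIR=~/t2t_data
TRAIN_DIR=~/t2t_train

# 1) Translate the dev source file with the trained model.
t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --output_dir=$TRAIN_DIR \
  --decode_from_file=dev.src \
  --decode_to_file=dev.translated

# 2) Score the decoded output against the dev reference with real
#    (word-level) BLEU.
t2t-bleu --translation=dev.translated --reference=dev.ref
```

This decode-then-score flow avoids both problems mentioned above: t2t-decoder generates autoregressively from its own previous outputs (no gold tokens), and t2t-bleu scores on words rather than subwords.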