
How to get real bleu score? [approx_bleu_score] #587

Closed
ndvbd opened this issue Feb 14, 2018 · 3 comments

ndvbd commented Feb 14, 2018

I see that only the approx_bleu_score is sent to TensorBoard. How can I evaluate the real BLEU?
What is the difference between approx_bleu_score and the real BLEU?


martinpopel commented Feb 14, 2018

> How can I evaluate the real BLEU?

Use t2t-bleu.
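
For example, a minimal sketch following the pattern in the tensor2tensor README (the file names are placeholders; both files are plain text with one sentence per line):

```sh
# Minimal sketch: score a decoded translation file against a reference file.
# Both file names are placeholders.
t2t-bleu --translation=translation.de --reference=reference.de
```

It should print both cased and uncased BLEU.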

> What is the difference between approx_bleu_score and the real BLEU?

The main difference is that approx_bleu is computed on the internal subwords instead of words, so it is not reproducible (not comparable with other models) and not suitable for reporting in publications.
Another problem is that the evaluation is not truly autoregressive: the decoder is fed the gold previous tokens (teacher forcing) instead of its own predictions, which is a kind of cheating.
See #407, #522 and #436 for more details.

stefan-it commented

@NadavB Are there still questions left? Otherwise I think we could close that issue :)

ndvbd closed this as completed Apr 25, 2018

ndvbd commented Oct 11, 2020

@martinpopel, thanks.

  1. How can we run t2t-bleu on an already existing model/data directory (the one that contains all the train files and one dev file), so that it uses the dev file for evaluation? (See the sketch below.)

  2. What script can we run from the command line to get the approx_bleu?
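
For question 1, one possible workflow is to decode the dev source file with t2t-decoder and then score the decoded output with t2t-bleu. The sketch below assumes a WMT English-to-German setup; the problem name, hparams set, file names, and paths are placeholders, and flag names may differ slightly between tensor2tensor versions.

```sh
# Decode the dev source file with the trained model
# (problem, hparams set, file names, and directories are placeholder assumptions).
t2t-decoder \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --decode_from_file=dev.source.en \
  --decode_to_file=dev.decoded.de

# Score the decoded output against the gold dev targets with real (word-level) BLEU.
t2t-bleu --translation=dev.decoded.de --reference=dev.reference.de
```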
