This repository has been archived by the owner on Jul 7, 2023. It is now read-only.
I see that only the approx_bleu_score is sent to TensorBoard. How can I evaluate the real BLEU?
What's the difference between approx_bleu_score and real BLEU?
> What's the difference between approx_bleu_score and real BLEU?
The main difference is that approx_bleu is computed on the internal subword units rather than on words, so it is not reproducible (not comparable with other models) and not suitable for reporting in publications.
Another problem is that the autoregressive evaluation conditions on the gold previous tokens (teacher forcing), which inflates the score.
See #407, #522 and #436 for more details.
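To make the subword-versus-word distinction concrete, here is a small self-contained sketch. The BLEU implementation, sentences, and subword segmentation below are invented for illustration; this is not tensor2tensor's actual approx_bleu code. The point is that the same candidate/reference pair can score zero at the word level while still earning partial credit on shared subword pieces:

```python
# Sketch: why BLEU on subword units differs from BLEU on words.
# The bleu() helper, sentences, and segmentation are illustrative only.
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty.
    Returns 0.0 if any n-gram precision is zero."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Counter intersection gives the clipped (min-count) overlap.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        if overlap == 0:
            return 0.0
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    bp = (1.0 if len(candidate) > len(reference)
          else math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Word-level tokens (what real BLEU, e.g. t2t-bleu, scores):
cand_words = "the researcher hypothesized incorrectly".split()
ref_words = "the researcher theorized incorrectly".split()

# A hypothetical subword segmentation of the same sentences
# (the level at which approx_bleu operates):
cand_sub = ["the", "research", "er_", "hypo", "thes", "ized_",
            "in", "correct", "ly_"]
ref_sub = ["the", "research", "er_", "theo", "r", "ized_",
           "in", "correct", "ly_"]

# Word level: no matching 3-gram exists, so BLEU collapses to 0.
# Subword level: shared pieces like "ized_" keep the score well above 0.
print(f"word-level BLEU:    {bleu(cand_words, ref_words):.3f}")
print(f"subword-level BLEU: {bleu(cand_sub, ref_sub):.3f}")
```

Since the subword inventory depends on the training-time vocabulary, two models with different vocabularies segment the same sentence differently, which is exactly why approx_bleu values are not comparable across models.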
But how can we use t2t-bleu on an already existing model/data directory (the one that has all the train files and one dev file), so that it uses the dev file for evaluation?
What script can we run on the command line to get the approx_bleu?
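One possible two-step flow, sketched below: decode the dev source file with the trained model, then score the output against the dev reference with t2t-bleu. All paths, the problem name, and the hparams set are placeholders for this example; check `t2t-decoder --help` and `t2t-bleu --help` for the exact flags in your installed version.

```shell
# Placeholder paths/names; adjust to your setup.
DATA_DIR=~/t2t_data
TRAIN_DIR=~/t2t_train

# 1) Translate the dev source file with the trained model.
t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --output_dir=$TRAIN_DIR \
  --decode_from_file=dev.src \
  --decode_to_file=dev.translated

# 2) Score the decoded output against the dev reference with real
#    (word-level) BLEU.
t2t-bleu --translation=dev.translated --reference=dev.ref
```

This decode-then-score flow avoids both problems mentioned above: t2t-decoder generates autoregressively from its own previous outputs (no gold tokens), and t2t-bleu scores on words rather than subwords.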