Flag to output the score of each sentence at inference + various scoring tools #2196
The goal is to be able to rerank the n-best hypotheses by combining the score of each translation with other scores.
In the context of Noisy Channel Reranking we combine:
the score of each translation
the backward score with an inverse model
a language model score of each translation
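As a rough illustration of how these three scores could be combined, here is a minimal sketch. It assumes every score is a log-probability and that the combination is a weighted sum; the `Hypothesis` type, the `rerank` function and the weights `w_backward` and `w_lm` are hypothetical names for this example, not part of this PR.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    forward: float   # score of the translation (assumed log P(target | source))
    backward: float  # score from the inverse model (assumed log P(source | target))
    lm: float        # language model score (assumed log P(target))


def rerank(nbest: List[Hypothesis],
           w_backward: float = 1.0,
           w_lm: float = 1.0) -> List[Hypothesis]:
    """Sort the n-best list by a weighted sum of the three scores, best first."""
    def combined(h: Hypothesis) -> float:
        return h.forward + w_backward * h.backward + w_lm * h.lm

    return sorted(nbest, key=combined, reverse=True)


nbest = [
    Hypothesis("a", forward=-1.0, backward=-5.0, lm=-4.0),
    Hypothesis("b", forward=-1.5, backward=-2.0, lm=-2.0),
]
reranked = rerank(nbest)
```

With both weights set to 0 the order falls back to the forward score alone, which is a quick way to check that the extra scores actually change the ranking. In practice the weights would be tuned on a held-out set.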
New tools:
mbr_bleu: score each n_best hypothesis with BLEU against the other hypotheses and pick the best one (no reference needed).
oracle_bleu: score each n_best hypothesis with BLEU against a reference and pick the best one.
oracle_comet: same as oracle_bleu, scored with Unbabel/COMET.
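The difference between the MBR-style and oracle selections can be sketched as follows. This is only an illustration under stated assumptions: `unigram_f1` is a simple stand-in for sentence-level BLEU or COMET, and `mbr_select` and `oracle_select` are hypothetical names, not the tools added by this PR.

```python
from collections import Counter
from typing import Callable, List


def unigram_f1(hyp: str, ref: str) -> float:
    """Stand-in similarity (NOT BLEU): F1 over whitespace-token overlap."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(nbest: List[str],
               similarity: Callable[[str, str], float] = unigram_f1) -> str:
    """Pick the hypothesis with the highest average similarity to the others."""
    def expected_utility(i: int) -> float:
        others = [nbest[j] for j in range(len(nbest)) if j != i]
        if not others:
            return 0.0
        return sum(similarity(nbest[i], o) for o in others) / len(others)

    return nbest[max(range(len(nbest)), key=expected_utility)]


def oracle_select(nbest: List[str], reference: str,
                  similarity: Callable[[str, str], float] = unigram_f1) -> str:
    """Pick the hypothesis that scores best against a known reference."""
    return max(nbest, key=lambda hyp: similarity(hyp, reference))


nbest = ["the cat sat", "the cat sat down", "the cat", "a dog ran"]
mbr_choice = mbr_select(nbest)
oracle_choice = oracle_select(nbest, "a dog ran away")
```

The MBR choice only needs the n-best list itself, so it can be used at inference time, while the oracle choice needs a reference and is mainly useful as an upper bound on what reranking could achieve.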