Convert QACandidates with empty or whitespace answers to no_answers on doc level #756
Some no_answer QACandidates have non-zero start/end indices, which causes the errors reported in #729.
Here are two examples. The first example contains multiple consecutive whitespace characters, "mass of _ _ will", because LaTeX commands were removed from the original text. The second example contains a whitespace character at the very end, "geographer. _".
This PR fixes the issue by converting every predicted answer that is an empty string or contains nothing but whitespace (including tabs) to a no_answer. In addition, for all no_answers the start/end indices are set to zero and the aggregation level is set to "document".
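The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Candidate` dataclass, its field names, the `"no_answer"` string constant, and the `normalize_candidate` helper are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass

# Assumption: no_answers are marked by this literal string in the sketch.
NO_ANSWER = "no_answer"


@dataclass
class Candidate:
    # Hypothetical stand-in for a QA candidate; not the library's QACandidate class.
    answer: str
    start: int
    end: int
    aggregation_level: str = "passage"


def normalize_candidate(candidate: Candidate) -> Candidate:
    """Turn blank answers into no_answers and reset every no_answer's position."""
    # str.strip() without arguments removes spaces, tabs, and newlines,
    # so an empty result means the answer was empty or whitespace-only.
    if candidate.answer.strip() == "":
        candidate.answer = NO_ANSWER

    # All no_answers get zero indices and document-level aggregation.
    if candidate.answer == NO_ANSWER:
        candidate.start = 0
        candidate.end = 0
        candidate.aggregation_level = "document"
    return candidate
```

With this sketch, a candidate such as `Candidate(" \t ", 12, 15)` comes back as a no_answer at indices 0/0 on document level, while a candidate with real text is returned unchanged.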
Limitations
The fix prevents question answering models from predicting answers that consist only of whitespace or tabs.
Answers consisting of an empty string were already prevented before this change.
fixes #729