
Convert QACandidates with empty or whitespace answers to no_answers on doc level #756

Merged
merged 2 commits on May 10, 2021


@julian-risch (Member) commented May 5, 2021

Some no_answer QACandidates have non-zero start/end indices, which causes the errors reported in #729.

Here are two examples, where _ marks a whitespace character. The first example contains multiple consecutive whitespaces ("mass of _ _ will") because LaTeX commands were removed from the original text. The second example contains a whitespace at the very end ("geographer. _").

    # Example 1: removed LaTeX left consecutive whitespaces ("designated as  and", "mass of  will")
    QA_input = [
        {
            "questions": ["What has a magnitude of about 8.81 meters per second squared?"],
            "text": """What we now call gravity was not identified as a universal force until the work of Isaac Newton. Before Newton, the tendency for objects to fall towards the Earth was not understood to be related to the motions of celestial objects. Galileo was instrumental in describing the characteristics of falling objects by determining that the acceleration of every object in free-fall was constant and independent of the mass of the object. Today, this acceleration due to gravity towards the surface of the Earth is usually designated as  and has a magnitude of about 9.81 meters per second squared (this measurement is taken from sea level and may vary depending on location), and points toward the center of the Earth. This observation means that the force of gravity on an object at the Earth's surface is directly proportional to the object's mass. Thus an object that has a mass of  will experience a force:"""
        }]
    # Example 2 (a separate input): the context ends with a trailing whitespace ("geographer. ")
    QA_input = [
        {
            "questions": [" When was Isiah Bowman not appointed to President Wilson\'s Inquiry?"],
            "text": """One key figure in the plans for what would come to be known as American Empire, was a geographer named Isiah Bowman. Bowman was the director of the American Geographical Society in 1914. Three years later in 1917, he was appointed to then President Woodrow Wilson's inquiry in 1917. The inquiry was the idea of President Wilson and the American delegation from the Paris Peace Conference. The point of this inquiry was to build a premise that would allow for U.S authorship of a 'new world' which was to be characterized by geographical order. As a result of his role in the inquiry, Isiah Bowman would come to be known as Wilson's geographer. """
        }]
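To make the two failure cases concrete, the sketch below lists the character offsets in a context that contain nothing but whitespace beyond the single spaces separating words: runs of two or more whitespace characters (like the gap left by the removed LaTeX) and whitespace at the very end of the text. This is not FARM code; `blank_stretches` is a hypothetical helper, and the context strings are shortened excerpts of the two examples above.

```python
import re
from typing import List, Tuple

# Runs of 2+ whitespace characters, or whitespace at the very end of the text.
BLANK_PATTERN = re.compile(r"\s{2,}|\s+$")


def blank_stretches(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of whitespace-only stretches.

    Hypothetical helper for illustration: a predicted answer span that lies
    entirely inside one of these stretches is a string of whitespace only.
    """
    return [match.span() for match in BLANK_PATTERN.finditer(text)]


# Shortened excerpts of the two example contexts above.
gravity = "is usually designated as  and has a magnitude"
bowman = "would come to be known as Wilson's geographer. "

print(blank_stretches(gravity))                           # [(24, 26)]
print([bowman[s:e] for s, e in blank_stretches(bowman)])  # [' ']
```

Slicing either stretch out of its context gives a string for which `.strip()` returns `""`, which is exactly the kind of answer the fix below turns into a no_answer.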

This PR fixes the issue by converting every predicted answer that is an empty string or contains nothing but whitespace (including tabs) into a no_answer. In addition, for all no_answers the start/end indices are set to zero and the aggregation level is set to "document".
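The conversion rule can be sketched as follows. This is a simplified illustration, not the code merged into `farm/modeling/predictions.py`: `Candidate` is a hypothetical stand-in for FARM's QACandidate, its field names and the default aggregation level are assumptions, and representing a no_answer by an empty answer string is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical, simplified stand-in for FARM's QACandidate (field names assumed).
    answer: str
    answer_type: str
    offset_answer_start: int
    offset_answer_end: int
    aggregation_level: str = "passage"  # assumed default for span predictions


def to_no_answer_if_blank(candidate: Candidate) -> Candidate:
    """Turn empty or whitespace-only answers into a document-level no_answer."""
    # str.strip() without arguments removes spaces, tabs, newlines and other
    # whitespace, so an empty result means the answer has no visible characters.
    if candidate.answer_type == "no_answer" or not candidate.answer.strip():
        candidate.answer = ""  # assumption: a no_answer carries an empty answer string
        candidate.answer_type = "no_answer"
        # Every no_answer gets zero offsets and document-level aggregation.
        candidate.offset_answer_start = 0
        candidate.offset_answer_end = 0
        candidate.aggregation_level = "document"
    return candidate


blank = Candidate(answer=" \t ", answer_type="span",
                  offset_answer_start=512, offset_answer_end=515)
blank = to_no_answer_if_blank(blank)
print(blank.answer_type, blank.offset_answer_start,
      blank.offset_answer_end, blank.aggregation_level)
# no_answer 0 0 document
```

Checking `answer.strip()` rather than comparing against a single space catches any mix of spaces, tabs and newlines, and it also covers the empty string, so one condition handles both cases described above.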

Limitations
The fix prevents the question answering models from predicting answers that consist only of whitespace characters such as spaces or tabs.
Answers consisting of an empty string were already prevented before this change.

fixes #729

@julian-risch changed the title from "WIP: Convert QACandidates with empty or whitespace answers to no_answers on doc level" to "Convert QACandidates with empty or whitespace answers to no_answers on doc level" May 5, 2021
@julian-risch marked this pull request as ready for review May 5, 2021 13:07

@Timoeller (Contributor) left a comment


Interesting insights about the whitespaces.
Your fix for that seems good.

I made a comment about your use of the aggregation level that I don't understand...

farm/modeling/predictions.py (outdated, resolved)
@Timoeller self-requested a review May 10, 2021 14:47

@Timoeller (Contributor) left a comment


LG!

@julian-risch merged commit b9fcd26 into master May 10, 2021
Linked issue closed by this pull request: QACandidates with wrong indices for a no_answer prediction