QA Pipeline: Key Error due to predicting a token outside of allowed context #5711

Closed · tholor opened this issue Jul 13, 2020 · 2 comments

tholor commented Jul 13, 2020

🐛 Bug

Information

Model: distilbert
Language: English
The problem arises when using: QA inference via pipeline

The pipeline throws an exception when the model predicts a token that is not part of the document (e.g. a final special token). In the example below, the model predicts token 13 to be the end of the answer span. The context, however, ends at token 12; token 13 is the final [SEP] token. Therefore, we get a KeyError when trying to access feature.token_to_orig_map[13] here:

# Convert the answer (tokens) back to the original text
answers += [
    {
        "score": score.item(),
        "start": np.where(char_to_word == feature.token_to_orig_map[s])[0][0].item(),
        "end": np.where(char_to_word == feature.token_to_orig_map[e])[0][-1].item(),
        "answer": " ".join(
            example.doc_tokens[feature.token_to_orig_map[s] : feature.token_to_orig_map[e] + 1]
        ),
    }
    for s, e, score in zip(starts, ends, scores)
]
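
For illustration, here is a minimal, self-contained sketch of why the lookup fails. The map below is hypothetical but consistent with the tokenization in this example; token_to_orig_map only covers wordpieces belonging to the context, so special tokens such as [SEP] have no entry:

# Sequence: [CLS] test finding [SEP] My name is Carla and I live in Berlin [SEP]
# Wordpiece positions 4..12 map to the nine original context words 0..8.
token_to_orig_map = {4: 0, 5: 1, 6: 2, 7: 3, 8: 4, 9: 5, 10: 6, 11: 7, 12: 8}

e = 13  # the model predicts the final [SEP] as the answer end
token_to_orig_map[e]  # raises KeyError: 13 -- special tokens have no mapping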

To reproduce

from transformers import pipeline

nlp = pipeline("question-answering",
               model="distilbert-base-uncased-distilled-squad",
               tokenizer="distilbert-base-uncased",
               device=-1)

nlp(question="test finding", context="My name is Carla and I live in Berlin")

results in

Traceback (most recent call last):
  File "/home/mp/deepset/dev/haystack/debug.py", line 16, in <module>
    nlp(question="test finding", context="My name is Carla and I live in Berlin")
  File "/home/mp/miniconda3/envs/py37/lib/python3.7/site-packages/transformers/pipelines.py", line 1316, in __call__
    for s, e, score in zip(starts, ends, scores)
  File "/home/mp/miniconda3/envs/py37/lib/python3.7/site-packages/transformers/pipelines.py", line 1316, in <listcomp>
    for s, e, score in zip(starts, ends, scores)
KeyError: 13

Expected behavior

Predictions pointing to tokens that are not part of the "context" (here: the final [SEP] token) should be filtered out of the set of possible answers.
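
One way such filtering could look, as a minimal sketch (the variable names and the -10000.0 masking value are assumptions for illustration, not the actual fix that landed on master):

import numpy as np

# Hypothetical logits over 14 wordpiece positions (0..13); position 13 is the final [SEP]
start_logits = np.random.randn(14)
end_logits = np.random.randn(14)
token_to_orig_map = {i: i - 4 for i in range(4, 13)}  # context tokens only

# Mask out every position that cannot be mapped back to the original context
valid = np.array([i in token_to_orig_map for i in range(len(start_logits))])
start_logits = np.where(valid, start_logits, -10000.0)
end_logits = np.where(valid, end_logits, -10000.0)
# argmax over the masked logits can no longer land on [CLS], the question, or [SEP]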

Environment info

  • transformers version: 3.0.2
  • Platform: Ubuntu 18.04
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.5.1, CPU
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No
mfuntowicz commented Jul 17, 2020

Hi @tholor,

Thanks for reporting the issue.

We did have an issue where predictions were going out of bounds in the QA pipeline; it has been fixed on master:

>>> nlp = pipeline("question-answering",
...                model="distilbert-base-uncased-distilled-squad",
...                tokenizer="distilbert-base-uncased",
...                device=-1)

>>> nlp(question="test finding", context="My name is Carla and I live in Berlin")
{'score': 0.41493675112724304, 'start': 11, 'end': 16, 'answer': 'Carla'}

If you are able to check out the master branch, I would be happy to hear back from you to confirm it's working as expected on your side as well.

Let us know 😃
Morgan

tholor commented Jul 18, 2020

Hi @mfuntowicz,
Works like a charm now. Thanks for the fix!
