QA Pipeline: Key Error due to predicting a token outside of allowed context #5711

Closed · tholor opened this issue Jul 13, 2020 · 2 comments

tholor commented Jul 13, 2020

🐛 Bug

Information

Model: distilbert
Language: English
The problem arises when using: QA inference via pipeline

The pipeline throws an exception when the model predicts a token that is not part of the document (e.g. a final special token). In the example below, the model predicts token 13 to be the end of the answer span. The context, however, ends at token 12; token 13 is the final [SEP] token. Therefore, we get a KeyError when trying to access feature.token_to_orig_map[13] here:

# Convert the answer (tokens) back to the original text
answers += [
    {
        "score": score.item(),
        "start": np.where(char_to_word == feature.token_to_orig_map[s])[0][0].item(),
        "end": np.where(char_to_word == feature.token_to_orig_map[e])[0][-1].item(),
        "answer": " ".join(
            example.doc_tokens[feature.token_to_orig_map[s] : feature.token_to_orig_map[e] + 1]
        ),
    }
    for s, e, score in zip(starts, ends, scores)
]
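
For illustration, here is a minimal, self-contained sketch of why the lookup fails. The map below is hypothetical but consistent with the tokenization in this example; token_to_orig_map only covers wordpieces belonging to the context, so special tokens such as [SEP] have no entry:

# Sequence: [CLS] test finding [SEP] My name is Carla and I live in Berlin [SEP]
# Wordpiece positions 4..12 map to the nine original context words 0..8.
token_to_orig_map = {4: 0, 5: 1, 6: 2, 7: 3, 8: 4, 9: 5, 10: 6, 11: 7, 12: 8}

e = 13  # the model predicts the final [SEP] as the answer end
token_to_orig_map[e]  # raises KeyError: 13 -- special tokens have no mapping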

To reproduce

from transformers import pipeline

nlp = pipeline("question-answering",
               model="distilbert-base-uncased-distilled-squad",
               tokenizer="distilbert-base-uncased",
               device=-1)

nlp(question="test finding", context="My name is Carla and I live in Berlin")

results in

Traceback (most recent call last):
  File "/home/mp/deepset/dev/haystack/debug.py", line 16, in <module>
    nlp(question="test finding", context="My name is Carla and I live in Berlin")
  File "/home/mp/miniconda3/envs/py37/lib/python3.7/site-packages/transformers/pipelines.py", line 1316, in __call__
    for s, e, score in zip(starts, ends, scores)
  File "/home/mp/miniconda3/envs/py37/lib/python3.7/site-packages/transformers/pipelines.py", line 1316, in <listcomp>
    for s, e, score in zip(starts, ends, scores)
KeyError: 13

Expected behavior

Predictions pointing to tokens that are not part of the "context" (here: the final [SEP] token) should be filtered out of the set of possible answers.
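
One way such filtering could look, as a minimal sketch (the variable names and the -10000.0 masking value are assumptions for illustration, not the actual fix that landed on master):

import numpy as np

# Hypothetical logits over 14 wordpiece positions (0..13); position 13 is the final [SEP]
start_logits = np.random.randn(14)
end_logits = np.random.randn(14)
token_to_orig_map = {i: i - 4 for i in range(4, 13)}  # context tokens only

# Mask out every position that cannot be mapped back to the original context
valid = np.array([i in token_to_orig_map for i in range(len(start_logits))])
start_logits = np.where(valid, start_logits, -10000.0)
end_logits = np.where(valid, end_logits, -10000.0)
# argmax over the masked logits can no longer land on [CLS], the question, or [SEP]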

Environment info

  • transformers version: 3.0.2
  • Platform: Ubuntu 18.04
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.5.1, CPU
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No
mfuntowicz commented Jul 17, 2020

Hi @tholor,

Thanks for reporting the issue.

We did have an issue where predictions were going out of bounds in the QA pipeline; it has been fixed on master:

>>> nlp = pipeline("question-answering",
...                model="distilbert-base-uncased-distilled-squad",
...                tokenizer="distilbert-base-uncased",
...                device=-1)

>>> nlp(question="test finding", context="My name is Carla and I live in Berlin")
{'score': 0.41493675112724304, 'start': 11, 'end': 16, 'answer': 'Carla'}

If you are able to check out the master branch, I would be happy to hear back from you to confirm it's working as expected on your side as well.

Let us know 😃
Morgan

tholor commented Jul 18, 2020

Hi @mfuntowicz,
Works like a charm now. Thanks for the fix!
