RxNorm linker not working and I think its possibly due to an incorrect path to ai2 AWS instance #249

Closed
diegoolano opened this issue Jul 11, 2020 · 5 comments

Comments

@diegoolano

A colleague and I are both running into an issue when trying to use the new RxNorm linker.
I think it may be caused by an incorrect URL being set for the RxNorm(KnowledgeBase) file_path in scispacy/linking_utils.py:
it is currently set to the same path as HumanPhenotypeOntology.

I tried to work out the correct file_path URL for RxNorm to see whether that would fix the issue, but to no avail. Any thoughts? Could you provide the URL if that is in fact the problem? Thank you.

#246

Reproduction notes:

Install

virtualenv --python=/usr/bin/python3 scispacy_newer
source scispacy_newer/bin/activate

pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_lg-0.2.5.tar.gz

In notebook

import spacy
import scispacy
from scispacy.abbreviation import AbbreviationDetector
from scispacy.linking import EntityLinker

nlp = spacy.load("en_core_sci_lg")
rxlinker = EntityLinker(resolve_abbreviations=True, name="rxnorm")
nlp.add_pipe(rxlinker)

text = "The Aspirin was not helpful so I took Advil to help with my headache."
doc = nlp(text)

Error raised:
/Users/diegoolano/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/candidate_generation.py:283: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]

/Users/diegoolano/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/candidate_generation.py:284: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]


KeyError Traceback (most recent call last)
in
1 text = "The Aspirin was not helpful so I took Advil to help with my headache."
----> 2 doc = nlp(text)

~/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/spacy/language.py in __call__(self, text, disable, component_cfg)
447 if not hasattr(proc, "__call__"):
448 raise ValueError(Errors.E003.format(component=type(proc), name=name))
--> 449 doc = proc(doc, **component_cfg.get(name, {}))
450 if doc is None:
451 raise ValueError(Errors.E005.format(name=name))

~/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/linking.py in __call__(self, doc)
102
103 mention_strings = [x.text for x in mentions]
--> 104 batch_candidates = self.candidate_generator(mention_strings, self.k)
105
106 for mention, candidates in zip(doc.ents, batch_candidates):

~/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/candidate_generation.py in __call__(self, mention_texts, k)
342 for neighbor_index, distance in zip(neighbors, distances):
343 mention = self.ann_concept_aliases_list[neighbor_index]
--> 344 concepts_for_mention = self.kb.alias_to_cuis[mention]
345 for concept_id in concepts_for_mention:
346 concept_to_mentions[concept_id].append(mention)
KeyError: 'Aspirin'
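
The failing lookup at lines 343-344 of the traceback can be reproduced in isolation. The sketch below is not scispaCy code: both containers are small stand-ins, and it assumes (this is a guess at the cause, not something the traceback proves) that the alias list and the alias-to-CUI dict were built from different sources.

```python
from typing import Dict, List, Set

# Stand-in for the alias list stored alongside the nearest-neighbour index
# (assumed here to contain drug names).
ann_concept_aliases_list: List[str] = ["Aspirin", "Advil"]

# Stand-in for the knowledge base's alias -> CUI mapping, assumed to have
# been loaded from a different source than the alias list.
alias_to_cuis: Dict[str, Set[str]] = {
    "Abdominal cramps": {"C0000729"},
    "Bloating": {"C0000731"},
}


def lookup(neighbor_index: int) -> Set[str]:
    """Mirror the two steps in the traceback: index -> alias -> CUIs."""
    mention = ann_concept_aliases_list[neighbor_index]
    return alias_to_cuis[mention]  # raises KeyError if the alias is missing


try:
    lookup(0)
    error = None
except KeyError as exc:
    error = str(exc)

print(error)
```

If the two containers agree, every alias returned by the index is a key of the dict and the lookup cannot fail this way.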

Thoughts:
I looked into the error a bit: the self.kb.alias_to_cuis dict has 32111 entries, but they don't seem to refer to drugs.

Beginning of dict:
"""
{'Abdominal cramps': {'C0000729'}, 'Abdominal bloating': {'C0000731'}, 'Bloating': {'C0000731'}, 'Abdominal swelling': {'C0000731'}, 'Abdominal distension': {'C0000731'}, 'Abdominal distention': {'C0000731'}
"""

This dict is populated by the RxNorm class, which extends the KnowledgeBase class in "scispacy/linking_utils.py",
but there appears to be a typo in the file path it uses.
As of now it points to the same file as HumanPhenotypeOntology:

"""
class GeneOntology(KnowledgeBase):
    def __init__(
        self,
        file_path: str = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/umls_2020_gene_ontology.jsonl",
    ):
        super().__init__(file_path)

class HumanPhenotypeOntology(KnowledgeBase):
    def __init__(
        self,
        file_path: str = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/umls_2020_human_phenotype_ontology.jsonl",  # noqa
    ):
        super().__init__(file_path)

class RxNorm(KnowledgeBase):
    def __init__(
        self,
        file_path: str = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/umls_2020_human_phenotype_ontology.jsonl",  # noqa
    ):
        super().__init__(file_path)
"""


@DeNeutoy
Contributor

@diegoolano Derp, sorry, my mistake. This is just a copy-and-paste error. I will fix this and release a fix to PyPI on Monday.

@sahas-

sahas- commented Aug 28, 2020

@DeNeutoy Just curious, is this issue resolved? Below is my code and I'm getting a similar error.

!pip install scispacy
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_sm-0.2.4.tar.gz

import spacy
import scispacy
from scispacy.linking import EntityLinker
import en_core_sci_sm

nlp = en_core_sci_sm.load()
linker = EntityLinker(resolve_abbreviations=False, name="rxnorm")
nlp.add_pipe(linker)

text = nlp("Aspirin")
print(doc.ents)


KeyError Traceback (most recent call last)

in ()
----> 1 text = nlp("Aspirin")
2 print(doc)
3 print(doc.ents)

2 frames

/usr/local/lib/python3.6/dist-packages/scispacy/candidate_generation.py in __call__(self, mention_texts, k)
342 for neighbor_index, distance in zip(neighbors, distances):
343 mention = self.ann_concept_aliases_list[neighbor_index]
--> 344 concepts_for_mention = self.kb.alias_to_cuis[mention]
345 for concept_id in concepts_for_mention:
346 concept_to_mentions[concept_id].append(mention)

KeyError: 'Aspirin'

@giuliacassara

giuliacassara commented Aug 29, 2020

I have the same issue:
KeyError: 'Isopto Alkaline'

It seems that the error is fixed in the new release, since the file path is now correct:
file_path: str = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/umls_2020_rxnorm.jsonl", but I still get the KeyError.

EDIT: The pip version of scispacy does not have the fix. I worked around it by editing the code directly in the installed library.

@DeNeutoy
Contributor

Hi @giuliacassara and @sahas-, apologies, I never got around to releasing this. You can use the latest version by installing from master. There are a couple of other fixes I want to include in the next release, so we don't have a particular deadline for it at the moment, sorry.

@DeNeutoy
Contributor

Fixed by #280
