RxNorm linker not working and I think its possibly due to an incorrect path to ai2 AWS instance #249
Comments
@diegoolano Derp, sorry, my mistake. This is just a copy-and-paste error. I will fix this and release a fix to PyPI on Monday.
@DeNeutoy Just curious, is this issue resolved? Below is my code and I'm getting a similar error.
KeyError Traceback (most recent call last)
in ()
2 frames
/usr/local/lib/python3.6/dist-packages/scispacy/candidate_generation.py in __call__(self, mention_texts, k)
KeyError: 'Aspirin'
I have the same issue. It seems the error is fixed in the new release and the file path there is correct. EDIT: The pip version of scispacy does not have the fix. I solved it by modifying the code directly in the installed library.
Hi @giuliacassara and @sahas-, apologies, I never got around to releasing this. You can use the latest version by installing from master. There are a couple of other fixes I want to include in the next release, so we don't have a particular deadline for that at the moment, sorry.
Fixed by #280
A colleague and I are both running into an issue while trying to use the new RxNorm linker. I think it may be caused by an incorrect URL set as the default file_path for RxNorm(KnowledgeBase) in scispacy/linking_utils.py: it is currently set to the same path as HumanPhenotypeOntology.
I tried to work out what the correct file_path URL for RxNorm might be, to see whether that would fix the issue, but to no avail. Any thoughts on the issue? Could you provide the URL if that is in fact the problem? Thank you.
#246
Reproduction notes:
Install
virtualenv --python=/usr/bin/python3 scispacy_newer
source scispacy_newer/bin/activate
pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_lg-0.2.5.tar.gz
In notebook
import spacy
import scispacy
from scispacy.abbreviation import AbbreviationDetector
from scispacy.linking import EntityLinker
nlp = spacy.load("en_core_sci_lg")
rxlinker = EntityLinker(resolve_abbreviations=True, name="rxnorm")
nlp.add_pipe(rxlinker)
text = "The Aspirin was not helpful so I took Advil to help with my headache."
doc = nlp(text)
Error raised:
/Users/diegoolano/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/candidate_generation.py:283: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
/Users/diegoolano/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/candidate_generation.py:284: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
KeyError Traceback (most recent call last)
in
1 text = "The Aspirin was not helpful so I took Advil to help with my headache."
----> 2 doc = nlp(text)
~/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/spacy/language.py in __call__(self, text, disable, component_cfg)
447 if not hasattr(proc, "__call__"):
448 raise ValueError(Errors.E003.format(component=type(proc), name=name))
--> 449 doc = proc(doc, **component_cfg.get(name, {}))
450 if doc is None:
451 raise ValueError(Errors.E005.format(name=name))
~/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/linking.py in __call__(self, doc)
102
103 mention_strings = [x.text for x in mentions]
--> 104 batch_candidates = self.candidate_generator(mention_strings, self.k)
105
106 for mention, candidates in zip(doc.ents, batch_candidates):
~/sandbox/newer_scispacy/scispacy_newer/lib/python3.7/site-packages/scispacy/candidate_generation.py in __call__(self, mention_texts, k)
342 for neighbor_index, distance in zip(neighbors, distances):
343 mention = self.ann_concept_aliases_list[neighbor_index]
--> 344 concepts_for_mention = self.kb.alias_to_cuis[mention]
345 for concept_id in concepts_for_mention:
346 concept_to_mentions[concept_id].append(mention)
KeyError: 'Aspirin'
Thoughts:
I looked into the error. The self.kb.alias_to_cuis dict has 32111 entries, but they don't seem to refer to drugs.
Beginning of dict:
"""
{'Abdominal cramps': {'C0000729'}, 'Abdominal bloating': {'C0000731'}, 'Bloating': {'C0000731'}, 'Abdominal swelling': {'C0000731'}, 'Abdominal distension': {'C0000731'}, 'Abdominal distention': {'C0000731'}
"""
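One quick way to confirm that the wrong knowledge base was loaded is to check whether a few expected drug names appear among the alias keys. This is a minimal sketch, assuming a plain dict shaped like the alias_to_cuis excerpt above; the helper name missing_aliases and the toy sample_kb are hypothetical and not part of scispacy.

```python
from typing import Dict, Iterable, List, Set


def missing_aliases(
    alias_to_cuis: Dict[str, Set[str]], expected: Iterable[str]
) -> List[str]:
    """Return the expected aliases that are absent from the dict (case-insensitive)."""
    known = {alias.lower() for alias in alias_to_cuis}
    return [name for name in expected if name.lower() not in known]


# Toy data copied from the beginning of the dict shown above (hypothetical stand-in
# for linker.kb.alias_to_cuis).
sample_kb = {
    "Abdominal cramps": {"C0000729"},
    "Abdominal bloating": {"C0000731"},
    "Bloating": {"C0000731"},
}

absent = missing_aliases(sample_kb, ["Aspirin", "Advil"])
print(absent)
```

If every drug name in the probe list comes back as missing, the loaded file is very likely not an RxNorm export.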
This dict is built by the RxNorm class, which extends the KnowledgeBase class in "scispacy/linking_utils.py",
but there appears to be a typo in the file path it uses.
As of now it points to the same file as HumanPhenotypeOntology?
"""
class GeneOntology(KnowledgeBase):
    def __init__(
        self,
        file_path: str = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/umls_2020_gene_ontology.jsonl",
    ):
        super().__init__(file_path)


class HumanPhenotypeOntology(KnowledgeBase):
    def __init__(
        self,
        file_path: str = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/umls_2020_human_phenotype_ontology.jsonl",  # noqa
    ):
        super().__init__(file_path)


class RxNorm(KnowledgeBase):
    def __init__(
        self,
        file_path: str = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/umls_2020_human_phenotype_ontology.jsonl",  # noqa
    ):
        super().__init__(file_path)
"""
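A copy-and-paste slip like this can be caught mechanically by comparing the default file_path of each knowledge base class. The sketch below uses stand-in classes that mirror the snippet above rather than importing scispacy; the helper shared_default_paths and the URL constants are hypothetical names introduced for illustration.

```python
import inspect
from collections import defaultdict
from typing import Dict, Iterable, List

BASE = "https://ai2-s2-scispacy.s3-us-west-2.amazonaws.com/data/"
GO_URL = BASE + "umls_2020_gene_ontology.jsonl"
HPO_URL = BASE + "umls_2020_human_phenotype_ontology.jsonl"


class KnowledgeBase:
    """Stand-in base class; it only records the path instead of loading it."""

    def __init__(self, file_path: str):
        self.file_path = file_path


class GeneOntology(KnowledgeBase):
    def __init__(self, file_path: str = GO_URL):
        super().__init__(file_path)


class HumanPhenotypeOntology(KnowledgeBase):
    def __init__(self, file_path: str = HPO_URL):
        super().__init__(file_path)


class RxNorm(KnowledgeBase):
    # Mirrors the reported bug: same default as HumanPhenotypeOntology.
    def __init__(self, file_path: str = HPO_URL):
        super().__init__(file_path)


def shared_default_paths(classes: Iterable[type]) -> Dict[str, List[str]]:
    """Group class names by default file_path and keep only paths used more than once."""
    by_path: Dict[str, List[str]] = defaultdict(list)
    for cls in classes:
        default = inspect.signature(cls.__init__).parameters["file_path"].default
        by_path[default].append(cls.__name__)
    return {path: names for path, names in by_path.items() if len(names) > 1}


duplicates = shared_default_paths([GeneOntology, HumanPhenotypeOntology, RxNorm])
for path, names in duplicates.items():
    print(f"{', '.join(names)} share {path}")
```

A check of this kind could run as a unit test so that each knowledge base is guaranteed its own default file.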