There are cases in which we already have input text with entity annotations (e.g., from PubTator).
These may be inline or in other formats, like JSON.
It would be useful to be able to pass these annotations directly to the extracted output, independent of any SPIRES extraction.
(this relates to the MAXO annotation extraction specifically)
For example:
We have input text containing "I take aspirin for a headache" and we're trying to extract relations between drugs and symptoms.
I already have the annotation CHEBI:15365 for "aspirin" but still need to extract and ground "headache".
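For concreteness, a pre-provided annotation for that sentence might look something like the following (field names are illustrative only; PubTator, BioC, or ad hoc JSON inputs each have their own layout):

```python
# Hypothetical pre-provided annotation payload for the example sentence.
# Field names are illustrative; real inputs may be PubTator, BioC, or ad hoc JSON.
provided_annotations = [
    {
        "text": "aspirin",      # surface form in the input text
        "start": 7,             # character offsets, if available
        "end": 14,
        "id": "CHEBI:15365",    # already-grounded identifier
        "category": "Drug",     # class name expected to match the schema
    }
]
# "headache" has no entry here, so it still needs to be extracted and grounded.
```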
The schema will still need to define a class for Drug, a class for Symptom, and the relation between the two. That's fine as long as any provided annotations match that schema, but there's a tricky point here: the LLM may not extract an entity matching the provided annotation.
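Roughly, the classes implied by such a template would amount to something like the sketch below (class and slot names are hypothetical; in practice this would be a LinkML schema, not hand-written Python):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical classes corresponding to a Drug/Symptom template.
# In practice these would be generated from a LinkML schema; names are illustrative.

@dataclass
class NamedEntity:
    id: Optional[str] = None      # CURIE once grounded, e.g. CHEBI:15365
    label: Optional[str] = None   # surface text, e.g. "aspirin"

@dataclass
class Drug(NamedEntity):
    pass

@dataclass
class Symptom(NamedEntity):
    pass

@dataclass
class DrugToSymptomRelationship:
    drug: Optional[Drug] = None
    symptom: Optional[Symptom] = None
    predicate: str = "treats"     # or whatever relation the template defines

@dataclass
class ExtractionResult:
    drugs: List[Drug] = field(default_factory=list)
    symptoms: List[Symptom] = field(default_factory=list)
    relationships: List[DrugToSymptomRelationship] = field(default_factory=list)
```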
So the naive way to do this is just to pass all provided annotations directly to the output (I'll make a subissue for this).
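A minimal sketch of that naive pass-through, reusing the hypothetical classes above: any provided annotation whose category matches a schema class gets appended to the output, regardless of what the LLM extracted.

```python
def merge_provided_annotations(result: ExtractionResult, annotations: list) -> ExtractionResult:
    """Naively copy pre-provided annotations into the extraction output.

    Assumes the annotation 'category' matches a class in the schema; anything
    else is ignored (or could be reported as a mismatch).
    """
    for ann in annotations:
        if ann.get("category") == "Drug":
            # Skip if the LLM already extracted and grounded the same entity.
            if not any(d.id == ann["id"] for d in result.drugs):
                result.drugs.append(Drug(id=ann["id"], label=ann["text"]))
        elif ann.get("category") == "Symptom":
            if not any(s.id == ann["id"] for s in result.symptoms):
                result.symptoms.append(Symptom(id=ann["id"], label=ann["text"]))
    return result
```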
But a more useful approach may be a sort of mini-RAG in which the pre-provided entity annotations (just the text, no IDs) are injected into the results as extraction proceeds recursively: if we expect to see a relation between A and B, we inject the entity annotation for A before the LLM tries to find any relations between A and B.
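One way to picture that injection step (prompt wording and function names are made up here; the real recursive extraction works over the template rather than a single prompt):

```python
def extract_relations_with_known_entities(text: str, known_entities: list, llm_complete) -> str:
    """Sketch of injecting pre-provided entity mentions (text only, no IDs)
    into the prompt before asking the LLM for relations.

    `llm_complete` is a stand-in for whatever completion call is in use.
    """
    entity_hint = "; ".join(ann["text"] for ann in known_entities)
    prompt = (
        "Extract drug-to-symptom relations from the text below.\n"
        f"The following entities are known to be present: {entity_hint}.\n"
        "Include them in any relations they participate in.\n\n"
        f"Text: {text}\n"
        "Relations (drug; symptom):"
    )
    return llm_complete(prompt)

# Grounding for the known entities (e.g. aspirin -> CHEBI:15365) can then be
# attached afterwards, bypassing the normal grounding step for those mentions.
```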
Or we just decouple NER and RE entirely and don't bother with using the LLM for the former in some cases.
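The fully decoupled variant would look roughly like this, where NER comes from any external annotator (the `external_ner` callable below is a hypothetical stand-in for PubTator output or a dictionary-based tool) and the LLM is only asked for relations:

```python
def extract_decoupled(text: str, external_ner, llm_complete) -> dict:
    """NER and RE fully decoupled: entities come from a non-LLM annotator,
    and the LLM only assigns relations among the already-grounded entities."""
    annotations = external_ner(text)  # e.g. [{"text": "aspirin", "id": "CHEBI:15365", ...}]
    labels = [ann["text"] for ann in annotations]
    prompt = (
        f"Given the text: {text}\n"
        f"and these entities: {', '.join(labels)}\n"
        "list which drug treats which symptom, using only the entities above."
    )
    relations = llm_complete(prompt)
    return {"entities": annotations, "relations": relations}
```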
(Some of this may be more in the curate-gpt space, but there are certainly some options to try)