Support looking up terms by OBO Id #46

Open
matthewhorridge opened this issue Nov 20, 2023 · 17 comments
Labels
XA2 Enhance usability, completeness, and reliability in domain knowledge for RADx and research data mana XA2.3 In vocabulary management system (BioPortal/OntoPortal), establish ontological views...


@matthewhorridge

The RADx Data Dictionary Specification allows ontology terms to be provided for Data Elements and Enumerations. These terms can be provided in a short form as OBO Ids. For example, NCIT:C16670.

When I search for terms in BioPortal I get no results, e.g.

image_720

This should return the exact match of the term with the id in first position.

@matthewhorridge matthewhorridge added the XA2.3 In vocabulary management system (BioPortal/OntoPortal), establish ontological views... label Nov 20, 2023
@marcosmro marcosmro assigned marcosmro and unassigned marcosmro Nov 20, 2023
@marcosmro marcosmro added the XA2 Enhance usability, completeness, and reliability in domain knowledge for RADx and research data mana label Nov 20, 2023
@alexskr

alexskr commented Nov 30, 2023

BioPortal doesn't have the OBO version of NCIT at the moment, and the OWL version uses the Thesaurus:C16670 form of ID; however, search still fails for it.

@matthewhorridge
Author

Please could we use NCIT:CXXXXX as a synonym for Thesaurus:CXXXXX?

In general the pattern is {OntologyAcronym}:{TermCode}. The ontology with {OntologyAcronym} as its acronym should always be the top hit, ignoring all other metrics.

The more I work with these kinds of terms and searches for them the more critical I think this is for RADx. It is also useful for BioPortal in general and anyone working with OBO ontologies.

@matthewhorridge
Author

More details (in place here)...

We are looking for terms that have this IRI pattern: http://purl.obolibrary.org/obo/{OntologyAcronym}_{NumericId}. These should be indexed against the string {OntologyAcronym}:{NumericId}

@marcosmro
Contributor

@mdorf Could you please break down this issue into a list of low-level tasks, and provide an estimate of the time needed for completion?

@mdorf
Collaborator

mdorf commented Dec 2, 2023

> More details (in place here)...
>
> We are looking for terms that have this IRI pattern: http://purl.obolibrary.org/obo/{OntologyAcronym}_{NumericId}. These should be indexed against the string {OntologyAcronym}:{NumericId}

Is there a generic algorithm that would apply to any ontology ID? Say, {OntologyAcronym}:{Last Fragment of ID}?

@mdorf
Collaborator

mdorf commented Dec 2, 2023

> @mdorf Could you please break down this issue into a list of low-level tasks, and provide an estimate of the time needed for completion?

Probably something like this:

  1. Fix the existing search on short IDs (with no colons) - 3 days
  2. Enable search on the short IDs with colons for ontology terms (need some time to investigate this, as we had purposefully avoided this case for some reason) ~ 5 days
  3. Implement missing support for ontology IRIs in BioPortal ~ 2 days
  4. Enable search on the {OntologyAcronym}:{TermCode} ~ 4 days

@matthewhorridge
Author

> More details (in place here)...
> We are looking for terms that have this IRI pattern: http://purl.obolibrary.org/obo/{OntologyAcronym}_{NumericId}. These should be indexed against the string {OntologyAcronym}:{NumericId}
>
> Is there a generic algorithm that would apply to any ontology ID? Say, {OntologyAcronym}:{Last Fragment of ID}?

I think this could be true.

@matthewhorridge
Author

Just a note... because multiple ontologies can reuse terms, something like

http://purl.obolibrary.org/obo/{CL}_{0000001} could appear in multiple ontologies. However, the very top hit should be the CL ontology.

@mdorf
Collaborator

mdorf commented Feb 12, 2024

@matthewhorridge, below are some example terms from different ontologies that we went over in the meeting. Would it be possible for you to fill in the results for each example that show what the short ID would look like? Also, if you can document the general rules for extracting these short IDs, it would be great. Ideally, this solution should handle ALL variations of ontologies in our system to be relatively generic.

Acronym: NCIT
ID: http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C20047
prefixIRI: Thesaurus:C20047
Result: NCIT:C20047

Acronym: LOINC
ID: http://purl.bioontology.org/ontology/LNC/MTHU000231
notation: MTHU000231
Result: ---

Acronym: DRON
ID: http://purl.obolibrary.org/obo/CHEBI_46195
notation: CHEBI:46195
Result: CHEBI:46195

Acronym: RXNORM
ID: http://purl.bioontology.org/ontology/RXNORM/202433
notation: 202433
Result: ---

Acronym: GO
ID: http://purl.obolibrary.org/obo/GO_0050892
notation: GO:0050892
Result: GO:0050892

Acronym: ONS
ID: http://purl.obolibrary.org/obo/GO_0003872
prefixIRI: GO:0003872
Result: GO:0003872

Acronym: BAO
ID: http://www.bioassayontology.org/bao#BAO_0003114
prefixIRI: bao:BAO_0003114
Result: BAO:0003114

Acronym: GFO
ID: http://www.onto-med.de/ontologies/gfo.owl#Relational_role
prefixIRI: gfo:Relational_role
Result: ---

Acronym: UNITSONT
ID: http://mimi.case.edu/ontologies/2009/1/UnitsOntology#base_unit
prefixIRI: unit:base_unit
Result: ---

Acronym: ICF
ID: http://who.int/icf#b126
prefixIRI: b126
Result: ---

Acronym: EDAM
ID: http://edamontology.org/data_1598
prefixIRI: data_1598
Result: data:1598

Acronym: PMA
ID: http://www.bioontology.org/pma.owl#PMA_357
prefixIRI: PMA_357
Result: PMA:357

Acronym: NDF-RT
ID: http://evs.nci.nih.gov/ftp1/NDF-RT/NDF-RT.owl#N0000011165
prefixIRI: N0000011165
Result: ---

@matthewhorridge
Author

matthewhorridge commented Feb 12, 2024

@mdorf I have updated your post with results. The basic algorithm is:

  1. Find this regex in the ID: ([A-Za-z]+)_([0-9]+)$
  2. If found, then the "OboId" is formed from the regex result as $1:$2

This regex might be a little bit conservative but I'd prefer to stick to this for now.
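
A minimal Python sketch of the two steps above, for illustration only. The function name `obo_id` is hypothetical and is not part of BioPortal; the regex is the one given in this comment, applied to the end of the term ID.

```python
import re
from typing import Optional

# Regex from the comment above: letters, an underscore, then digits at the end of the ID.
OBO_ID_PATTERN = re.compile(r"([A-Za-z]+)_([0-9]+)$")


def obo_id(term_id: str) -> Optional[str]:
    """Return "$1:$2" if the term ID ends with <letters>_<digits>, otherwise None.

    Hypothetical helper name; only the regex comes from the issue text.
    """
    match = OBO_ID_PATTERN.search(term_id)
    if match is None:
        return None
    return f"{match.group(1)}:{match.group(2)}"


# Examples taken from the table earlier in this thread.
print(obo_id("http://purl.obolibrary.org/obo/GO_0050892"))
print(obo_id("http://www.bioassayontology.org/bao#BAO_0003114"))
print(obo_id("http://purl.bioontology.org/ontology/LNC/MTHU000231"))
```

With this regex, IDs such as the LOINC, RXNORM, GFO, and NCIT examples yield no OboId, which matches the `---` results in the table (NCIT being the special case discussed in this thread).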

I think the name of the field should be OboId (i.e. where we currently have result).

If the $1 group match is equal to the ontology acronym then this result should be boosted to be the top search result.

Note that NCIT is a special case here because we don't have the OBO version of it in BioPortal
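
The boosting rule above could be sketched as a sort over search hits. This is only an illustration: the `Hit` type, its `score` field, and the `rank` function are assumptions made for the example and do not reflect how BioPortal's search actually ranks results. The comparison against the acronym is an exact string match, as the rule is worded here.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit; none of these field names come from BioPortal."""
    ontology_acronym: str  # acronym of the ontology the hit was found in
    obo_id: str            # e.g. "GO:0003872"
    score: float           # assumed relevance score, higher is better


def rank(hits: List[Hit]) -> List[Hit]:
    """Put hits whose OboId prefix equals their ontology acronym first.

    Within each group, hits are ordered by descending score.
    """
    def key(hit: Hit):
        prefix = hit.obo_id.split(":", 1)[0]
        owns_term = prefix == hit.ontology_acronym
        # False sorts before True, so owning ontologies come first.
        return (not owns_term, -hit.score)

    return sorted(hits, key=key)


hits = [
    Hit("ONS", "GO:0003872", 9.0),  # ONS reuses a GO term
    Hit("GO", "GO:0003872", 1.0),   # the ontology that owns the term
]
print([h.ontology_acronym for h in rank(hits)])
```

In this sketch the GO hit is listed ahead of the ONS hit even though its assumed score is lower, which is the "ignoring all other metrics" behaviour requested earlier in the thread.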

@mdorf
Collaborator

mdorf commented Feb 13, 2024

Thank you, @matthewhorridge for documenting these. It seems NCIT is a special case for both rules 1. and 2., correct? It does not match the regex AND the prefix of the OBO ID is formed using the ontology acronym instead of the $1 match.

@matthewhorridge
Author

Yes, that's right. One possibility is that, if the term ID fails to match the regex, then:

If the term ID starts with the ontology IRI, (1) remove the part that matches the ontology IRI, (2) remove the first character of the remaining part (# or / would be expected), and then (3) take the ontology acronym, append a colon, and append the remaining term ID characters. For example,

Given http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C20047 with an ontology IRI of http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl and an ontology acronym of NCIT,

(1) Remove http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl from the term ID, to give #C20047.
(2) Remove the next character, #, to give C20047
(3) Concatenate the ontology acronym (NCIT) with a colon and the remainder of the term ID, i.e. NCIT:C20047
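
The three fallback steps can be sketched in Python as follows. The function name `fallback_short_id` and its parameters are hypothetical, and returning `None` when the ID does not start with the ontology IRI (or when nothing is left after the separator) is an assumption, since the comment does not say what should happen in those cases.

```python
from typing import Optional


def fallback_short_id(term_id: str, ontology_iri: str, acronym: str) -> Optional[str]:
    """Build {acronym}:{local part} for term IDs that start with the ontology IRI.

    Hypothetical helper illustrating the three steps described above.
    """
    if not term_id.startswith(ontology_iri):
        return None  # assumption: the rule simply does not apply
    # (1) Remove the ontology IRI, e.g. leaving "#C20047".
    remainder = term_id[len(ontology_iri):]
    # (2) Remove the next character ("#" or "/" would be expected).
    local_part = remainder[1:]
    if not local_part:
        return None  # assumption: nothing left to form an ID from
    # (3) Concatenate the acronym, a colon, and the remaining characters.
    return f"{acronym}:{local_part}"


print(fallback_short_id(
    "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C20047",
    "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl",
    "NCIT",
))
```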

@mdorf
Collaborator

mdorf commented Feb 13, 2024

> If the term ID starts with the ontology IRI

The only issue with that is that we don't expose the ontology IRI via our API or store it as metadata at the moment.

@mdorf
Collaborator

mdorf commented Feb 14, 2024

These rules require implementing additional functionality in BioPortal that would allow retrieving the ontology IRIs. Timeline adjusted accordingly...

@matthewhorridge
Author

As a first step would it be possible to implement the functionality that does not require looking at the ontology IRI? (So do this as a two step implementation)

@mdorf
Collaborator

mdorf commented Feb 14, 2024

> As a first step would it be possible to implement the functionality that does not require looking at the ontology IRI? (So do this as a two step implementation)

@matthewhorridge, we actually have an existing pull request from AgroPortal that implements the ability to retrieve the ontology IRI. The only issue is this PR is two years old, and some code has diverged from its original implementation, which requires a bit of manual work during the merge. I don't expect it to be a huge undertaking, so my recommendation is to roll it in as part of this development. It's also a very important and useful metadata attribute to be exposed via the BioPortal API.

mdorf added a commit to ncbo/ontologies_api that referenced this issue Feb 27, 2024
@alexskr

alexskr commented Mar 20, 2024

Short ID search enhancement has been deployed in BioPortal
