What are the minimal ontology-level metadata fields we would expect to see in an OBO ontology? #1365

Open
cmungall opened this issue Nov 18, 2020 · 8 comments
Labels
attn: Technical WG, documentation, ontology metadata, policy

Comments

@cmungall
Contributor

cmungall commented Nov 18, 2020

We have a good schema for the metadata about each ontology that we collect centrally in this repo.

But what are the minimum fields we expect to see in an ontology header for a good ontology? I know we have various checks for this in the dashboard, but do we have a more declarative specification (e.g. in ShEx)?

Use case: The Alliance wants to show information about each ontology they have loaded in the database. While they could just show this in an open-ended manner (e.g. as OLS does it https://www.ebi.ac.uk/ols/ontologies/go), it is better if there is a predictable structure.

I feel there should be a doc we can point groups like this to! Is this in the realm of OMO?

I will have a go at my answer here, but I think this should be documented outside a ticket.

  • ontology IRI (card: 1)
  • dce:title (card: 1)
  • dce:description (card: 1)
  • dcterms:license (card: 1)
  • owl:versionIRI + owl:versionInfo (card: 1 each)
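The list above is already close to a machine-checkable cardinality table. A minimal Python sketch, assuming the header has already been parsed into (property CURIE, value) pairs; the `MINIMAL_HEADER` encoding and the `check_header` helper are hypothetical illustrations, not an existing OBO, ROBOT or Dashboard API:

```python
from collections import Counter

# Hypothetical encoding of the minimal header fields: property -> (min, max).
# A max of None would mean "unbounded"; every field here is exactly-one.
MINIMAL_HEADER = {
    "dce:title": (1, 1),
    "dce:description": (1, 1),
    "dcterms:license": (1, 1),
    "owl:versionIRI": (1, 1),
    "owl:versionInfo": (1, 1),
}


def check_header(ontology_iri, pairs, spec=MINIMAL_HEADER):
    """Return a list of human-readable cardinality violations.

    `ontology_iri` is the subject of the owl:Ontology declaration (None if absent);
    `pairs` is a list of (property, value) tuples taken from the ontology header.
    """
    problems = []
    if not ontology_iri:
        problems.append("missing ontology IRI")
    # Count how many values each property has in the header.
    counts = Counter(prop for prop, _ in pairs)
    for prop, (lo, hi) in spec.items():
        n = counts[prop]
        if n < lo:
            problems.append(f"{prop}: expected at least {lo}, found {n}")
        if hi is not None and n > hi:
            problems.append(f"{prop}: expected at most {hi}, found {n}")
    return problems
```

For a header that has its two version fields but no title or description, and two `dcterms:license` values (the ZFA situation described below), `check_header` reports three problems: a missing `dce:title`, a missing `dce:description` and a repeated `dcterms:license`.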

One caveat to the above is that the two IRI fields are not very user-friendly. Clicking on them renders a giant OWL file. This is not something we would want to show on a portal aimed at biologists.

Surprisingly, there is not a field that yields the ontology prefix (GO, OBI). This has to be done programmatically by munging the ontology IRI. This seems far from ideal.
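A minimal sketch of that munging in Python, assuming the main ontology IRI follows the OBO PURL convention `http://purl.obolibrary.org/obo/<idspace>.owl` (or `.obo`); `guess_prefix` is a hypothetical helper. Uppercasing the IRI stem gets GO and OBI right but turns `wbphenotype` into `WBPHENOTYPE` rather than `WBPhenotype`, which is one concrete reason a dedicated header field would be preferable:

```python
import re

# Assumed OBO convention for a main ontology IRI, e.g. http://purl.obolibrary.org/obo/go.owl
OBO_ONTOLOGY_IRI = re.compile(
    r"^https?://purl\.obolibrary\.org/obo/([a-z][a-z0-9_]*)\.(?:owl|obo)$"
)


def guess_prefix(ontology_iri):
    """Guess the ID space (e.g. 'GO') from an OBO ontology IRI, or return None.

    This is lossy: mixed-case prefixes such as WBPhenotype cannot be recovered
    from the lowercase IRI stem.
    """
    match = OBO_ONTOLOGY_IRI.match(ontology_iri.strip())
    if match is None:
        return None  # not an OBO-style IRI, and there is no header field to fall back on
    return match.group(1).upper()
```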

Same for version. At least here we have the more informative versionInfo (this is what BioPortal uses). However, this is not populated for many ontologies (#771), including GO, oops.

Another caveat is that even Foundry ontologies may not have all of the above populated. For example, ZFA (which is used by the Alliance) lacks a title and description. It also has two values for license (consistent with each other: one is "CC-BY", the other is the URL). I would stress this is not ZFA's fault: they have simply not been asked to do this or been provided tools to check for presence/absence/cardinality of ontology metadata.

I think we need a clear, computable schema that (1) can be used by portal developers, e.g. in the Alliance, as well as OLS and BioPortal, and (2) can be used for checking, e.g. in the dashboard but also in ROBOT.

In addition to the above fields, there are other fields that are useful to display but are not consistently populated:

  • creator + contributor (can be very long, and are not consistent w.r.t. string names vs. ORCIDs) 0..*
  • protege:defaultLanguage 0..1
  • dc:date (redundant with versionInfo for all ontologies that follow recommended OBO versioning, but useful for those that follow other schemes, such as semantic versioning, ChEBI's, etc.) 0..1
  • dc:subject 0..1 (or 0..*?)
  • rdfs:comment 0..*
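To make the dc:date point concrete: if an ontology follows the OBO release convention (assumed here to be a versionInfo of the form YYYY-MM-DD and a versionIRI containing `/releases/YYYY-MM-DD/`), a portal can derive the release date itself and only needs dc:date for other schemes. A minimal Python sketch under that assumption; `release_date` is a hypothetical helper:

```python
import re
from datetime import date

# Assumed OBO-style release IRI: .../obo/<id>/releases/YYYY-MM-DD/<id>.owl
RELEASE_DATE = re.compile(r"/releases/(\d{4}-\d{2}-\d{2})/")


def release_date(version_info=None, version_iri=None):
    """Return the release date implied by OBO-style versioning, or None.

    None means the ontology uses another scheme (e.g. semantic versioning),
    so a separate dc:date value would be needed to show a date.
    """
    if version_info:
        try:
            return date.fromisoformat(version_info.strip())
        except ValueError:
            pass  # versionInfo is not a bare ISO date, e.g. "1.4.0"
    if version_iri:
        match = RELEASE_DATE.search(version_iri)
        if match:
            try:
                return date.fromisoformat(match.group(1))
            except ValueError:
                return None  # looked like a date but is not a valid one
    return None
```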
@cmungall added the attn: Technical WG and documentation labels Nov 18, 2020
@matentzn
Contributor

Amen. I will try to work out how to do that with ShEx; in the meantime, let's collect thoughts on good metadata fields.

  • versionInfo was added to the ODK a while back and will hopefully start permeating everywhere
  • ontology IRI (card: 1), dce:title (card: 1), dce:description (card: 1), dcterms:license (card: 1), owl:versionIRI + owl:versionInfo (card: 1 each) should all be mandatory IMO.
  • it would be nice if creator were mandatory, but I would personally like this to be an ORCID rather than a name, or at least a meaningful URI of some kind that resolves; the contributor property should be specified but optional
  • I have never seen protege:defaultLanguage, so I am not keen on that.
  • IAO:0000700 (preferred_root) would be useful for browsers
  • You have recently started adding http://purl.org/dc/elements/1.1/type http://purl.obolibrary.org/obo/IAO_8000001 (module type)
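If creators are expected to be ORCIDs, a checker can at least confirm that each dc:creator value is a well-formed ORCID IRI with a valid check digit (ORCID uses the ISO 7064 MOD 11-2 checksum). A minimal Python sketch; the helper names are hypothetical, and it does not verify that the iD actually resolves:

```python
import re

# ORCID iD written as an IRI, e.g. https://orcid.org/0000-0002-1825-0097
ORCID_IRI = re.compile(r"^https?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_checksum(base_digits):
    """ISO 7064 MOD 11-2 check character over the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_orcid_iri(value):
    """True if `value` is a well-formed ORCID IRI whose check digit is valid."""
    match = ORCID_IRI.match(value.strip())
    if match is None:
        return False  # e.g. a plain name string such as "Jane Doe"
    digits = match.group(1).replace("-", "")
    return orcid_checksum(digits[:15]) == digits[15]


def non_orcid_creators(creators):
    """Return the dc:creator values that are not well-formed ORCID IRIs."""
    return [c for c in creators if not is_orcid_iri(c)]
```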

I could prepare a basic ShEx profile to cover this.

@matentzn
Contributor

Ha, this was great fun! The current version of the shape is here:

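# The ":" namespace is a placeholder; the others are the standard namespace IRIs.
PREFIX : <http://example.org/obo-shapes/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX protege: <http://protege.stanford.edu/plugins/owl/protege#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

:OBOOntologyShape CLOSED {
  a [owl:Ontology];
  owl:versionIRI IRI;
  dc:creator xsd:string*;
  dc:contributor xsd:string*;
  dc:title xsd:string;
  dc:date xsd:dateTime?;
  dc:description xsd:string;
  dcterms:license IRI;
  owl:versionInfo xsd:string;
  protege:defaultLanguage xsd:string?;
  rdfs:comment xsd:string*;
  dc:subject xsd:string?;
  obo:IAO_0000700 IRI*;
  dc:type IRI?
}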

I have implemented it in an example notebook here, running it against ENVO, WBPhenotype and CL as examples. I have never worked with shapes seriously, but now that I see them, I am loving it (of course, here we are only scratching the very outer surface). Keep such tickets coming, @cmungall (and everyone else).

@jamesaoverton
Member

This is a worthwhile goal, and @matentzn's schema seems like a good start. I don't have anything to add to that, but I had a few related thoughts:

  • we use SPARQL instead of ShEx in ROBOT report because we want fine-grained error reporting, but I can see the value in having this ShEx shape as a single test
  • @matentzn's shape seems to just use the OWL, which is easier than also fetching from the OBO registry
  • when these fields in the OWL file overlap with fields from the registry, the Dashboard code should be checking that they're in sync: title, description, license, etc. I know it does for the license but I can't remember for the others.
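The sync check in the last bullet could look roughly like this. A minimal Python sketch, assuming both the OWL header and the registry entry have already been flattened into plain dicts keyed by `title`, `description` and `license` (for the registry, e.g. by taking the license URL); these keys and the `out_of_sync` helper are hypothetical, not the Dashboard's actual code:

```python
def normalize(value):
    """Collapse whitespace and strip trailing slashes so trivial differences are ignored."""
    if value is None:
        return None
    return " ".join(str(value).split()).rstrip("/")


def out_of_sync(owl_header, registry_entry, fields=("title", "description", "license")):
    """Return {field: (owl_value, registry_value)} for fields that disagree."""
    mismatches = {}
    for field in fields:
        owl_value = normalize(owl_header.get(field))
        registry_value = normalize(registry_entry.get(field))
        if owl_value != registry_value:
            mismatches[field] = (owl_value, registry_value)
    return mismatches
```

Normalizing before comparing means a license URL with and without a trailing slash, or a title with doubled spaces, is not reported as a mismatch.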

@cmungall
Contributor Author

Yes, I love having banks of SPARQL queries, but I like having this abstracted to a structure like ShEx, especially with complex constraints that link objects to other objects.

Some ShEx validation frameworks, such as @hsolbrig's PyShEx, can work by crawling a triplestore, recursively executing SPARQL queries. I think it would also be possible to translate from ShEx to SPARQL, which might be nice too.
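The translation could start as simply as this for the cardinality part. A minimal Python sketch, assuming a `{property: (min, max)}` encoding chosen here purely for illustration; it emits one SPARQL query per bound so each violation can be reported separately. It is not PyShEx and does not parse ShEx; it only handles min 1 and max 1, and `PREFIXES` covers only the namespaces of the minimal field list:

```python
# Only the namespaces needed for the minimal header fields are declared.
PREFIXES = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dce: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
"""


def min_one_query(prop):
    """SPARQL query finding owl:Ontology nodes with no value for `prop`."""
    return (
        PREFIXES
        + "SELECT ?ontology WHERE {\n"
        + "  ?ontology a owl:Ontology .\n"
        + f"  FILTER NOT EXISTS {{ ?ontology {prop} ?value }}\n"
        + "}\n"
    )


def max_one_query(prop):
    """SPARQL query finding owl:Ontology nodes with more than one value for `prop`."""
    return (
        PREFIXES
        + "SELECT ?ontology (COUNT(?value) AS ?n) WHERE {\n"
        + f"  ?ontology a owl:Ontology ; {prop} ?value .\n"
        + "}\n"
        + "GROUP BY ?ontology\n"
        + "HAVING (COUNT(?value) > 1)\n"
    )


def queries_for(spec):
    """Translate {property: (min, max)} into a list of (description, query) pairs."""
    out = []
    for prop, (lo, hi) in spec.items():
        if lo >= 1:
            out.append((f"{prop} missing", min_one_query(prop)))
        if hi == 1:
            out.append((f"{prop} repeated", max_one_query(prop)))
    return out
```

Because each bound gets its own query, a violation is reported per property and per bound, close to the fine-grained style mentioned above, while the spec itself stays declarative.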

The redundancy between this and JSON Schema-based checks over the registry YAML/JSON is mildly dissatisfying, but I feel there is a path to unification (our YAML is actually YAML-LD, and has an RDF form...)

I just realized I abandoned this a while ago; there is not much there, but it gives an idea of how shapes could be used to check classes too. This is especially useful for certain profiles of ontologies: https://github.com/cmungall/obo-shapes

We are using ShEx heavily for ABoxes in GO, e.g. here is what a GO MF instance looks like:

https://github.com/geneontology/go-shapes/blob/e05a415d8b5178c4ac2b4662d42171d14f19a1cf/shapes/go-cam-shapes.shex#L364-L386

One nice feature is that every aspect of the shape can be annotated arbitrarily, e.g. with a seeAlso linking to a ticket.

@nlharris added the ontology metadata label Jan 6, 2021
@nlharris added the policy label Oct 20, 2022
@nlharris
Contributor

Should this stay open?

@matentzn
Contributor

I think this is important, but we do not have an appropriate role for handling issues like this. It's too unspecific to be handled by a project like Monarch, and too work-intensive to just do as a side project for someone uninitiated in OBO.

Looking back at this, I think the validation aspect (all but comment numero uno) is a distraction.

If someone in the EWG wants to tackle this issue, maybe:

  1. Create a Google Doc with a table of all important ontology relationships, maybe inspired by an Ubergraph query and Chris's brain dump above
  2. Have columns for property, Optional/MUST/SHOULD, and data type (IRI, string, etc.)
  3. Share it here for comments
  4. Create a documentation page with that table, and add it wherever @nataled thinks such recommendations belong
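Steps 2 and 4 could share a single source: keep the table as data and render the documentation page from it, so the same rows could later feed a checker. A minimal Python sketch; `FieldRule`, `to_markdown` and any requirement levels assigned with them are hypothetical placeholders, not agreed policy:

```python
from dataclasses import dataclass


@dataclass
class FieldRule:
    prop: str      # property CURIE, e.g. "dce:title"
    level: str     # "MUST", "SHOULD" or "OPTIONAL"
    datatype: str  # e.g. "IRI", "xsd:string"


# Sort order for the requirement column; unknown levels sort last.
LEVEL_ORDER = {"MUST": 0, "SHOULD": 1, "OPTIONAL": 2}


def to_markdown(rules):
    """Render the recommendation table as a Markdown table, MUST rows first."""
    lines = [
        "| Property | Requirement | Data type |",
        "| --- | --- | --- |",
    ]
    for rule in sorted(rules, key=lambda r: (LEVEL_ORDER.get(r.level, 3), r.prop)):
        lines.append(f"| `{rule.prop}` | {rule.level} | {rule.datatype} |")
    return "\n".join(lines)
```

For example, `to_markdown([FieldRule("dce:title", "MUST", "xsd:string")])` produces a three-line table: a header row, a separator row and one row for `dce:title`.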

@nataled
Contributor

nataled commented Nov 25, 2024

I have a non-implemented idea of where this could go, but we could/should push this more quickly than that ultimate page will be developed (could it go into an FAQ for now?). For the EWG to work on this, we'd need a list of what should be included (@matentzn, should it be all the bulleted points in the first comment, or only the top set, or the list in your ShEx shape above...?)

@matentzn
Copy link
Contributor

I started a table here:

https://docs.google.com/document/d/1fFGeLjRTEBPUXLDGMtvq8lZD2fRBVus4X6VmmNm4Se4/edit?tab=t.0

Maybe you and I do the first round, then we start looping in others?

5 participants