
Next-generation EDAM quality-control utility


LucieLamothe/edamverify

 
 


EDAM Verification Utility Suite: edamverify

edamverify is a utility suite for verification of the EDAM ontology. It implements a set of quality control (QC) checks based upon:

edamverify implements all checks previously implemented in edamxpathvalidator.

edamverify is run by the EDAM Travis CI system whenever the development copy of EDAM (EDAM_dev.owl) changes.

NB: Current status: edamverify is fully specified; implementation is ongoing.

EDAM QC implementation

EDAM QC consists of:

  • invocation of the report utility from the ROBOT ontology verification suite. This runs a series of basic quality control SPARQL queries that detect issues such as duplicated labels or synonyms, missing ontology metadata, and references to deprecated concepts.
  • invocation of edamverify, which runs a series of SPARQL and SHACL queries tailored specifically to EDAM and defined in the queries/ folder (see below). The SPARQL queries are run using the ROBOT verify utility; the SHACL queries are run directly.
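As a rough illustration of the two steps above, the sketch below only builds the ROBOT command lines; it does not claim to be how edamverify actually calls ROBOT. It assumes a `robot` executable on PATH, SPARQL files ending in `.sparql` in the queries/ folder, and output going to reports/. The helper names `robot_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def robot_commands(ontology: str, queries_dir: str, out_dir: str) -> list[list[str]]:
    """Hypothetical helper: build (not run) the ROBOT report and verify command lines."""
    # Collect the EDAM-specific SPARQL queries in a stable order.
    sparql = sorted(str(p) for p in Path(queries_dir).glob("*.sparql"))
    # Step 1: ROBOT's generic QC report, written as TSV.
    report = ["robot", "report", "--input", ontology,
              "--output", str(Path(out_dir) / "robot_report.tsv")]
    # Step 2: ROBOT verify runs each SPARQL query and writes any violations to out_dir.
    verify = ["robot", "verify", "--input", ontology,
              "--queries", *sparql, "--output-dir", out_dir]
    return [report, verify]


def run_all(ontology: str = "EDAM_dev.owl", queries_dir: str = "queries",
            out_dir: str = "reports") -> list[int]:
    """Run both commands and return their exit codes.

    check=False: ROBOT may exit non-zero when it finds violations, and the
    second step should still run in that case.
    """
    return [subprocess.run(cmd, check=False).returncode
            for cmd in robot_commands(ontology, queries_dir, out_dir)]
```

The SHACL queries are left out here because the text says they are run directly, not through ROBOT.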

Each query has a logging level (based on ROBOT report) which defines the severity of the issue:

  • ERROR: Must be fixed before releasing EDAM. These issues will cause problems for users, such as classes with multiple labels.
  • WARN: Should be fixed as soon as possible. These will not cause problems for all users, but may not be what they expect. For example, a class that is inferred to be equivalent to another named class.
  • INFO: Should be fixed if possible. These are for consistency and cleanliness, such as definitions that do not start with an uppercase character.
  • NOERR: No error found.
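The levels above form a severity ordering. The minimal sketch below makes that ordering explicit. The class and function names are hypothetical, and treating only ERROR as release-blocking follows the wording of the list, not any code in the repository.

```python
from enum import IntEnum


class Level(IntEnum):
    """Logging levels from the list above, ordered from least to most severe."""
    NOERR = 0
    INFO = 1
    WARN = 2
    ERROR = 3


def worst(statuses: list[str]) -> Level:
    """Return the most severe level among check statuses (NOERR if there are none)."""
    return max((Level[s] for s in statuses), default=Level.NOERR)


def blocks_release(statuses: list[str]) -> bool:
    """Only ERROR issues must be fixed before releasing EDAM."""
    return worst(statuses) == Level.ERROR
```

For example, `worst(["INFO", "WARN", "NOERR"])` gives `Level.WARN`, and `blocks_release` is false for that list.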

The problems detected by each query and its remedy are documented in the docs folder.

Report format

The results of each QC check are written to the last cell [1] of the corresponding Jupyter notebook in a consistent JSON format, for example:

[1] As required by the script that invokes and parses these notebooks in Travis CI.


{
    "test_name": "fileExtensionBadCharacter",
    "reason": [
        "Bad characters found in <file_extension> property of these concepts:",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3682 (imzML metadata file): imzML",
        "http://edamontology.org/format_3789 (XQuery): xq|xqy|xquery",
        "http://edamontology.org/format_3475 (TSV): tsv|tab",
        "http://edamontology.org/format_3750 (YAML): yaml|yml"
    ],
    "status": "WARN"
}
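The sketch below shows one way a notebook could emit this record and one way a CI script could read it back. Both functions are hypothetical. The reader also assumes that the last code cell prints the JSON to stdout, so it appears as a "stream" output in the .ipynb file, which is itself JSON.

```python
import json

VALID_STATUSES = {"ERROR", "WARN", "INFO", "NOERR"}


def make_report(test_name: str, reason: list[str], status: str) -> str:
    """Serialise one QC result using the keys shown in the example above."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    return json.dumps({"test_name": test_name, "reason": reason, "status": status},
                      indent=4)


def last_cell_report(notebook_json: str) -> dict:
    """Hypothetical parser: decode the JSON printed by a notebook's last code cell."""
    nb = json.loads(notebook_json)
    code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
    # In .ipynb files a stream's "text" may be one string or a list of strings.
    text = "".join(
        "".join(o["text"]) if isinstance(o["text"], list) else o["text"]
        for o in code_cells[-1]["outputs"]
        if o.get("output_type") == "stream"
    )
    return json.loads(text)
```

In a notebook, the last cell would then simply call `print(make_report(...))`.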

Tests

| Test | Level | Docs | Issue | Solution [1] | File | Status |
|---|---|---|---|---|---|---|
| Omission of properties required for deprecated concepts | INFO - ERROR | docs | 3 | IPYNB | annotationDeprecationOmission.ipynb | DONE |
| Misuse of properties intended for deprecated concepts only | ERROR | docs | 2 | IPYNB | annotationDeprecationMisuse.ipynb | DONE |
| Ontology max depth exceeded | WARN | docs | 6 | SPARQL | maxDepthExceeded.sparql | todo |
| Singleton leaf node | WARN | docs | 7 | SPARQL | singletonLeaf.sparql | todo |
| Subset misuse | ERROR | docs | 14, 17, 25, 27, 28 | IPYNB | subsetMisuse.ipynb | DONE |
| Disallowed synonym | ERROR | docs | 11 | IPYNB | disallowedSynonym.ipynb | DONE |
| Placeholder chain too long | ERROR | docs | 8 | SPARQL | placeholderChainTooLong.sparql | todo |
| Unexpected multiple parents | WARN | docs | 9 | SPARQL | unexpectedMultipleParents.sparql | todo |
| Possible spelling mistake | INFO | docs | 10 | SPARQL | spellingMistake.sparql | todo |
| Bad EDAM URI reference | ERROR | docs | 12 | SPARQL | badEdamUriReference.sparql | todo |
| Bad non-boolean value | WARN | docs | 13 | IPYNB | badNonBooleanValue.ipynb | DONE |
| Mandatory property missing | ERROR | docs | 8 | IPYNB | mandatoryPropertyMissing.ipynb | DONE |
| Format property missing | INFO - WARN | docs | 9, 11 | IPYNB | formatPropertyMissing.ipynb | DONE |
| Identifier property missing | INFO | docs | 10 | IPYNB | identifierPropertyMissing.ipynb | DONE |
| Wikipedia link missing | INFO | docs | 24 | IPYNB | wikipediaLinkMissing.ipynb | DONE |
| Leaf concept is placeholder | WARN | docs | 12 | SPARQL | placeholderLeafConcept.sparql | todo |
| isIdentifierOf redundancy | WARN | docs | 13 | SPARQL | isIdentifierOfRedundancy.sparql | todo |
| Identifier relation missing | ERROR | docs | 14 | SPARQL | identifierRelationMissing.sparql | todo |
| Format relation missing | ERROR | docs | 26 | SPARQL | formatRelationMissing.sparql | todo |
| Redundant subclass relation | WARN | docs | 15 | SPARQL | redundantSubclassRelation.sparql | todo |
| Deprecated concept with disallowed annotations or axioms | WARN | docs | 16 | IPYNB | disallowedDeprecatedContent.ipynb | DONE |
| Concept ID numerical duplication | ERROR | docs | 18 | IPYNB | idNumericalDuplication.ipynb | DONE |
| File extension lacks synonym | WARN | docs | 19 | SPARQL | fileExtensionMissingSynonym.ipynb | DONE |
| File extension bad characters | WARN | docs | 19, 20 | IPYNB | fileExtensionBadCharacter.ipynb | DONE |
| Misuse of Wikipedia links | WARN | docs | 23 | IPYNB | wikipediaMisuse.ipynb | DONE |
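As an illustration of the "Concept ID numerical duplication" test, the sketch below groups EDAM concept URIs by their numeric part and reports any number that is used by more than one concept. It assumes that a numeric ID must be unique across branches (for example, format_3556 and data_3556 would clash). This is an inference from the test's name, not the logic of idNumericalDuplication.ipynb. The function name is hypothetical.

```python
import re
from collections import defaultdict

# EDAM concept URIs look like http://edamontology.org/format_3556:
# a branch prefix, an underscore, then a numeric ID.
EDAM_ID = re.compile(r"http://edamontology\.org/([a-z]+)_(\d+)")


def numeric_id_duplicates(uris: list[str]) -> dict[int, list[str]]:
    """Map each numeric ID shared by two or more distinct concepts to their URIs."""
    by_number: dict[int, set[str]] = defaultdict(set)
    for uri in uris:
        m = EDAM_ID.fullmatch(uri)
        if m:  # ignore URIs that do not follow the branch_number pattern
            by_number[int(m.group(2))].add(uri)
    return {n: sorted(us) for n, us in by_number.items() if len(us) > 1}
```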

[1] Tests labelled "SPARQL" are implemented purely in SPARQL; "SHACL" is another possibility. Failing that, a test is implemented as "IPYNB" (a Jupyter notebook combining SPARQL and Python code) or "Python". In the latter two cases, the links under "File" will be replaced with links to the relevant notebook or Python script.
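To show how an "IPYNB" test splits work between SPARQL and Python, here is a sketch modelled on the fileExtensionBadCharacter report shown earlier. The SPARQL string is only defined, not executed. It could be run with, for example, rdflib's `Graph.query`, whose result rows unpack as (concept, label, ext). Several details are assumptions: the property IRI `http://edamontology.org/file_extension`; the rule that a valid extension contains only lowercase letters and digits (inferred from the flagged values `mhtml|mht|eml` and `imzML`); and the empty reason list for NOERR.

```python
import re

# SPARQL half (assumed property IRI): fetch each concept's label and file extension.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?concept ?label ?ext WHERE {
  ?concept <http://edamontology.org/file_extension> ?ext ;
           rdfs:label ?label .
}
"""

# Python half. Assumption: only lowercase letters and digits are allowed.
VALID_EXT = re.compile(r"[a-z0-9]+")


def bad_extensions(rows) -> dict:
    """Turn (concept, label, ext) query rows into a report dict like the example."""
    bad = [f"{concept} ({label}): {ext}"
           for concept, label, ext in rows
           if not VALID_EXT.fullmatch(str(ext))]
    if not bad:
        return {"test_name": "fileExtensionBadCharacter", "reason": [], "status": "NOERR"}
    header = "Bad characters found in <file_extension> property of these concepts:"
    return {"test_name": "fileExtensionBadCharacter", "reason": [header, *bad],
            "status": "WARN"}
```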

General queries (from ROBOT report)

| Query | Description | Level |
|---|---|---|
| annotation whitespace | link | WARN |
| deprecated boolean datatype | link | ERROR |
| deprecated class reference | link | ERROR |
| deprecated property reference | link | ERROR |
| duplicate definition | link | ERROR |
| duplicate exact synonym | link | WARN |
| duplicate label synonym | link | WARN |
| duplicate label | link | ERROR |
| duplicate scoped synonym | link | WARN |
| equivalent pair | link | WARN |
| invalid xref | link | WARN |
| label formatting | link | ERROR |
| label whitespace | link | ERROR |
| lowercase definition | link | INFO |
| missing definition | link | WARN |
| missing label | link | ERROR |
| missing obsolete label | link | WARN |
| missing ontology description | link | ERROR |
| missing ontology license | link | ERROR |
| missing ontology title | link | ERROR |
| missing superclass | link | INFO |
| misused obsolete label | link | ERROR |
| multiple definitions | link | ERROR |
| multiple equivalent classes | link | ERROR |
| multiple labels | link | ERROR |
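ROBOT report can write its findings as a TSV file. The sketch below summarises such a file by level. It assumes a header row with "Level" and "Rule Name" columns, which matches ROBOT's TSV output as we understand it; check this against the ROBOT version in use. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def _rows(report_tsv: str):
    # QUOTE_NONE: treat quote characters in values as ordinary text.
    return csv.DictReader(io.StringIO(report_tsv), delimiter="\t",
                          quoting=csv.QUOTE_NONE)


def count_by_level(report_tsv: str) -> Counter:
    """Count ROBOT report violations per level (ERROR, WARN, INFO)."""
    return Counter(row["Level"] for row in _rows(report_tsv))


def rules_at(report_tsv: str, level: str) -> set[str]:
    """Names of the rules that produced at least one violation at the given level."""
    return {row["Rule Name"] for row in _rows(report_tsv) if row["Level"] == level}
```

A CI step could then fail whenever `count_by_level(...)["ERROR"]` is non-zero, in line with the ERROR policy described earlier.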

Files

| File | Description |
|---|---|
| src/edamverify.py | edamverify utility |
| queries/ | Queries in SPARQL query language and SHACL constraint language format |
| docs/ | Query documentation (the problem detected by the query and its remedy) |
| reports/ | Reports from running edamverify on EDAM_dev.owl |
| README.md | This file |
