ELIXIR biovalidator is a JSON Schema validator extended from popular javascript library AJV. In addition to standard JSON Schema validation, the biovalidator covers many validation use cases related life sciences, including ontology validation and taxonomy validation. Furthermore, the biovalidator is capable of running as a server or in CLI mode.
The biovalidator currently supports JSON Schema draft-06/07/2019-09.
- graphRestrictions
graph_restriction
renamed tographRestriction
to be consistent with other keywords- Remove unused
relations
keyword insidegraphRestrictions
- Remove unused
direct
keyword insidegraphRestrictions
- Rename
include_self
toincludeSelf
keyword insidegraphRestrictions
to be consistent with camel case naming convention
- Merged
validator-cli.js
withsrc/server.js
. Now one entry point to the application:src/biovalidator.js
- Changes to arguments accepted at the startup
--json
renamed to--data
- Added
--ref
,--port
,--baseUrl
,pidPath
- Support for new keyword
isValidIdentifier
. Validate accessions/IDs using identifiers.org API. - Add
queryFields
keyword insidegraphRestrictions
to query for either obo_id or label - Add caching library improve memory consumption and auto cache evictions
- Fix a bug related to OLS API call in graphRestrictions
- Getting Started
- Using biovalidator as a server
- Using biovalidator as a CLI command
- Startup arguments
- Extended keywords for ontology and taxonomy validation
- Running in Docker
- License
- Node.js - v10.19.1.0+
- npm You can also run the biovalidator using docker without installing node or npm.
- Install Node.js: https://nodejs.org/en/ (v10.19.1.0+)
- Check node and npm version
node -v
npm -v
- Clone project and install dependencies:
git clone https://github.com/elixir-europe/biovalidator.git
cd biovalidator
npm install
- Run test cases to see everything is in order
npm test
By default, biovalidator will start as a server. Read startup arguments section for more server options.
node src/biovalidator
Once the server is up and running it can be accessed in your browser at http://localhost:3020/.
The biovalidator also exposes an endpoint for validation: http://localhost:3020/validate.
The /validate
POST endpoint accepts JSON as data and has the following structure.
{
"schema": {},
"data": {}
}
- schema: JSON Schema to validate the data
- data: data to be validated using given JSON Schema
Make sure to add content-type header if there are any problems using the API.
Content-Type: application/json
Example: Sending a POST request with the following body:
{
"schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"alias": {
"description": "A sample unique identifier in a submission.",
"type": "string"
},
"taxonId": {
"description": "The taxonomy id for the sample species.",
"type": "integer"
},
"taxon": {
"description": "The taxonomy name for the sample species.",
"type": "string"
},
"releaseDate": {
"description": "Date from which this sample is released publicly.",
"type": "string",
"format": "date"
}
},
"required": ["alias", "taxonId" ]
},
"data": {
"alias": "MA456",
"taxonId": 9606
}
}
will produce a response like:
HTTP status code 200
[]
An example of a validation response with errors:
HTTP status code 200
[
{
"errors": [
"must have required property 'value'"
],
"dataPath": ".attributes['age'][0].value"
},
{
"errors": [
"should NOT be shorter than 1 characters",
"must match format \"uri\""
],
"dataPath": ".attributes['breed'][0].terms[0].url"
}
]
Where errors is an array of error messages for a given input identified by its path on dataPath. There may be one or more error objects within the response array. An empty array represents a valid validation result.
By default, biovalidator will log to the console and ./log
directory. Log files are daily rotated.
You can change the default logging directory by specifying an environment variable BIOVALIDATOR_LOG_DIR
.
Example in linux environment:
export BIOVALIDATOR_LOG_DIR=./new_log_dir
In server mode, biovalidator caches referenced schema to minimise network time for repeated schema lookups.
/cache
GET
and DELETE
endpoints can be used to retrieve and clear cached schema.
Please note that these endpoints are not protected and anyone with access can use them.
- GET /cache
- DELETE /cache
The biovalidator can also be run as a CLI application. If you provide --schema
and --data
as parameters to the application, it will execute in CLI mode.
To see all the available options, run node ./src/biovalidator --help
$ node ./src/biovalidator --help
ELIXIR biovalidator: JSON Schema validator with ontology extension
usage: node ./src/biovalidator.js [--schema=path/to/schema.json]
[--data=path/to/data.json] [--ref=path/to/ref/dir]
Options:
--help Show help [boolean]
--version Show version number [boolean]
--baseUrl base URL for the server. Only valid in server mode.
--pidPath PID file name and path. Only valid in server mode.
-s, --schema path to the schema file.
-d, --data path to the data file.
-r, --ref path to referenced schema directory/file/glob pattern.
-p, --port exposed port in server mode. Only valid in server mode.
Examples:
node ./src/biovalidator.js Runs in CLI mode to validate
--data=test_data.json 'test_data.json' with
--schema=test_schema.json 'test_schema.json'
--ref
: If you have a set of local schemas that will be used as$ref
in your validating schema, these can be passed to biovalidator using--ref
argument. The--ref
argument can be used in both server and CLI mode.--ref
accepts file path, directory and glob patterns as values. When parsing glob patterns, it is better to wrap with'
to avoid parsing them by command line.
node src/biovalidator --ref=/path/to/reference/schema/dir/*.json
node src/biovalidator --ref '/path/to/reference/schema/dir/*.json'
--port
: By default server will run on port 3020. To change the exposed port--port
can be provided as an argument. Only works in server mode.
node src/biovalidator --port=8080
--baseUrl
: Base URL can be provided as an argument to change the URL of the server. Only works in server mode.
node src/biovalidator --baseUrl=/schema # will serve the content under http://localhost:3020/schema
--pidPath
: Path to the PID file. Application will run the PID to the given file. The default is./server.pid
. Only works in server mode. Also note that, this is the path to the file and not the directory it will be written to.
node src/biovalidator --pidPath=/pid/file/path/server.pid
--logDir
This should be added as an environment variable. Can be provided to specify the directory of the log files. Log files will be rotated every 24 hours. Only works in server mode.
node src/biovalidator --logDir=/log/directory/path
- BIOVALIDATOR_LOG_DIR
- BIOVALIDATOR_PORT
- BIOVALIDATOR_BASE_URL
- BIOVALIDATOR_PID_PATH
Example:
export BIOVALIDATOR_LOG_DIR=./new_log_dir
export BIOVALIDATOR_PORT=3020
node src/biovalidator
The biovalidator supports four extended keywords for ontology and taxonomy validation: graphRestriction
, isChildTermOf
, isValidTerm
and isValidTaxonomy
.
This custom keyword evaluates if an ontology term is child of another. This keyword is applied to a string (CURIE) and passes validation if the term is a child of the term defined in the schema. The keyword requires one or more parent terms (classes) and ontology ids (ontologies), both of which should exist in OLS - Ontology Lookup Service.
- ontologies should be present in EBI OLS and are case-sensitive (most of the OLS ontologies are in lower case)
This keyword works by doing an asynchronous call to the OLS API that will respond with the required information to know if a given term is child of another.
Being an async validation step, whenever used in a schema, the schema must have the flag: "$async": true
in its object root.
Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://schema.dev.data.humancellatlas.org/module/ontology/5.3.0/organ_ontology",
"$async": true,
"properties": {
"ontology": {
"description": "A term from the ontology [UBERON](https://www.ebi.ac.uk/ols/ontologies/uberon) for an organ or a cellular bodily fluid such as blood or lymph.",
"type": "string",
"graphRestriction": {
"ontologies" : ["obo:hcao", "obo:uberon"],
"classes": ["UBERON:0000062","UBERON:0000179"],
"includeSelf": false
}
}
}
}
Data:
{
"ontology": "UBERON:0000955"
}
This custom keyword also evaluates if an ontology term is child of another and is a simplified version of the graphRestriction keyword. This keyword is applied to a string (url) and passes validation if the term is a child of the term defined in the schema. The keyword requires the parent term and the ontology id, both of which should exist in OLS - Ontology Lookup Service.
This keyword works by doing an asynchronous call to the OLS API that will respond with the required information to know if a given term is child of another.
Being an async validation step, whenever used in a schema, the schema must have the flag: "$async": true
in its object root.
Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$async": true,
"properties": {
"term": {
"type": "string",
"format": "uri",
"isChildTermOf": {
"parentTerm": "http://purl.obolibrary.org/obo/PATO_0000047",
"ontologyId": "pato"
}
}
}
}
Data:
{
"term": "http://purl.obolibrary.org/obo/PATO_0000383"
}
This custom keyword evaluates if a given ontology term url exists in OLS (Ontology Lookup Service). It is applied to a string (url) and passes validation if the term exists in OLS. It can be applied to any string defined in the schema.
This keyword works by doing an asynchronous call to the OLS API that will respond with the required information to determine if the term exists in OLS or not.
Being an async validation step, whenever used in a schema, the schema must have the flag: "$async": true
in its object root.
Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$async": true,
"properties": {
"url": {
"type": "string",
"format": "uri",
"isValidTerm": true
}
}
}
Data:
{
"url": "http://purl.obolibrary.org/obo/PATO_0000383"
}
This custom keyword evaluates if a given taxonomy exists in ENA's Taxonomy Browser. It is applied to a string (url) and passes validation if the taxonomy exists in ENA. It can be applied to any string defined in the schema.
This keyword works by doing an asynchronous call to the ENA API that will respond with the required information to determine if the term exists or not.
Being an async validation step, whenever used in a schema, the schema must have the flag: "$async": true
in its object root.
Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Is valid taxonomy expression.",
"$async": true,
"properties": {
"value": {
"type": "string",
"minLength": 1,
"isValidTaxonomy": true
}
}
}
Data:
{
"metagenomic source" : [ {
"value" : "wastewater metagenome"
} ]
}
Evaluates if a given identifier has a correct format using identifiers.org resolution API. The keyword is applicable to the string
data type.
The keyword will do an asynchronous call to the identifier.org API to resolve the URL for the given CURIE.
Being an async validation step, whenever used in a schema, the schema must have the flag: "$async": true
in its object root.
The keyword has two properties: prefixes
and prefix
. Only one of them is allowed in a block and prefix
will take the priority in case both are provided.
prefix
define one namespace/prefix for the expected identifier/accession. In the data, field should only contain the ID/accession without the namespace.prefixes
define a set of allowed namespaces/prefixes. In the data, field should contain a valid CURIE (namespace:id format)
Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$async": true,
"properties": {
"SampleId": {
"type": "string",
"isValidIdentifier": {
"prefix": "biosample"
}
}
}
}
Data:
{
"SampleId": "SAMEA2397676"
}
Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$async": true,
"properties": {
"resourceId": {
"type": "string",
"isValidIdentifier": {
"prefixes": ["biosample", "arrayexpress"]
}
}
}
}
Data:
{
"resourceId": "biosample:SAMEA2397676"
}
A Dockerized version of biovalidator is available on quay.io. This image can be used to run the validator without cloning this repository.
Pull docker image from quay.io
docker pull quay.io/ebi-ait/biovalidator:2.2.2
Run in server mode
docker run -p 3020:3020 -d quay.io/ebi-ait/biovalidator:2.2.2
Run in onetime CLI mode
docker run quay.io/ebi-ait/biovalidator:2.2.2 --schema /path/to/schema.json --data /path/to/data.json
For development purposes using nodemon is useful. It reloads the application every time something has changed on save time.
nodemon src/biovalidator
For more details about licensing see the LICENSE.