Graph Validation
SemTK provides several ways to validate the contents of a knowledge graph:
- Validation during ingestion
- Validation against OWL cardinality restrictions
- Validation using SHACL (Shapes Constraint Language)
The SemTK ingestion process performs basic data validation before loading data to a graph. It checks that the data to be loaded conforms to the model (excluding cardinality requirements). Additional ingestion-time data validation checks can be configured as well.
SemTK provides capabilities for checking if data conforms to the OWL cardinality restrictions found in its ontology.
A sample cardinality restriction is found in the final line of SADL below:
FruitBasket is a type of Thing,
    described by includes with values of type Fruit.
includes of FruitBasket has at most 3 values.
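For readers more familiar with OWL than SADL, the restriction above corresponds roughly to the Turtle sketch below. The ex: namespace is a placeholder, and the exact triples and literal datatype that SADL generates for this statement may differ.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/fruit#> .   # placeholder namespace (assumption)

# "includes of FruitBasket has at most 3 values."
ex:FruitBasket rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty ex:includes ;
    owl:maxCardinality "3"^^xsd:nonNegativeInteger
] .
```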
The SemTK Ontology Info Service (endpoint ontologyinfo/getCardinalityViolations) and its clients provide access to cardinality restriction violations.
To browse cardinality restrictions in SPARQLgraph, use the Explore tab in "Restrictions" mode. The example below shows a fruit basket that includes 4 fruits, exceeding the maximum of 3.
In the UI above, "violations" are cases where the actual number of values exceeds a maximum cardinality restriction (e.g. 4 fruits exceeding the limit of 3 fruits per basket). In contrast, "incomplete data" refers to cases where the actual number of values is less than a minimum cardinality requirement (e.g. the model specifies that an Address class has a recipient property, but the data contains an Address instance with no recipient).
SHACL (Shapes Constraint Language) is a W3C-recommended language for validating RDF graphs against a set of conditions ("shapes").
The following is a sample SHACL shape (further examples can be seen at DeliveryBasketExample-shacl.ttl and RACK-shacl.ttl):
### A FruitBasket must include between 1 and 3 fruits
### A FruitBasket expiration date must be later than pack date
dbex:FruitBasketConforms
    a sh:NodeShape;
    sh:targetClass dbex:FruitBasket;
    sh:property [
        sh:path dbex:includes;
        sh:minCount 1;
        sh:maxCount 3;
    ];
    sh:property [
        sh:path dbex:packDate;
        sh:lessThan dbex:expirationDate;
    ];
.
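To make the shape above concrete, here is a small sketch of instance data that would fail its second property shape: the basket includes two fruits (within the 1 to 3 range), but its pack date is later than its expiration date. The dbex: namespace IRI, the instance names, and the xsd:date datatype are assumptions for illustration; the actual DeliveryBasketExample model may use different values.

```turtle
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dbex: <http://example.org/DeliveryBasketExample#> .   # placeholder namespace (assumption)

# Hypothetical basket: count of dbex:includes is fine,
# but packDate is not less than expirationDate.
dbex:basket1
    a dbex:FruitBasket ;
    dbex:includes dbex:apple1, dbex:pear1 ;
    dbex:packDate "2023-05-10"^^xsd:date ;
    dbex:expirationDate "2023-05-01"^^xsd:date .
```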
The SemTK Utility Service utility/getShaclResults endpoint validates a SPARQL connection against a set of SHACL shapes.
To browse SHACL results in SPARQLgraph, use the Explore tab in "SHACL Validation" mode. The example below shows a fruit basket that has an expiration date preceding its pack date, violating the SHACL shape above.
Tips for writing SHACL shapes:
- You may define sh:message for a shape. If sh:message is not present, the SHACL processor will generate a message. For some constraint types (e.g. sh:minCount, sh:maxCount, sh:minLength, sh:maxLength), the generated message may be more informative than a custom message. For example, the generated message for sh:maxCount includes the actual instance count found. Likewise, the generated message for sh:maxLength includes the offending string.
- The SHACL specification includes sh:description (for Property Shapes only), but SemTK does not include these in its SHACL output, as they do not appear to be accessible via the Jena SHACL Java API. Please use sh:message instead.
- When specifying constraints that take a shape as input (e.g. sh:node), you may define the shape either inline (it will become a blank node) or as a named shape defined elsewhere. The latter option provides a chance to give it a descriptive name, which may result in a more understandable violation message.
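The tips above can be sketched in Turtle as follows. The first property shape carries a custom sh:message, which suits sh:lessThan better than a count or length constraint. The second refers to a named shape through sh:node instead of defining it inline. The dbex: namespace IRI, the shape names, and the dbex:deliveryAddress property are hypothetical and chosen only for illustration.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dbex: <http://example.org/DeliveryBasketExample#> .   # placeholder namespace (assumption)

# Named shape with a descriptive name, referenced below via sh:node
dbex:AddressHasRecipient
    a sh:NodeShape ;
    sh:property [
        sh:path dbex:recipient ;
        sh:minCount 1 ;            # no sh:message: rely on the generated one
    ] .

dbex:FruitBasketDatesAndAddressConform
    a sh:NodeShape ;
    sh:targetClass dbex:FruitBasket ;
    sh:property [
        sh:path dbex:packDate ;
        sh:lessThan dbex:expirationDate ;
        sh:message "Pack date must be earlier than expiration date" ;
    ] ;
    sh:property [
        sh:path dbex:deliveryAddress ;      # hypothetical property
        sh:node dbex:AddressHasRecipient ;  # named shape rather than an inline blank node
    ] .
```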