feat(docs): Update content
more on MDB patterns
majensen committed Jun 28, 2023
1 parent 8e259d8 commit d22303d
Showing 5 changed files with 47 additions and 5 deletions.
Binary file added python/docs/_static/mdb-patterns-2.png
Binary file added python/docs/_static/mdb-patterns-3.png
Binary file added python/docs/_static/mdb-patterns.png
45 changes: 41 additions & 4 deletions python/docs/mdb-conventions.rst
@@ -99,12 +99,17 @@ to unique graph nodes which must exist. (The notation below is based
on `Cypher <https://neo4j.com/docs/cypher-manual/current/>`_.)

* *Node*: For `(n:node)`, the combination `[n.model, n.handle]` is unique.
* That is, one and only one graph node exists with these values of `n.model` and `n.handle`.

* *Property*: For `(p:property)` with `(e)-[:has_property]->(p)`, the combination
* *Property (uniqueness)*: For `(p:property)` with `(e)-[:has_property]->(p)`, the combination
`[p.model, p.handle, e.handle]` is unique.

* One and only one graph node `p` exists satisfying this condition. `e` is a node or relationship, and `e.model == p.model` must hold.

* *Property (distinctness)*: For `(p:property)` with `(e)-[:has_property]->(p)` and `(q:property)` with `(f)-[:has_property]->(q)`, if `e != f`, then `p != q`.

* In other words, properties associated with different entities are always distinct; properties with the same handle must not be "reused" among different nodes or relationships, even in the same model. An implication of this requirement is that each node or relationship forms a namespace that distinguishes its properties from all others.

* *Relationship*: For `(r:relationship)` with `(s:node)<-[:has_src]-(r)-[:has_dst]->(d:node)`, the combination `[r.model, r.handle, s.handle, d.handle]` is unique.

* One and only one graph node `r` exists satisfying this condition, and `r.model == s.model == d.model` must hold.
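
The uniqueness conditions above can be checked mechanically. The following is a minimal sketch, not part of the MDB tooling: it assumes the graph nodes have been flattened into plain dictionaries, and the helper `duplicate_keys`, the field name `entity_handle`, and the model name `ModelA` are all hypothetical.

```python
from collections import Counter


def duplicate_keys(records, key_fields):
    """Return the key tuples that occur more than once among the records."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical flattened rows; a real MDB holds these as graph nodes.
nodes = [
    {"model": "ModelA", "handle": "sample"},
    {"model": "ModelA", "handle": "case"},
    {"model": "ModelB", "handle": "sample"},  # same handle, different model: allowed
]
properties = [
    {"model": "ModelA", "handle": "id", "entity_handle": "sample"},
    {"model": "ModelA", "handle": "id", "entity_handle": "case"},
    {"model": "ModelA", "handle": "id", "entity_handle": "case"},  # duplicate key
]

# Node: [model, handle] must be unique.
node_dups = duplicate_keys(nodes, ("model", "handle"))
# Property: [model, handle, owning entity handle] must be unique.
prop_dups = duplicate_keys(properties, ("model", "handle", "entity_handle"))

print(node_dups)
print(prop_dups)
```

An empty list means the corresponding condition holds; any tuple returned identifies a combination that is represented by more than one graph node.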
@@ -133,11 +138,43 @@ Handles in combination with other properties can be unique. The model and handle

* Graph nodes which meet the conditions above can be thought of as playing a given semantic role in a specific context. They represent an interaction between a concept and a model.

In the MDB, the reuse of semantic concepts is expressed by linking all graph nodes playing the same semantic role to a common Concept node. Rather that creating a universal “demographic” node and connecting every model needing that concept to that node, each model that needs one gets its own “demographic” node.
"Reuse" of Semantic Roles in MDB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When creating a data model for a specific purpose, it is often desirable to make use of semantic concepts that have already been defined elsewhere. This is the case when the model needs to comply with an external standard, or when the system being modeled must interoperate with peers or within a larger system. Including an externally defined semantic element in a new data model is sometimes called *reuse*.

In an MDB, the reuse of semantic concepts *among different models* is expressed by linking all graph nodes playing the same semantic role to a common :ref:`Concept node <concepts>`. Rather than creating a universal “demographic” node and connecting every model needing that concept to that node, each model that needs one gets its own “demographic” node. The Concept node acts only as a "hub". A Term node can be used to annotate a Concept node with the details that point to an external standard (the origin or authority, the definition, and the identifier).

This figure exemplifies the MDB pattern for representing reuse of an external semantic concept.

.. image:: _static/mdb-patterns.png
   :align: center
   :alt: Concept reuse in an MDB

Note that a Term node that annotates a Concept node is linked by a `:represents` relationship.
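
The hub pattern can also be sketched in a few lines of Python. This is an illustration only: edges are held as simple triples, and every node name here is invented. Only the `represents` relationship is named in the text above; the `has_concept` relationship used to attach model nodes to the Concept is an assumption made for this sketch.

```python
# Edges as (source, relationship, destination) triples.
# "has_concept" and all node identifiers are assumptions for illustration.
edges = [
    ("ModelA:demographic", "has_concept", "concept:1"),
    ("ModelB:demographic", "has_concept", "concept:1"),
    ("term:demographic", "represents", "concept:1"),
]


def nodes_sharing_concept(edges, concept):
    """Model-specific graph nodes that play the semantic role of the Concept."""
    return sorted(s for s, rel, d in edges if rel == "has_concept" and d == concept)


def annotating_terms(edges, concept):
    """Term nodes that annotate the Concept via a 'represents' edge."""
    return sorted(s for s, rel, d in edges if rel == "represents" and d == concept)


print(nodes_sharing_concept(edges, "concept:1"))
print(annotating_terms(edges, "concept:1"))
```

Each model keeps its own "demographic" node; the two are related only through the shared Concept hub, and the external standard is reachable through the annotating Term.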

Terms themselves can also be components of Value Sets. Terms and Value Sets are explicitly intended to be reused among models within an MDB. A Term can represent an acceptable value, and Value Sets are hubs that aggregate Terms into an acceptable value list. The following figure indicates the graph patterns for reuse of both Terms and Value Sets in an MDB.

.. image:: _static/mdb-patterns-2.png
   :align: center
   :alt: Term and Value Set reuse in an MDB

Here, the two Properties `primary_site` and `anatomic_location` share a Value Set, while the Value Set for Property `sample_type` borrows the Term `blood`.
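
The same arrangement can be written out as a small Python sketch. The Property names and the Term `blood` come from the description above; the Value Set identifiers and the remaining Terms are invented placeholders, and plain dictionaries stand in for graph nodes and edges.

```python
# Value Sets aggregate Terms. Only "blood" is taken from the text;
# the other terms and the value set names are hypothetical.
value_sets = {
    "vs_anatomy": {"lung", "liver", "blood"},
    "vs_sample_type": {"tissue", "blood"},
}

# Each Property points at exactly one Value Set.
property_value_set = {
    "primary_site": "vs_anatomy",
    "anatomic_location": "vs_anatomy",  # shares the Value Set with primary_site
    "sample_type": "vs_sample_type",
}


def accepted_values(prop):
    """Terms acceptable for a Property, found through its Value Set."""
    return value_sets[property_value_set[prop]]


def shared_terms(prop_a, prop_b):
    """Terms that are acceptable values for both Properties."""
    return accepted_values(prop_a) & accepted_values(prop_b)


print(shared_terms("primary_site", "sample_type"))
```

Sharing a Value Set means two Properties always accept identical values, while borrowing a single Term leaves the two Value Sets otherwise independent.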

Encoding "Mappings"
^^^^^^^^^^^^^^^^^^^

An MDB is intended to store both models and inter-model relationships. An important example of such a relationship can be called *synonymy* - an assertion that two or more entities are semantically equivalent. In the context of data transformation, data values (Terms) valid under one model can be mapped to synonymous values in a different model. An MDB can store such mappings, and calls to an MDB can provide the backend to tools that perform transformations.

Assertions that terms are synonymous are made by experts or groups, who can differ in opinion. An MDB can also tag mappings according to the source or authority. This capability can, for example, drive a tool that performs transformation according to a specific authority's mappings.

The MDB pattern for asserting synonymy according to a specific expert source is exemplified in this figure.

.. image:: _static/mdb-patterns-3.png
   :align: center
   :alt: Synonym mappings represented in an MDB

The MDB pattern for reuse of semantic roles, whether entities from an existing model, or terms from an existing vocabulary, is as follows.

*WIP*

Models
^^^^^^
7 changes: 6 additions & 1 deletion python/docs/mdb-principles.rst
@@ -30,7 +30,7 @@ The MDB is a tool for managing active data. It can and should be anchored by ele

- *The MDB is intentionally designed to capture logical data models easily. Conceptual data models and abstract metamodels can indeed be captured and annotated, but this is secondary to the main use case of the MDB.*

In our (FNLCR/BIDS/CTOS) systems, the logical data model as captured in the MDB is very close to the physical representation of the data in our underlying native graph databases. This is intentional, in that it enables our databases to be directly configured by data SMEs, with little work necessary from engineering. However, the graph model underlying the MDB is a flexible abstraction that can capture the structure of RDBMS schemas, document-based data stores, UML representations, and other such artifacts.
In our (FNLCR/BACS/CTOS) systems, the logical data model as captured in the MDB is very close to the physical representation of the data in our underlying native graph databases. This is intentional, in that it enables our databases to be directly configured by data SMEs, with little work necessary from engineering. However, the graph model underlying the MDB is a flexible abstraction that can capture the structure of RDBMS schemas, document-based data stores, UML representations, and other such artifacts.

- *The MDB can represent semantic information and relationships among concepts, such as synonymy of terms. These features are designed primarily to facilitate pragmatic mapping between models from constituent systems that need to interoperate.*

@@ -45,6 +45,7 @@ ___________________________

A basic working principle is that a data model of almost any type can be rendered usefully as a `graph <https://en.wikipedia.org/wiki/Graph_database#Labeled-property_graph>`_, containing

.. _nodes_relationships_properties:
* *Nodes* - logical data groupings
* *Relationships* - logical or structural links or references between nodes, and
* *Properties* - Variables, columns, or slots for actual data items.
@@ -55,18 +56,21 @@ Other means of describing a data model may do so with more detail, or be more co

Data items are very frequently codes or strings chosen from a closed set of acceptable or valid values. In the MDB, the entity that represents a single such value is the Term.

.. _terms:
* *Terms* - entities which include a string representation (value) for a specific datum, and information on its source origin or authority.

Term entities may also include semantic information such as definitions and external identifiers that link back to their authoritative origin. For example, a Term adopted from the NCI Thesaurus would have an origin_id attribute set to its NCIt Concept Code.

Terms are associated with their Origin, but not directly with any Model. This is an intentional design decision that allows a model to build value sets by reusing terms from different sources, via the Value Set entity.

.. _value_sets:
* *Value Sets* - entities which aggregate Terms and so represent controlled vocabularies or acceptable value lists for Property values.

When Term entities are used to describe an acceptable value for a Property, they do so via a grouping entity called a Value Set. A given Term can be a part of any Value Set for any Model via the addition of a graph edge. Properties that accept data from a controlled vocabulary are linked to a Value Set entity, and Term entities that represent the acceptable values link to the Value Set.

Terms have an additional role in the MDB, to annotate Concept entities with semantic information.

.. _concepts:
* *Concepts* - entities which represent any abstract intellectual concept; a Concept's meaning is "induced" by Term entities that are linked to it via a "represents" graph edge.

The Concept entity is essentially a Term aggregation node, similar in function to a Value Set entity. It is an abstraction that enables the meanings of entities (not just Terms, but also Nodes, Relationships, and Properties) to be present in the database, and allows different models to reuse conceptual constructs and meanings defined by external authorities and elsewhere.
@@ -82,6 +86,7 @@ One might rather simply put that information directly into the Concept node --

Although the MDB is not primarily a knowledge base, it may be useful to record additional semantic information, especially for situations in which the mappings between model entities are not precisely synonymous, but reflect another kind of relationship. Mapping model entities to the `BRIDG <https://bridgmodel.nci.nih.gov/>`_ conceptual model, for example, is often characterized by a number of semantic "steps" beyond synonymy. For this purpose, the MDB defines a Predicate entity.

.. _predicates:
* *Predicates* - entities which represent a semantic relationship between two concepts, the "subject" and the "object".

A Predicate entity enables the formation of "triples" among Concept entities in the MDB. For example, the "generative" or "parent-child" relationship mentioned above can be represented by a Predicate entity linking parent and child concepts.
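
As a rough sketch of the triple idea, a Predicate can be modeled as a record with a handle, a subject, and an object. The class below is illustrative and is not the MDB object model; the handles `has_child` and `exactMatch` and all concept identifiers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A semantic relationship linking a subject concept to an object concept."""
    handle: str
    subject: str
    object: str


# Hypothetical triples among Concept entities.
predicates = [
    Predicate("has_child", "concept:specimen", "concept:blood_specimen"),
    Predicate("has_child", "concept:specimen", "concept:tissue_specimen"),
    Predicate("exactMatch", "concept:blood_specimen", "concept:blood_sample"),
]


def objects_of(subject, handle):
    """Objects linked to `subject` by predicates carrying the given handle."""
    return [p.object for p in predicates if p.subject == subject and p.handle == handle]


print(objects_of("concept:specimen", "has_child"))
```

Keeping the relationship type in the Predicate's handle lets one structure carry parent-child links, exact synonymy, and other semantic "steps" side by side.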
