docs(update): adding more yakyak
majensen committed Jul 7, 2023
1 parent d22303d commit 2146d8f
Showing 12 changed files with 427 additions and 74 deletions.
34 changes: 0 additions & 34 deletions python/docs/classes.rst

This file was deleted.

4 changes: 2 additions & 2 deletions python/docs/conf.py
@@ -18,7 +18,7 @@
# -- Project information -----------------------------------------------------

project = u'bento-meta'
copyright = u'2020, FNLCR'
copyright = u'2020-2023, FNLCR'
author = u'Mark Jensen, Mark Benson, Nelson Moore'

# -- General configuration ---------------------------------------------------
@@ -27,7 +27,7 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'myst_nb',
# 'myst_nb', -- difficulties with this extension on mac, python 3.10
'autoapi.extension',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
13 changes: 0 additions & 13 deletions python/docs/index.md

This file was deleted.

15 changes: 8 additions & 7 deletions python/docs/index.rst
@@ -4,17 +4,17 @@
contain the root `toctree` directive.
.. image:: _static/forkme_right_green_007200.svg
:align: right
:scale: 80%
:alt: Fork me on GitHub
:target: https://github.com/CBIIT/bento-meta
:align: right
:scale: 80%
:alt: Fork me on GitHub
:target: https://github.com/CBIIT/bento-meta

bento_meta and Metamodel Database (MDB)
=======================================

.. image:: https://travis-ci.org/CBIIT/bento-meta.svg?branch=master
:alt: Build Status
:target: https://travis-ci.org/CBIIT/bento-meta
:alt: Build Status
:target: https://travis-ci.org/CBIIT/bento-meta

**bento_meta** provides an object representation of
`property graph <https://en.wikipedia.org/wiki/Graph_database#Labeled-property_graph>`_
@@ -49,7 +49,7 @@ ____________

Run::

pip install https://github.com/CBIIT/bento-meta/raw/master/python/dist/bento-meta-0.0.2.tar.gz
pip install bento-meta


.. toctree::
@@ -113,3 +113,4 @@ Indices and tables
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

9 changes: 8 additions & 1 deletion python/docs/mdb-conventions.rst
@@ -1,4 +1,4 @@
MDB Maintenance Principles and Conventions
MDB Conventions and Patterns
==========================================

Conventions and software tools based on the following principles and
@@ -138,6 +138,10 @@ Handles in combination with other properties can be unique. The model and handle

* Graph nodes which meet the conditions above can be thought of as playing a given semantic role in a specific context. They represent an interaction between a concept and a model.

Graph Patterns for Representation
_________________________________


"Reuse" of Semantic Roles in MDB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -155,6 +159,7 @@ Note that a Term node that annotates a Concept node is linked by a `:represents`

Terms themselves can also be components of Value Sets. Terms and Value Sets are explicitly intended to be reused among models within an MDB. A Term can represent an acceptable value, and Value Sets are hubs that aggregate Terms into an acceptable value list. The following figure indicates the graph patterns for reuse of both Terms and Value Sets in an MDB.

.. _term_valueset_pattern:
.. image:: _static/mdb-patterns-2.png
:align: center
:alt: Term and Value Set reuse in an MDB
@@ -201,6 +206,8 @@ must be made clear in the data (i.e., the model description file)
itself. There also should be a way to back out of at least one update
if necessary.

In our system, we use the `Liquibase <https://www.liquibase.com/>`_ platform, along with the Neo4j `Liquibase plugin <https://neo4j.com/labs/liquibase/docs>`_, to maintain auditability and reversibility of changes. This is facilitated by the Python package `liquichange <https://github.com/nelsonwmoore/liquichange>`_. More details can be found at the `bento-mdb <https://github.com/CBIIT/bento-mdb>`_ repository.

Terms
^^^^^

21 changes: 8 additions & 13 deletions python/docs/mdb-principles.rst
@@ -4,7 +4,7 @@ Metamodel Database (MDB) Principles
MDB Motivation and Rationale
____________________________

The MDB schema is intended to be simple in structure, without a profusion of different classes for specialized entities. At the same time, enough entities are provided to enable a separation between an entity and its sematic meaning as represented in the MDB.
The MDB schema is intended to be simple in structure, without a profusion of different classes for specialized entities. At the same time, enough entities are provided to enable a separation between an entity and its semantic meaning as represented in the MDB.
The goals are:

* To be able to store multiple models, developed independently for specific practical uses, separately in a single data store, and also
@@ -13,7 +13,7 @@ The goals are:

To the extent the MDB succeeds in meeting these goals, it also yields useful mappings of terminology and structures between models. This feature is intended to facilitate data transformations that contribute to interoperability between projects or programs that might participate in an MDB.

A key aspect of the MDB, one that distinguishes it from systems that serve similar purposes, is that it is meant to be responsive and dynamic -- easily and perhaps frequently changed and updated. An MDB requires curation and quality management, but it is not devised to be a standard reference or a database of record. Its value is increased by incorporating stable entities from external such references (e.g., the `NCIt<https://ncit.nci.nih.gov/ncitbrowser/>`_), but it is designed as a tool to assist data SMEs who are managing new or rapidly changing data resources, characterized by frequent data augmentation, addition of new data sources, or modification of data models or structures, often based on scientific considerations or policy decisions.
A key aspect of the MDB, one that distinguishes it from systems that serve similar purposes, is that it is meant to be responsive and dynamic -- easily and perhaps frequently changed and updated. An MDB requires curation and quality management, but it is not devised to be a standard reference or a database of record. Its value is increased by incorporating stable entities from such external references (e.g., the `NCIt <https://ncit.nci.nih.gov/ncitbrowser/>`_), but it is designed as a tool to assist data SMEs who are managing new or rapidly changing data resources, characterized by frequent data augmentation, addition of new data sources, or modification of data models or structures, often based on scientific considerations or policy decisions.

MDB Design Decisions
____________________
@@ -22,11 +22,13 @@ This milieu in which the MDB operates leads to the following design decisions an

- *Creating, updating, and reading information from the MDB must be easy and intuitive for data management subject matter experts (SME).*

Software to perform these functions must support the SME users in this regard. For examgple, an SME needs robust and straightforward tools to manipulate the MDB that make the underlying database structure and conventions transparent. The SME should be able to change or update what is necessary in the database, without worrying about whether she will "break it" by doing so.
Software to perform these functions must support the SME users in this regard. For example, an SME needs robust and straightforward tools to manipulate the MDB that make the underlying database structure and conventions transparent. The SME should be able to change or update what is necessary in the database, without worrying about whether she will "break it" by doing so.

- *It is more important that the MDB describes the current data models (and so the current data) in the ecosystem accurately, than it is that the MDB is "complete" in other respects.*

The MDB is a tool for managing active data. It can and should be anchored by elements of standards, and should by virtue of its capacity as a management tool reflect changes to those standards. However, the goal is not to incorporate a complete rendition of any standard, but to track the current modeling of current data in the managed ecosystem. Practically, this may mean that semantic annotation of model lags behind the development *and use* of the model themselves. A value set of acceptable terms as recorded in an MDB may not immediately cover the entire value set defined by an external standard, yet the MDB is still useful for data validation in this state, and can be quickly updated to include valid terms as gaps are encountered.
The MDB is a tool for managing active data. It can and should be anchored by elements of standards, and should by virtue of its capacity as a management tool reflect changes to those standards. However, the goal is not to incorporate a complete rendition of any standard, but to track the current modeling of current data in the managed ecosystem.

Practically, this may mean that semantic annotation of models lags behind the development *and use* of the models themselves. A value set of acceptable terms as recorded in an MDB may not immediately cover the entire value set defined by an external standard, yet the MDB is still useful for data validation in this state, and can be quickly updated to include valid terms as gaps are encountered.

- *The MDB is intentionally designed to capture logical data models easily. Conceptual data models and abstract metamodels can indeed be captured and annotated, but this is secondary to the main use case of the MDB.*

@@ -66,7 +68,7 @@ Terms are associated with their Origin, but not directly with any Model. This is
.. _value_sets:
* *Value Sets* - entities which aggregate Terms and so represent controlled vocabularies or acceptable value lists for Property values.

When Term entities are used to describe an acceptable value for a Property, they do so via a grouping entity called a Value Set. A given Term can be a part of any Value Set for any Model via the addition of a graph edge. Properties that accept data from a controlled vocabulary are linked to a Value Set entity, and Term entities that represent the acceptable values link to the Value Set.
When Term entities are used to describe an acceptable value for a Property, they do so via a grouping entity called a Value Set. A given Term can be a part of any Value Set for any Model via the addition of a graph edge. Properties that accept data from a controlled vocabulary are linked to a Value Set entity, and Term entities that represent the acceptable values link to the Value Set. See :ref:`this figure <term_valueset_pattern>`.
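
The Property, Value Set, and Term linkage described above can be sketched in plain Python. This is only an illustrative model, not the ``bento_meta`` API: the class names, the ``accepts`` and ``validate`` helpers, the handles, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a Term entity."""
    value: str
    origin_name: str


@dataclass
class ValueSet:
    """Hypothetical hub that aggregates Terms into an acceptable value list."""
    handle: str
    terms: Set[Term] = field(default_factory=set)

    def add_term(self, term: Term) -> None:
        # Corresponds to adding a single graph edge from the Term to the Value Set.
        self.terms.add(term)

    def accepts(self, value: str) -> bool:
        return any(term.value == value for term in self.terms)


@dataclass
class Property:
    """Hypothetical Property that may be linked to one Value Set."""
    handle: str
    model: str
    value_set: Optional[ValueSet] = None

    def validate(self, value: str) -> bool:
        # A Property without a Value Set is not vocabulary-controlled in this sketch.
        return self.value_set is None or self.value_set.accepts(value)


# The same Term object is reused by Value Sets that belong to two different models.
lymphoma = Term("Lymphoma", "NCIt")
melanoma = Term("Melanoma", "NCIt")

vs_a = ValueSet("model_a_diagnoses")
vs_a.add_term(lymphoma)
vs_a.add_term(melanoma)

vs_b = ValueSet("model_b_diagnoses")
vs_b.add_term(lymphoma)

prop_a = Property("primary_diagnosis", "model_a", vs_a)
prop_b = Property("diagnosis", "model_b", vs_b)
```

In this sketch ``prop_a.validate("Melanoma")`` is true while ``prop_b.validate("Melanoma")`` is false, because the shared Term has only been linked into the first Value Set.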

Terms have an additional role in the MDB, to annotate Concept entities with semantic information.

@@ -82,7 +84,7 @@ The Concept node itself, as a database entity, does not describe the concept. In

Continuing with the example: "Diagnosis" is an intellectual concept that is defined, among other places, at the NCI Thesaurus, where its concept code is C15220. In the formalism of the MDB, a Term entity, containing the ``value`` "Diagnosis", the ``origin_name`` "NCIt", and the ``origin_code`` C15220, would link to the Concept through a ``represents`` graph edge.

One might rather simply put that information directly into the Concept node -- this is not disallowed. However, by using the Concept-Term indirection, one can also very simply add other Terms that describe synonymous concepts coming from other external authorities. Another Term, with ``value`` ``SDTM-MHEDTTYP`` and ``origin_name`` CDISC, could be created and linked to the Concept node. This single _addition_ to the MDB graph then captures the idea that the two notions of diagnosis are synonymous. Further, models that agree with each other with respect to NCIt could be translated into `CDISC <https://www.cdisc.org/>`_ representations with straightforward graph database queries. Because this update adds to the graph and does not change its previous structure, existing queries or interpretations that rely on the MDB are not affected.
One might rather simply put that information directly into the Concept node -- this is not disallowed. However, by using the Concept-Term indirection, one can also very simply add other Terms that describe synonymous concepts coming from other external authorities. Another Term, with ``value`` ``SDTM-MHEDTTYP`` and ``origin_name`` CDISC, could be created and linked to the Concept node. This single *addition* to the MDB graph then captures the idea that the two notions of diagnosis are synonymous. Further, models that agree with each other with respect to NCIt could be translated into `CDISC <https://www.cdisc.org/>`_ representations with straightforward graph database queries. Because this update adds to the graph and does not change its previous structure, existing queries or interpretations that rely on the MDB are not affected.
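
As a rough illustration of the Concept-Term indirection, the following sketch models ``represents`` edges as a list held by the Concept. It is a toy in-memory model, not the ``bento_meta`` object API; the ``add_representation`` and ``translate`` helpers are hypothetical names chosen for the example, and the Term values are the ones given in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a Term entity."""
    value: str
    origin_name: str
    origin_code: Optional[str] = None


@dataclass
class Concept:
    """Hypothetical Concept: carries no description itself, only its Terms."""
    terms: List[Term] = field(default_factory=list)

    def add_representation(self, term: Term) -> None:
        # Stands in for creating one `represents` edge from Term to Concept.
        self.terms.append(term)

    def translate(self, origin_name: str) -> List[str]:
        # All Terms on one Concept are treated as synonymous, so translating
        # between authorities is a lookup by origin.
        return [t.value for t in self.terms if t.origin_name == origin_name]


diagnosis = Concept()
diagnosis.add_representation(Term("Diagnosis", "NCIt", "C15220"))
ncit_before = diagnosis.translate("NCIt")

# A purely additive update: the existing NCIt link is untouched.
diagnosis.add_representation(Term("SDTM-MHEDTTYP", "CDISC"))
ncit_after = diagnosis.translate("NCIt")
cdisc_view = diagnosis.translate("CDISC")
```

Because the second Term is only appended, the NCIt lookup returns the same result before and after the update, which mirrors the claim that existing queries are not affected.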

Although the MDB is not primarily a knowledge base, it may be useful to record additional semantic information, especially for situations in which the mappings between model entities are not precisely synonymous, but reflect another kind of relationship. Mapping model entities to the `BRIDG <https://bridgmodel.nci.nih.gov/>`_ conceptual model, for example, is often characterized by a number of semantic "steps" beyond synonymy. For this purpose, the MDB defines a Predicate entity.

@@ -117,10 +119,3 @@ In the MDB, the way to connect the Term with the Node (or other) entity is indir
This seems cumbersome, and it may be, but with appropriate APIs to the database, an SME or engineer will not need to think about it. One benefit of the approach, however, is that one can query the MDB for semantic mapping completely independently of any models stored there. All Terms that attach (via ``represents``) to a given Concept entity are considered to be synonymous, in the working context of the MDB. New models with semantically identical Nodes can be mapped into existing terminology (and therefore, existing mappings and translations) by a single association of the new model's entities to the correct Concept entities. This is a curation step that can be performed separately from creating the model structure itself.

If there is a distinction of meaning between two nodes with a similar structural role in two models (say "veterinary diagnosis" and "clinical diagnosis"), this can also be handled by addition to the MDB, without structurally changing it. In this case, creating a new Concept entity to attach to a Node ``veterinary_diagnosis``, and linking that Concept to the "Clinical Diagnosis" Concept with a Predicate ``is_related_to``, may suffice for the practical purposes of mapping between models. If a Term (again, an external stable semantic entity) that represents the idea of "veterinary diagnosis" is found, that can be added to the new Concept in the MDB later.
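
The Predicate pattern in this paragraph can be sketched as a small set of subject/object links between Concepts. This is an assumption-laden toy model rather than the ``bento_meta`` API: the ``label`` field exists only to make the example readable (a real Concept is described by its Terms), and ``related_concepts`` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    """Hypothetical Concept; `label` is for readability in this sketch only."""
    label: str


@dataclass(frozen=True)
class Predicate:
    """Hypothetical semantic link between two Concepts."""
    handle: str
    subject: Concept
    object: Concept


def related_concepts(concept: Concept, predicates: List[Predicate]) -> List[Concept]:
    """Return Concepts linked to `concept` in either direction."""
    found = []
    for pred in predicates:
        if pred.subject == concept:
            found.append(pred.object)
        elif pred.object == concept:
            found.append(pred.subject)
    return found


clinical = Concept("Clinical Diagnosis")
veterinary = Concept("veterinary diagnosis")

# Adding the new Concept and one Predicate leaves existing entities unchanged.
predicates = [Predicate("is_related_to", veterinary, clinical)]

from_veterinary = related_concepts(veterinary, predicates)
from_clinical = related_concepts(clinical, predicates)
```

Treating ``is_related_to`` as traversable in both directions is a design choice of this sketch; a stricter mapping could follow only the subject-to-object direction.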





[GDC examples of confusion because these distinctions are blurred.]

4 changes: 4 additions & 0 deletions python/docs/the_object_model.rst
@@ -310,6 +310,8 @@ Subclass that models a property of a node or relationship (edge). Possesses all :
Subclass that models a term from a terminology. Possesses all :class:`Entity` attributes, plus the following:

.. py:attribute:: term.handle
:type: simple
.. py:attribute:: term.value
:type: simple
.. py:attribute:: term.nanoid
@@ -338,6 +340,8 @@ Subclass that models a semantic concept. Possesses all :class:`Entity` attributes
Subclass that models a semantic link between concepts. Possesses all :class:`Entity` attributes, plus the following:

.. py:attribute:: predicate.handle
:type: simple
.. py:attribute:: predicate.subject
:type: Concept
.. py:attribute:: predicate.object