diff --git a/README.md b/README.md index bea848f0a..e128226ef 100644 --- a/README.md +++ b/README.md @@ -1,27 +1,25 @@ # ![TRAC Data & Analytics Platform](doc/_images/tracdap_horizontal_400.png) -*A next-generation data and analytics platform for use in highly regulated environments* +*The modern model platform for complex, critical models and calculations.* [![FINOS - Incubating](https://cdn.jsdelivr.net/gh/finos/contrib-toolbox@master/images/badge-incubating.svg)](https://finosfoundation.atlassian.net/wiki/display/FINOS/Incubating) -TRAC D.A.P. brings a step change in performance, insight, flexibility and control -compared to conventional analytics platforms. By redrawing the boundary -between business and technology, modellers and business users are given easy -access to modern, open source tools that can execute at scale, while technology -integrations and operational concerns are cleanly separated and consolidated -across use cases. +TRAC is a universal model orchestration solution which is designed for the most complex, critical +and highly-governed use cases. It combines your existing data and compute infrastructure, +model development environments and the repository of versioned code, to create a single ecosystem +in which to build and deploy models, orchestrate complex workflows and run analytics. -At the core of a platform, a flexible metadata model allows data and models to -be catalogued, plugged together and shared across the business. Using the -principal of immutability, TRAC allows new data structures and model pipelines -to be created, updated and executed at any time without change risk to production -workflows, guaranteeing total repeatability, audit and control (TRAC). +TRAC is designed to break the trade-off that has traditionally been required, between flexible +(but uncontrolled) analytics solutions and highly controlled (but inflexible) production model +platforms. It offers the best of both worlds: power, control and analytical flexibility.
+The core platform services - i.e. TRAC Data & Analytics Platform (or TRAC D.A.P.) - are maintained by +`finTRAC Limited `_ in association with the `finos Foundation `_ +under the `Apache Software License version 2.0 `_. ## Documentation and Packages -Documentation for the TRAC platform is available on our website at -[tracdap.finos.org](https://tracdap.finos.org). +Documentation for the TRAC D.A.P. platform is available on our website at [tracdap.finos.org](https://tracdap.finos.org). The following packages are available: @@ -31,6 +29,7 @@ The following packages are available: | [Web API package](https://www.npmjs.com/package/@finos/tracdap-web-api) | Build client apps in JavaScript or TypeScript using the TRAC platform APIs | | [Platform releases](https://github.com/finos/tracdap/releases) | Packages for the platform services and a standalone sandbox are published with each release on GitHub | +Commercially supported deployments of TRAC are separately available from `finTRAC Limited `_.
## Development Status diff --git a/doc/_images/icon_audit.png b/doc/_images/icon_audit.png new file mode 100644 index 000000000..5c290014d Binary files /dev/null and b/doc/_images/icon_audit.png differ diff --git a/doc/_images/icon_corrupt.png b/doc/_images/icon_corrupt.png new file mode 100644 index 000000000..9e9477158 Binary files /dev/null and b/doc/_images/icon_corrupt.png differ diff --git a/doc/_images/icon_data.png b/doc/_images/icon_data.png new file mode 100644 index 000000000..843e0960a Binary files /dev/null and b/doc/_images/icon_data.png differ diff --git a/doc/_images/icon_flow.png b/doc/_images/icon_flow.png new file mode 100644 index 000000000..8739bf7b6 Binary files /dev/null and b/doc/_images/icon_flow.png differ diff --git a/doc/_images/icon_job.png b/doc/_images/icon_job.png new file mode 100644 index 000000000..2163b9c57 Binary files /dev/null and b/doc/_images/icon_job.png differ diff --git a/doc/_images/icon_model.png b/doc/_images/icon_model.png new file mode 100644 index 000000000..ef9b85a1a Binary files /dev/null and b/doc/_images/icon_model.png differ diff --git a/doc/_images/icon_persist.png b/doc/_images/icon_persist.png new file mode 100644 index 000000000..25d459086 Binary files /dev/null and b/doc/_images/icon_persist.png differ diff --git a/doc/_images/icon_repeat.png b/doc/_images/icon_repeat.png new file mode 100644 index 000000000..11e6ce941 Binary files /dev/null and b/doc/_images/icon_repeat.png differ diff --git a/doc/_images/icon_self_doc.png b/doc/_images/icon_self_doc.png new file mode 100644 index 000000000..11f111227 Binary files /dev/null and b/doc/_images/icon_self_doc.png differ diff --git a/doc/_images/icon_sufficient.png b/doc/_images/icon_sufficient.png new file mode 100644 index 000000000..3750ba3a0 Binary files /dev/null and b/doc/_images/icon_sufficient.png differ diff --git a/doc/conf.py b/doc/conf.py index 2dbad3d0b..f09a12e0d 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -87,7 +87,7 @@ def setup(app): 
"sphinx_wagtail_theme" ] - +exclusions = ['unused'] # Auto API configuration diff --git a/doc/index.rst b/doc/index.rst index ef444d685..f463de4f1 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -1,18 +1,22 @@ ############################## -TRAC Data & Analytics Platform +TRAC: The Modern Model Platform ############################## -.. centered:: - *A next-generation data and analytics platform for use in highly regulated environments* - +TRAC is a universal model orchestration solution designed for the most complex, critical and highly-governed +use cases. -.. note:: - We are building the documentation for TRAC in parallel with the open source version of the - platform, both are in active deveopment. This documentaiton is presented in the hope that - it will be useful before it is complete! +The core platform services - i.e. TRAC Data & Analytics Platform (or TRAC D.A.P.) - are maintained by +`finTRAC Limited `_ in association with the `finos Foundation `_ +under the `Apache Software License version 2.0 `_. + +This documentation site focuses on how to deploy the TRAC D.A.P. services and build both models and +applications which leverage those services. Commercially supported deployments of TRAC are separately +available from `finTRAC Limited `_. - You can see the current development status and roadmap for the platform on the + +.. note:: + You can see the current development status of TRAC D.A.P. and a roadmap for the platform on the `roadmap page `_. If you have particular questions or issues, please raise a ticket on our `issue tracker `_. @@ -26,7 +30,7 @@ TRAC Data & Analytics Platform **Learn about TRAC** ^^^^^^^^^^^^^^^^^^^^ - Learn about the TRAC platform, starting with the metadata model. + Learn about the TRAC, the metadata model, virtual deployment framework and the TRAC Guarantee. +++ .. 
button-ref:: overview/introduction @@ -42,7 +46,7 @@ TRAC Data & Analytics Platform **Build and run models** ^^^^^^^^^^^^^^^^^^^^^^^^ - Use the TRAC runtime APIs to build portable, self-documenting models. + Use the TRAC runtime APIs to build portable, self-describing models. +++ .. button-ref:: modelling/index @@ -74,10 +78,10 @@ TRAC Data & Analytics Platform :class-footer: sd-border-0 - **Deploy and manage the platform** + **Deploy and manage TRAC D.A.P.** ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Everything to do with deployment, configuration and technology integration, + Explore deployment, configuration and integration, for dev-ops engineers and systems administrators. +++ diff --git a/doc/overview/index.rst b/doc/overview/index.rst index 90c4055fb..92788aa3e 100644 --- a/doc/overview/index.rst +++ b/doc/overview/index.rst @@ -6,4 +6,3 @@ Platform Overview .. toctree:: ./introduction - ./metadata_model diff --git a/doc/overview/introduction.rst b/doc/overview/introduction.rst index e5d6ee1dd..e9c094605 100644 --- a/doc/overview/introduction.rst +++ b/doc/overview/introduction.rst @@ -1,155 +1,251 @@ +Introduction +==================== + +TRAC is a universal model orchestration solution that combines your existing data and compute infrastructure, +model development environments and the repository of versioned code, to create a single ecosystem in +which to build and deploy models, orchestrate complex workflows and run analytics. + +The platform is built around three key principles, selected to break the trade-off that has traditionally +been required, between flexible (but uncontrolled) analytics solutions and highly controlled (but +inflexible) production platforms. + +.. list-table:: + :widths: 30 40 200 + + * - |icon-sufficient| + - **SUFFICIENT** + - The same infrastructure, tools and business assets support both production and experimental model runs, and post-run analytics. 
TRAC therefore supports all possible uses of a model and no other deployment environments are required. + + * - |icon-corrupt| + - **INCORRUPTIBLE** + - The platform's design makes it impossible to accidentally damage or destroy deployed data, models or flows. Model developers and users can therefore self-serve with confidence, free from the constraints of traditional change control processes. + + * - |icon-self-doc| + - **SELF-DOCUMENTING** + - TRAC automatically generates governance-ready documentation with no manual input required, eliminating the need to compile paper evidence for model deployment oversight, data lineage reporting and internal audit. -Introduction to TRAC +Because TRAC is sufficient, incorruptible and self-documenting, you get the best of both worlds: maximal +control and transparency plus analytical flexibility, in a single solution. + +.. |icon-sufficient| image:: /_images/icon_sufficient.png + :width: 85px + :height: 85px + +.. |icon-corrupt| image:: /_images/icon_corrupt.png + :width: 85px + :height: 85px + +.. |icon-self-doc| image:: /_images/icon_self_doc.png + :width: 85px + :height: 85px + + + + +TRAC Metadata Model ==================== -TRAC is a new type of analytics solution designed to work with cloud and big data technologies -to solve the challenge of managing complex and highly governed models across their lifecycle -for multiple personas. +Structural Model +~~~~~~~~~~~~~~~~ -.. image:: ../_images/overview_personas.png - :align: center +TRAC is built around a structural metadata model which catalogues, describes and controls almost everything that happens on the platform. The model consists of two layers: -.. grid:: 1 2 2 2 - :gutter: 3 +.. list-table:: + :widths: 25 200 - .. grid-item-card:: - :class-header: sd-bg-light sd-pt-0 sd-pb-1 - :class-body: sd-py-0 - :shadow: md + * - **OBJECTS** + - Objects are the model’s structural elements. Data, models and jobs are all described by metadata objects.
Each type of object has a metadata structure that is + defined as part of the TRAC API. - **What is TRAC?** - ^^^^^^^^^^^^^^^^^ + * - **TAGS** + - Tags are used to index, describe and control objects. They are made up of key-value attributes. + Some attributes are controlled by the platform; others can be set by client applications or + edited by users. - * A model management and orchestration solution which is; - * Built around a structural meta-data model - * Designed to manage complex, critical, highly governed models - * Open source and free of any licence costs - * Compatible with all major cloud providers and Hadoop +Primary Object Types +~~~~~~~~~~~~~~~~~~~~ +All model orchestration and analytics use cases can be understood with reference to four primary object types. They are not the +only types of object on TRAC, but they are the most common and important. - .. grid-item-card:: - :class-header: sd-bg-light sd-pt-0 sd-pb-1 - :class-body: sd-py-0 - :shadow: md +.. list-table:: + :widths: 25 25 65 100 + :header-rows: 1 + + * - OBJECT + - TYPE + - EXTERNAL REFERENCE + - OBJECT COMPONENTS + * - |icon-data| + - **DATA** + - Collections of documents and records which have been imported into a TRAC-controlled Data Store + - A structural representation of the data schema plus information about its physical storage + * - |icon-model| + - **MODEL** + - Discrete units of code stored in a repository, which are exposed to TRAC via the model import process + - A structural representation of the model schema plus a reference to immutable model code or a binary package (e.g. in Git or Nexus) + * - |icon-flow| + - **FLOW** + - N/A - flows exist only as TRAC metadata objects + - A calculation graph where inputs, outputs and models are the nodes and edges represent data flow + * - |icon-job| + - **JOB** + - A process or calculation orchestrated by TRAC which may use one or more external systems or resources.
There are five job types: ImportModel, ImportData, RunModel, RunFlow and ExportData + - The detail varies by job type, but the record maps the job reference to the objects used as inputs and those generated as outputs. For a RunFlow job, this includes the flow plus the models, data and parameters used as inputs and the datasets the job generated. + + + +.. |icon-data| image:: /_images/icon_data.png + :width: 85px + :height: 85px + +.. |icon-model| image:: /_images/icon_model.png + :width: 85px + :height: 85px + +.. |icon-flow| image:: /_images/icon_flow.png + :width: 85px + :height: 85px + +.. |icon-job| image:: /_images/icon_job.png + :width: 85px + :height: 85px + + +Versioning +~~~~~~~~~~ + +Metadata records are maintained using an immutable, time-indexed version history, with "updates" being +performed by creating a new version of the object or tag. TRAC metadata therefore provides a fully +consistent historical view of the platform and a complete audit history that is both machine and human +readable. + +Where metadata objects refer to external resources such as models and data, those resources are +also immutable. This is achieved using, e.g., GitHub tags or Nexus binary versions for models, and data +areas owned by TRAC with controlled write access for primary data. + + +Virtual Deployment Framework +============================ + +Self-describing Models +~~~~~~~~~~~~~~~~~~~~~~ +Models can be imported and used with zero code modifications or platform-level interventions, so long as +the model code contains a custom function which declares the model's schema to the platform.
A model schema +consists of: + +* The schema of any data inputs the model needs to run - **What is different about TRAC?** - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +* The schema of any optional or required parameters which affect how the model runs - * Integrates the model inventory, code repository and model execution - * Systemizes the model execution process, including overlays and sign-off - * Comprehensive version history and audit trail (BCBS239, SoX) - * Perfect roll-back and repeatability of any historical calculation - * Easy to configure what-if and champion-challenger analysis - * Flexible system for policy controls and model risk reporting +* The schema of the output data which the model produces when it runs -Structured meta-data model -------------------------- +Model Deployment Process +~~~~~~~~~~~~~~~~~~~~~~~~ + +TRAC uses a 'virtual' model deployment framework, in which model code remains in an external repository +and is accessed at runtime. There are three main processes involved in this framework, and TRAC performs +validations at each step. These validations replace the traditional route-to-live process and +allow models to be deployed and used without platform-level interventions or code changes. + +.. list-table:: + :widths: 35 40 140 70 + :header-rows: 1 + + * - OBJECT + - PROCESS + - SUMMARY + - RTL VALIDATION -TRAC is built around a structural metadata model which, combined with the immutability of -the underlying objects, gives users a full version history, lineage, roll-back and repeatability. + * - |icon-model| + - **IMPORT MODELS** + - Importing a model creates an object in the TRAC metadata store which refers to and describes the model. This record includes the model schema. The model is not deployed (in the traditional, physical sense) because the code remains in the repository. + - Does the model code contain a properly constructed function declaring its schema? -.. 
image:: ../_images/overview_metadata.png - :align: center + * - |icon-flow| + - **BUILD FLOW** + - Flows can be built and validated on the platform using only the schema representations of the models. Flows exist only as metadata objects, so a flow is like a ‘virtual’ deployment of some models into an execution process. + - Is the model schema compatible with its proposed position in the calculation graph? + * - |icon-job| + - **RUN JOBS** + - For a RunFlow job, you first pick a flow and then select the data and model objects to use for each node, plus any required parameters. TRAC then fetches the model code from the repository and the data records from storage and orchestrates the calculations as a single job. + - Does the model code generate outputs which are consistent with the declared schema? -The 'No-IT' operating model --------------------------- -TRAC empowers model developers and model users to self-serve within a controlled environment, -eliminating the need for a platform support team to manage configuration and deployments at -the application level. +In addition to these steps, the TRAC Runtime can be deployed to your IDE of choice, +giving you all the type safety of production and ensuring that models translate to production without +modification. Any model which executes via the TRAC Runtime service in the IDE with local data inputs +will run on the platform. + + +TRAC Guarantee +==================== + +TRAC offers a unique control environment which is characterised by three guarantees. + +.. list-table:: + :widths: 30 30 200 + + * - |icon-audit| + - **AUDITABLE ACTIONS** + - Any action that changes a tag or creates an object is recorded in a time-consistent fashion in the + metadata model. The metadata is designed to be easily understood by humans and machines, and + standard report formats can be used to create governance-ready documentation with no manual input + required.
+ + * - |icon-repeat| + - **REPEATABLE JOBS** + - Any RunModel or RunFlow job can be resubmitted and, because the inputs are immutable, you will + get the same result, guaranteed. We account for multiple factors that cause non-deterministic + model output: threading (don't use it!), random number generation, time, external calls and + dynamic execution (these are disabled), language and library versions (these are recorded + with the metadata). + + * - |icon-persist| + - **RISK FREE PLATFORM** + - Every version of every object (model, data, flow) remains permanently available to use and there is + no possibility of accidental loss or damage to deployed assets. Therefore, there is no change risk + (as traditionally defined) on TRAC. + +.. |icon-audit| image:: /_images/icon_audit.png + :width: 85px + :height: 85px + +.. |icon-repeat| image:: /_images/icon_repeat.png + :width: 85px + :height: 85px + +.. |icon-persist| image:: /_images/icon_persist.png + :width: 85px + :height: 85px + +.. note:: + The repeatability guarantee applies to RunModel, RunFlow and ExportData jobs. A model cannot be + imported twice, so an ImportModel job cannot be repeated. An ImportData job can be repeated, but + because it depends on an external source, TRAC cannot guarantee that the same outputs will be produced. + + +Experimentation & Analytics +=========================== +In addition to supporting highly-controlled (or 'production') model execution processes, TRAC also provides two main ways to +construct 'experimental' model runs. .. list-table:: - :widths: 30 70 200 - - * - |icon-zero-risk| - - **Zero-risk deployment** - - TRAC's built-in repeatability guarantee allows new models, data and overlays to - be loaded and executed against production data at any time, with zero change risk, - so the traditional separation between user and platform team is redundant.
- - * - |icon-self-gen-ui| - - **Self-generating UI** - - Models and data loaded onto the platform are immediately available to configure and - run in the user interface. The meta-data associated with model objects describes their - parameters and inputs, so the UI can be generated dynamically. - - * - |icon-auto-doc| - - **Automated documentation** - - Model implementation, data lineage and audit documentation is automated from the TRAC - meta-data, eliminating the need for labour-intensive paper production exercises by a - platform support team. - - * - |icon-ctrl| - - **User-defined controls** - - A configurable policy service enforces tagging of production (i.e. signed-off) assets, - preventing confusion with ad-hoc, experimental or challenger runs. This governs user - permissions and the allocation of platform resources. - - * - |icon-rtl| - - **Seamless route to live** - - In TRAC, everything is always live. The production model is controlled by sign-off policies, - to enforce constrains and record evidence before applying the “production” tag. The sign-off - process has full point-in-time history and every version remains available for rollback. - -.. |icon-zero-risk| image:: ../_images/icon-zero-risk.png - :width: 66px - :height: 66px - -.. |icon-self-gen-ui| image:: ../_images/icon-self-gen-ui.png - :width: 66px - :height: 66px - -.. |icon-auto-doc| image:: ../_images/icon-auto-doc.png - :width: 66px - :height: 66px - -.. |icon-ctrl| image:: ../_images/icon-ctrl.png - :width: 66px - :height: 66px - -.. |icon-rtl| image:: ../_images/icon-rtl.png - :width: 66px - :height: 66px - - -Compatibility and deployment ----------------------------- - -TRAC uses open standards to provide a familiar developer experience and several easy options for integration. - -.. image:: ../_images/overview_compatibility.png - :align: center - -.. grid:: 1 2 2 2 - :gutter: 3 - :class-container: sd-pt-2 - - .. 
grid-item-card:: - :class-header: sd-bg-light sd-pt-0 sd-pb-1 - :class-body: sd-py-0 - :shadow: md - - **Open standards** - ^^^^^^^^^^^^^^^^^^ - - * **Built on open standards** for maximum compatibility and minimum lock-in - * **Focus on developer experience** with a “batteries included” philosophy, - so developers can get going right away - * **Designed for integration** with complex data landscapes; open standards are best, - integration with bespoke components is also possible - - .. grid-item-card:: - :class-header: sd-bg-light sd-pt-0 sd-pb-1 - :class-body: sd-py-0 - :shadow: md - - **Deployment options** - ^^^^^^^^^^^^^^^^^^^^^^ - - * **Using established patterns** that your organization already has in place will simplify - deployment and maintenance - * **Cloud and tooling vendors** are willing to produce deployment templates for TRAC on their - platforms (some already have)! - * **Common tools** such as Terraform or Ansible can be set up quickly if there is nothing else in place + :widths: 40 200 + + * - **EXPERIMENTAL FLOWS** + - Separate flows can be created for any standardised analytic process, from sensitivity analysis + to periodic model monitoring. Under the virtual deployment framework, jobs which use + these experimental flows are safely executed on production data and infrastructure. + + * - **EXPERIMENTAL INPUTS** + - Using a 'production' flow, alternate model versions, data inputs + or parameter values can be selected. For quick and simple what-if analysis, old + jobs can be loaded, edited and resubmitted, for example to run last year's models with + this year's data, or vice versa. + +TRAC can execute as many parallel jobs as the underlying compute infrastructure will allow and, because they +are isolated and stateless, multiple runs can use different versions of the same model or dataset +concurrently. This greatly reduces the time required to complete more complex comparative analytics.
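The BUILD FLOW check in the Model Deployment Process table (is each model's schema compatible with its place in the calculation graph?) can be sketched in a few lines of Python. This is a hypothetical illustration, not the TRAC runtime API: `ModelSchema`, `validate_flow`, the edge format and the string type names are all assumptions made for the example, and only model-to-model edges are checked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema representation: field name -> type name (e.g. "STRING").
# This is NOT the TRAC runtime API; it only illustrates validating a flow
# from declared schemas, without running any model code.
FieldSchema = Dict[str, str]

# An edge links (source model, output name) to (target model, input name).
Edge = Tuple[Tuple[str, str], Tuple[str, str]]


@dataclass(frozen=True)
class ModelSchema:
    inputs: Dict[str, FieldSchema]
    outputs: Dict[str, FieldSchema]
    parameters: Dict[str, str] = field(default_factory=dict)


def validate_flow(models: Dict[str, ModelSchema], edges: List[Edge]) -> List[str]:
    """Return a list of problems; an empty list means the graph is consistent."""
    errors: List[str] = []
    for (src, out_name), (dst, in_name) in edges:
        produced = models[src].outputs.get(out_name)
        required = models[dst].inputs.get(in_name)
        if produced is None:
            errors.append(f"{src} does not declare output '{out_name}'")
            continue
        if required is None:
            errors.append(f"{dst} does not declare input '{in_name}'")
            continue
        # Every field the consuming model requires must be produced with the same type.
        for name, ftype in required.items():
            if produced.get(name) != ftype:
                errors.append(f"{src}.{out_name} -> {dst}.{in_name}: "
                              f"field '{name}' missing or not of type {ftype}")
    return errors


# Usage with two hypothetical models, where the second consumes the first's output
models = {
    "pd_model": ModelSchema(
        inputs={"loans": {"loan_id": "STRING", "balance": "DECIMAL"}},
        outputs={"pd_scores": {"loan_id": "STRING", "pd": "FLOAT"}}),
    "report_model": ModelSchema(
        inputs={"pd_scores": {"loan_id": "STRING", "pd": "DECIMAL"}},
        outputs={}),
}
# Reports one problem: 'pd' is produced as FLOAT but required as DECIMAL
print(validate_flow(models, [(("pd_model", "pd_scores"), ("report_model", "pd_scores"))]))
```

Because the check needs only the declared schemas, it can run before any model code is fetched, which is what allows flows to be validated as metadata objects at build time.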
\ No newline at end of file diff --git a/doc/overview/key_concepts.rst b/doc/overview/key_concepts.rst new file mode 100644 index 000000000..314f5b37a --- /dev/null +++ b/doc/overview/key_concepts.rst @@ -0,0 +1,188 @@ +Key concepts + + +Metadata Model +==================== + +TRAC is built around a structural metadata model which catalogues and describes everything on the platform. The model consists of three layers: + +.. list-table:: + :widths: 40 200 + + * - **OBJECTS** + - Objects are the model’s structural elements and each object type has its own metadata structure. The most + + * - **TAGS** + - Tags are used to index, describe and control objects. Some tags are controlled by the platform, some you can set yourself. + + * - **TRACEABLE ACTIONS** + - Traceable actions are actions that create objects, such as running jobs or data imports. Read-only + actions such as querying data or metadata searches are not recorded in the metadata model. + + +Metadata records are maintained using an immutable, time-indexed version history, with "updates" being performed +by creating a new version of the object or tag with the required changes. Because of this, the TRAC metadata +provides a fully consistent historical view of the platform for any previous point in time. It also provides +a complete audit history that is both machine and human readable, with no manual effort. + +Where objects refer to external resources such as models and data, those resources are also immutable. +This is achieved using e.g. GitHub tags or Nexus binary versions for models, and data areas owned by TRAC with +controlled write access for primary data. The combination of immutable metadata and immutable resources allows +TRAC to recreate any previous calculation that has run on the platform. + + +Objects +_______ + +All model orchestration use-cases involve four primary object types. The TRAC metadata model includes other object types, but these are the most common. + +.. 
list-table:: + :widths: 40 40 100 100 + + * - |icon-data| + - **DATA** + - Collections of documents and records which have been imported into a TRAC-controlled Data Store + - Structural representation of the data schema, plus its physical storage location + + * - |icon-model| + - **MODEL** + - Discrete units of code stored in a repository and exposed to TRAC via the model upload process + - The model schema (inputs, outputs and parameters) plus a reference to immutable model code or a binary package (e.g. in Git or Nexus) + + * - |icon-flow| + - **FLOW** + - None; the flow is abstract and does not refer to specific data or models + - A calculation involving multiple models, represented as a graph where inputs, outputs and models are nodes and edges represent data flow + + * - |icon-job| + - **JOB** + - A process TRAC orchestrates. The five job types are: ImportModel, ImportData, RunModel, RunFlow and ExportData + - The metadata record varies by job type but will record the objects which were used in the process. + + +.. |icon-data| image:: /_images/icon_data.png + :width: 66px + :height: 66px + +.. |icon-model| image:: /_images/icon_model.png + :width: 66px + :height: 66px + +.. |icon-flow| image:: /_images/icon_flow.png + :width: 66px + :height: 66px + +.. |icon-job| image:: /_images/icon_job.png + :width: 66px + :height: 66px + +.. note:: + Because Model and Data objects refer to and describe a persistent external asset which TRAC controls (model code & data records), these objects can also be called "Assets". + + + +Virtual Deployment +__________________ +TRAC uses a ‘virtual model deployment’ approach in which all model code resides in an external repository +until it is needed for a calculation, so the virtual deployment is only crystallised at runtime. There are three main steps involved in the virtual deployment approach. + +.. 
list-table:: + :widths: 30 200 + + * - **IMPORT MODELS** + - Uploading a model creates a model object in the TRAC metadata store which includes a schema representation of the model. The model code remains in the repository. + + * - **BUILD FLOW** + - Flows can be built and validated using the schema representation of the models. Because the flows themselves exist only as metadata objects, we can describe a flow as being a ‘virtual’ deployment of the models into a complex execution process. + + * - **RUN JOBS** + - To execute a RunFlow job, you pick a flow and select the data and model objects to use for each node in the flow, plus any required model parameters. TRAC then fetches the model code from the repository and the data records from storage and executes the job. + + + +Models can be deployed and used with no coding or platform-level interventions if they contain the required +schema function, which declares the following: + + +.. list-table:: + :widths: 30 200 + + * - **INPUTS** + - The schema of the data inputs the model needs to run + + * - **PARAMETERS** + - The schema of any parameters which influence how the model runs, which should be provided at runtime. + + * - **OUTPUTS** + - The schema of the output data which the model generates. + +.. note:: + See :ref:`modelling` for more details on the TRAC Model API and how to build TRAC-ready models. + + +The existence of a properly declared model schema is confirmed when importing a model onto TRAC using +an ImportModel job. When constructing a flow, the platform validates that the proposed graph is consistent +with the schemas of the model objects. Finally, when executing a RunModel or RunFlow job, TRAC validates +that the model code generates outputs which are consistent with the declared schema. + + +TRAC Guarantee +______________ + +A central feature of the platform is the control environment it creates, which is built on immutability and repeatability. This is embodied by three guarantees: + +.. 
list-table:: + :widths: 45 60 200 + + * - |icon-audit| + - **AUDITABLE** + - Every action that changes a tag or an object is recorded in a fully time-consistent fashion + in the metadata model, so a complete version history is maintained by default. + + * - |icon-repeat| + - **REPEATABLE** + - Any RunModel or RunFlow job can be resubmitted and, because the inputs are + immutable, TRAC can repeat the calculation and deliver the same result, guaranteed. + + * - |icon-persist| + - **RISK FREE** + - Every version of every object (model, data, flow) remains permanently available to use and there is + no possibility of accidental loss or damage to deployed assets, so there is no change risk. + +.. |icon-audit| image:: /_images/icon_audit.png + :width: 66px + :height: 66px + +.. |icon-repeat| image:: /_images/icon_repeat.png + :width: 66px + :height: 66px + +.. |icon-persist| image:: /_images/icon_persist.png + :width: 66px + :height: 66px + +.. note:: + The repeatability guarantee does not apply to an ImportData job because changes in the external data source may mean that different data is brought across, and a model cannot be imported twice, so an ImportModel job cannot be repeated. + + +Some other useful features +__________________________ + + - **Automated governance documentation** - The metadata is designed to be easily understood by + both humans and machines and is fully controlled and searchable. Standard report formats can be + used to create governance-ready documentation for model implementation oversight, data lineage + reporting and internal audit. + + - **Tweak and repeat** - Old jobs can be loaded up into the same tools used to create them originally, + because the metadata format is the same. They can then be edited and resubmitted with any desired + changes. Run last year's models with this year's data, or a series of what-if scenarios. + If the new data and models are not compatible, TRAC will explain exactly what the differences are.
+ + - **Parallel runs, parallel versions** - TRAC can execute as many parallel runs as the underlying compute + infrastructure will allow. Because the runs are isolated and stateless, multiple runs can use different + versions of the same model or the same dataset at the same time. + + - **Combine model versions** - It is even possible to load different versions of the same model code within + a single run. This can be useful to run challenger versions of individual components in a long model + chain, or if some model components are versioned independently. TRAC handles the complexity of loading + multiple versions of the same codebase into the executor process. diff --git a/doc/unused/tags.rst b/doc/unused/tags.rst new file mode 100644 index 000000000..4a54dccf2 --- /dev/null +++ b/doc/unused/tags.rst @@ -0,0 +1,191 @@ + +Tags +---- + + +:class:`Tags` are the core informational element of TRAC’s metadata model. They are +used to index, describe and control objects. Every object has a tag and each tag refers to a single object, +i.e. there is a one-to-one association. + +A tag is made up of: + + * A header that identifies the tag and associated object + * A set of attributes (key-value pairs) + * The associated object definition + +The object definition may sometimes be omitted; for example, search results for metadata queries +do not include the full object definition. + +Here is an example of a set of tag attributes to illustrate some ways they can be used:: + + # A descriptive field intended for human users. + + display_name: "Customer accounts for March 2020, corrected April 6th" + + # A classification that can be used for searching or indexing. + # Client applications can also use this to find datasets of a certain + # type; typically an application will define a set of attributes that are + # "structural", i.e. the application uses those attributes to decide which + # objects to present for certain purposes.
+
+    dataset_class: "customer_accounts"
+
+    # Properties of an item can be added as individual attributes so they can
+    # be searched and displayed individually. This avoids the anti-pattern of
+    # putting multiple attributes into a single name/label field:
+    # customer_accounts_mar20_scotland_commercial_approved
+
+    accounting_date: (DATE) 2020-03-31
+    region: "Scotland"
+    book: "commercial_property"
+    figures_approved: (BOOLEAN) true
+
+    # Attributes can be multi-valued. This can be helpful for applying
+    # regulatory classifiers, where multiple classifiers may apply to a
+    # single item.
+
+    data_classification: ["confidential", "gdpr_pii", "audited"]
+
+    # TRAC records a number of "controlled" attributes; these are set by the
+    # platform and cannot be modified directly through the metadata API.
+    # Controlled attributes start with the prefix "trac_".
+
+    trac_create_time: (DATETIME) 2020-04-01 10:37:05
+    trac_create_user_id: "jane.doe"
+    trac_create_user_name: "Jane Doe"
+
+Tag attributes are created and updated using :class:`TagUpdate` operations.
+Tag updates are instructions to add, replace, append (for multi-valued attributes) or delete an attribute.
+These instructions can be supplied when an object is created or updated, in which case TRAC will fill
+in some attributes automatically (timestamp, sign-off state etc.). It is also possible to update tags
+without changing the associated object, for example to reclassify a dataset or change a description.
+
+
+Versioning
+----------
+
+Versioning is supported for both objects and tags. For objects, versions are a series of immutable
+copies where TRAC guarantees compatibility and continuity between versions. The general principle
+for compatibility is that new versions will work in place of old versions (i.e. object versions are
+backwards-compatible, but the reverse is not necessarily true). The principle for continuity is that
+every version of an object should describe the same resource.
+The exact requirements for these rules vary depending on object type.
+
+Of particular interest are data updates. In this case, updates can include (1) adding a delta to a
+dataset, (2) providing a new snapshot of a dataset, (3) adding a partition or (4) updating a partition
+with a new snapshot or delta. A new version of the metadata object is created that refers to the new set
+of primary data files, including any that are unchanged from the previous version. For example, if a delta
+is added, the new data definition would refer to all the files referenced in the previous version, plus
+the new delta.
+
+A series of tag versions is assigned to every object version. Let's illustrate this with an example::
+
+    v = 1, t = 1    # Initial creation of an object
+                    # Let's say it's a dataset containing customer data for some date T0
+
+    v = 1, t = 2    # Add a tag attribute, extra_attr = "some_value"
+
+    v = 2, t = 1    # Corrections are applied to the data, so a new object version is created
+                    # By default the attributes from v=1, t=2 are copied to the new tag
+
+    v = 3, t = 1    # Data is added for a second day T1, in a separate partition
+
+    v = 2, t = 2    # The data for T0 is signed off and the policy service updates the sign-off tag
+                    # The tag applies to object version 2, which includes data for T0 with the corrections
+
+Object and tag versions are numbered as shown here. They are also given timestamps, which are
+recorded by the system when a new object or tag version is created. Either a version number or a
+timestamp can be used to uniquely identify versions for both objects and tags.
+
+
+Selectors
+---------
+
+A :class:`TagSelector ` refers to a single object ID and identifies a specific
+object version and tag version for that object. Tag selectors are used throughout the TRAC platform whenever
+an object is referenced, so versions can always be specified using these selection criteria. The
+available criteria are:
+
+    1. | Select the latest available version
+       | - *Variable selector, will return a different result when an object or tag is updated to a new version*
+
+    2. | Select a fixed version number
+       | - *Fixed selector, will always return the same result*
+
+    3. | Select the version for a previous point in time
+       | - *Fixed selector, will always return the same result*
+
+Selectors are used in API calls; for example, reading a single object from the metadata API uses a tag selector.
+Sending API calls with selectors that refer to a previous point in time allows client applications to display a
+consistent historical view of the platform.
+
+Selectors are also stored in the metadata model to express links between objects. For example, a job definition
+uses tag selectors to identify the inputs and models that will be used to execute the job. In the case of a
+job definition, the selectors are always stored as fixed selectors to record the precise object versions
+used; if the user submits a job requesting the latest version of a model or input, TRAC will convert that
+selector to a fixed selector before storing the job definition.
+
+Selectors refer to object and tag versions independently and there is no requirement to use the same selection
+criteria for both. A selector for objectVersion = 3 with latestTag = true is perfectly valid; this could be
+used, for example, to check the current sign-off state of a particular version of a model.
+
+
+Queries
+-------
+
+The TRAC metadata can be searched using logical expressions to match against tag attributes. Version
+and/or timestamp information can also be included as search parameters. It is not possible to search the
+contents of an object definition; any properties of an object that are needed for searching must be set
+as tag attributes to make them available for metadata queries.
+
+A search expression is a logical combination of search terms that can be built up as an expression tree.
+The logical operators available are AND, OR and NOT. A search term matches an individual attribute using
+one of the available search operators.
+
+
+.. list-table::
+   :header-rows: 1
+   :widths: 75 500
+
+   * - Operator
+     - Meaning
+
+   * - **EQ** ==
+     - | Matches an attribute exactly. The attribute must be present and have the correct type and value.
+         If the attribute is multi-valued, EQ will match if any of the values match.
+       | *EQ may behave erratically for floating point attributes, so using EQ, NE or IN with float values
+         is not recommended.*
+
+   * - **NE** !=
+     - The logical inverse of EQ: NE matches precisely when EQ does not match. If the search attribute is
+       not present, NE will match. If the search attribute is multi-valued, NE will match only when none
+       of the values match.
+
+   * - **IN**
+     - ``attr IN [a, b, c]`` is equivalent to ``attr == a OR attr == b OR attr == c``. If the attribute is
+       multi-valued, IN will match if any of the search values match any of the attribute values.
+
+   * - | **GT** >
+       | **GE** >=
+       | **LT** <
+       | **LE** <=
+     - Ordered comparisons, for ordered data types only. The attribute must be present and its type must
+       match the search type (comparing an integer to a float, or a date to a date-time value, will not match).
+       Ordered comparisons will never match if the search attribute is multi-valued.
+
+
+By default, only the latest versions of objects and tags are considered in a search. Even if a prior version
+of an object or tag would have matched, that prior version is not considered. There are options in the
+search parameters to include prior versions, in which case all matching versions of an object or tag will be
+returned.
+
+All searches can optionally be run as of a previous point in time, which causes the search to ignore
+metadata generated after that time. These searches still have the option to include prior versions if
+required.
+Using this feature allows clients to show a consistent historical view of the platform for
+functionality that relies on metadata queries.
+
+For the full API reference on metadata searches, see the reference pages for
+:class:`SearchParameters` and
+:meth:`TracMetadataApi.search()`.
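The matching rules in the operator table can be summarised as a small, self-contained Python sketch. This is a hypothetical in-memory model written for illustration only. It is not the TRAC metadata API or its implementation: `Term`, `Logical`, `match`, `match_term` and `ORDERED_TYPES` are invented names, and treating only integers, floats, dates and date-times as ordered types is an assumption. Tag attributes are modelled as a plain dict in which multi-valued attributes are lists, using some of the example attributes shown earlier.

```python
import operator
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Tuple, Union

# Hypothetical in-memory model of the documented search semantics.
# This is NOT the TRAC metadata API; every name here is illustrative only.

# Assumption: the set of attribute types that support ordered comparisons.
ORDERED_TYPES = {int, float, date, datetime}
ORDERED_OPS = {"GT": operator.gt, "GE": operator.ge, "LT": operator.lt, "LE": operator.le}


@dataclass(frozen=True)
class Term:
    attr: str    # attribute name
    op: str      # EQ, NE, IN, GT, GE, LT or LE
    value: Any   # search value (a list of search values for IN)


@dataclass(frozen=True)
class Logical:
    op: str                 # AND, OR or NOT
    exprs: Tuple[Any, ...]  # sub-expressions (exactly one for NOT)


Expr = Union[Term, Logical]


def _any_equal(values: list, search: Any) -> bool:
    # Exact type comparison keeps int/float and date/datetime apart,
    # so a value only matches a search value of the same type.
    return any(type(v) is type(search) and v == search for v in values)


def match_term(term: Term, tags: dict) -> bool:
    present = term.attr in tags
    raw = tags.get(term.attr)
    values = raw if isinstance(raw, list) else [raw]

    if term.op == "EQ":
        return present and _any_equal(values, term.value)
    if term.op == "NE":
        # Logical inverse of EQ: matches if the attribute is absent or no value matches.
        return not (present and _any_equal(values, term.value))
    if term.op == "IN":
        return present and any(_any_equal(values, s) for s in term.value)
    if term.op not in ORDERED_OPS:
        raise ValueError(f"Unknown search operator: {term.op}")

    # Ordered comparisons never match absent or multi-valued attributes,
    # and the attribute must be an ordered type identical to the search type.
    if not present or isinstance(raw, list):
        return False
    if type(raw) is not type(term.value) or type(raw) not in ORDERED_TYPES:
        return False
    return ORDERED_OPS[term.op](raw, term.value)


def match(expr: Expr, tags: dict) -> bool:
    # Walk the expression tree, combining term results with AND / OR / NOT.
    if isinstance(expr, Term):
        return match_term(expr, tags)
    results = [match(e, tags) for e in expr.exprs]
    if expr.op == "AND":
        return all(results)
    if expr.op == "OR":
        return any(results)
    if expr.op == "NOT":
        return not results[0]
    raise ValueError(f"Unknown logical operator: {expr.op}")


# Some of the example tag attributes, as a plain dict.
tags = {
    "dataset_class": "customer_accounts",
    "accounting_date": date(2020, 3, 31),
    "region": "Scotland",
    "figures_approved": True,
    "data_classification": ["confidential", "gdpr_pii", "audited"],
}

query = Logical("AND", (
    Term("dataset_class", "EQ", "customer_accounts"),
    Term("data_classification", "EQ", "gdpr_pii"),  # matches one value of a multi-valued attribute
    Term("accounting_date", "GE", date(2020, 1, 1)),
))

print(match(query, tags))                                                # True
print(match(Term("accounting_date", "GE", datetime(2020, 1, 1)), tags))  # False: date vs date-time
print(match(Term("region", "IN", ["Wales", "Scotland"]), tags))          # True
```

Comparing types exactly with `type(a) is type(b)` is a deliberate choice in this sketch. In Python, `bool` is a subclass of `int` and `datetime` is a subclass of `date`, so `isinstance` checks would let a date-time search value match a date attribute, which the table rules out.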
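The default behaviour of considering only the latest versions, together with the as-of option, can be sketched in a few lines of Python. This is a hypothetical model for illustration, not the TRAC API. `TagRecord`, `search`, `as_of` and the single `include_prior` flag are invented names (the real options are described in the `SearchParameters` reference), and the `signed_off` attribute is made up. The history reproduces the versioning example from the Versioning section, with assumed timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Optional


@dataclass
class TagRecord:
    # Hypothetical record of one tag version for one object version.
    object_id: str
    object_version: int
    tag_version: int
    timestamp: datetime  # when this tag version was recorded
    attrs: Dict[str, object] = field(default_factory=dict)


def search(records: List[TagRecord],
           predicate: Callable[[Dict[str, object]], bool],
           as_of: Optional[datetime] = None,
           include_prior: bool = False) -> List[TagRecord]:
    # An as-of search ignores any metadata recorded after the given time.
    visible = [r for r in records if as_of is None or r.timestamp <= as_of]

    if not include_prior:
        # Keep only the latest object version per object, and its latest tag.
        # Comparing (object_version, tag_version) tuples does both at once.
        latest: Dict[str, TagRecord] = {}
        for r in visible:
            current = latest.get(r.object_id)
            key = (r.object_version, r.tag_version)
            if current is None or key > (current.object_version, current.tag_version):
                latest[r.object_id] = r
        visible = list(latest.values())

    return [r for r in visible if predicate(r.attrs)]


# The versioning example, with assumed timestamps.
extra = {"extra_attr": "some_value"}
history = [
    TagRecord("ds1", 1, 1, datetime(2020, 4, 1)),
    TagRecord("ds1", 1, 2, datetime(2020, 4, 2), dict(extra)),
    TagRecord("ds1", 2, 1, datetime(2020, 4, 3), dict(extra)),
    TagRecord("ds1", 3, 1, datetime(2020, 4, 4), dict(extra)),
    TagRecord("ds1", 2, 2, datetime(2020, 4, 5), {**extra, "signed_off": True}),
]


def signed_off(attrs: Dict[str, object]) -> bool:
    return attrs.get("signed_off") is True


print(search(history, signed_off))  # [] because v=3, t=1 is the latest version
print([(r.object_version, r.tag_version)
       for r in search(history, signed_off, include_prior=True)])  # [(2, 2)]
print([(r.object_version, r.tag_version)
       for r in search(history, lambda a: True, as_of=datetime(2020, 4, 3, 12))])  # [(2, 1)]
```

In this sketch the as-of filter is applied before the latest version is chosen, so "latest" means latest as of that point in time. This matches the description of as-of searches ignoring metadata generated after the given time.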