From 157c345fee31a10dc805c8e44d2b2ec9ccfd06e0 Mon Sep 17 00:00:00 2001
From: Jillian
Date: Thu, 25 Jan 2024 13:26:06 +1030
Subject: [PATCH] docs: adds DBT concept documentation (#111)

* docs: adds DBT concept documentation and updates the DBT Extension how-to.
---
 docs/concepts/dbt.rst           | 37 ++++++++++++++
 docs/concepts/index.rst         |  1 +
 docs/how-tos/dbt_extensions.rst | 87 ++++++++++++++++++++++++++++-----
 3 files changed, 113 insertions(+), 12 deletions(-)
 create mode 100644 docs/concepts/dbt.rst

diff --git a/docs/concepts/dbt.rst b/docs/concepts/dbt.rst
new file mode 100644
index 0000000..a3be638
--- /dev/null
+++ b/docs/concepts/dbt.rst
@@ -0,0 +1,37 @@
+.. _dbt:
+
+data build tool (dbt)
+*********************
+
+dbt is an open source, command-line tool managed by `dbtlabs`_ for generating and maintaining data transformations.
+
+dbt allows engineers to transform data by writing ``SELECT`` statements that reflect business logic which dbt
+materializes into tables and views that can be queried efficiently.
+
+dbt also allows engineers to modularize and re-use their transformation code using "packages" that can be shared across
+projects or organizations.
+
+dbt in Aspects
+##############
+
+Aspects uses the `aspects-dbt`_ package to define the transforms used by the Aspects project. This package creates and
+manages macros and materialized views for data tables stored in :ref:`Clickhouse`, and provides some tests.
+
+Operators may create and install their own dbt packages; see :ref:`dbt-extensions` for details.
+
+`tutor-contrib-aspects`_ also provides a "do" command to proxy running `dbt commands`_ against your deployment; run
+``tutor [dev|local] do dbt --help`` for details.
+
+References
+##########
+
+* `dbtlabs`_: dbt documentation
+* `dbt-core`_: core dbt package
+* `aspects-dbt`_: Aspects dbt transforms
+* `tutor-contrib-aspects`_: Aspects Tutor plugin
+
+.. _aspects-dbt: https://github.com/openedx/aspects-dbt/#aspects-dbt
+.. _dbtlabs: https://docs.getdbt.com/
+.. _dbt-core: https://github.com/dbt-labs/dbt-core
+.. _dbt commands: https://docs.getdbt.com/reference/dbt-commands
+.. _tutor-contrib-aspects: https://github.com/openedx/tutor-contrib-aspects
diff --git a/docs/concepts/index.rst b/docs/concepts/index.rst
index 5bb81c5..bd70e12 100644
--- a/docs/concepts/index.rst
+++ b/docs/concepts/index.rst
@@ -9,6 +9,7 @@ Concepts
    xAPI
    Tracking Logs
    Clickhouse
+   dbt
    Ralph
    Vector
    Pipelines
diff --git a/docs/how-tos/dbt_extensions.rst b/docs/how-tos/dbt_extensions.rst
index 69ea946..f24e400 100644
--- a/docs/how-tos/dbt_extensions.rst
+++ b/docs/how-tos/dbt_extensions.rst
@@ -1,14 +1,77 @@
 .. _dbt-extensions:
 
-DBT extensions
-**************
-
-To extend the DBT project, you can use the following Tutor variables:
-
-- **DBT_REPOSITORY**: A git repository URL to clone and use as the DBT project.
-- **DBT_BRANCH**: The branch to use when cloning the DBT project.
-- **DBT_PROJECT_DIR**: The directory to use as the DBT project.
-- **EXTRA_DBT_PACKAGES**: A list of python packages for the DBT project to install.
-- **DBT_ENABLE_OVERRIDE**: This variable determines whether the DBT project override feature
-  should be enabled or not. When enabled, it allows you to make changes to the **dbt_project.yml**
-  and **packages.yml** files using the tutor patches: `dbt-packages` and `dbt-project`.
+Extending dbt
+*************
+
+As noted in :ref:`dbt`, you can install your own custom dbt package to apply your own transforms to the event data
+in Aspects.
+
+**Step 1. Create your dbt package**
+
+Create a new dbt package using `dbt init`_.
+
+Update the generated ``dbt_project.yml`` to use the ``aspects`` profile:
+
+.. code-block:: yaml
+
+   # This setting configures which "profile" dbt uses for this project.
+   profile: 'aspects'
+
+See `Building dbt packages`_ for more details, and `Writing data tests`_ for how to validate your transformations.
+
+**Step 2. Link to aspects-dbt**
+
+Aspects charts depend on the transforms in `aspects-dbt`_, so it's important that your dbt package also installs
+the same version of `aspects-dbt`_ as your Aspects Tutor plugin.
+
+To do this, add a ``packages.yml`` file to your dbt package at the top level, where:
+
+* ``git`` url matches the default value of ``DBT_REPOSITORY`` in `tutor-contrib-aspects plugin.py`_
+* ``revision`` matches the default value of ``DBT_BRANCH`` in `tutor-contrib-aspects plugin.py`_
+
+.. code-block:: yaml
+
+   packages:
+     - git: "https://github.com/openedx/aspects-dbt.git"
+       revision: v2.2
+
+**Step 3. Install and run your dbt package**
+
+Update the following Tutor variables to use your package instead of the Aspects default.
+
+- ``DBT_REPOSITORY``: A git repository URL to clone and use as the dbt project.
+
+  Set this to the URL for your custom dbt package.
+
+  Default: ``https://github.com/openedx/aspects-dbt``
+- ``DBT_BRANCH``: The branch to use when cloning the dbt project.
+
+  Set this to the hash/branch/tag of your custom dbt package that you wish to use.
+
+  Default: varies between versions of Aspects.
+- ``DBT_PROJECT_DIR``: The directory to use as the dbt project.
+
+  Set this to the name of your dbt package repository.
+
+  Default: ``aspects-dbt``
+- ``EXTRA_DBT_PACKAGES``: Add any python packages that your dbt project requires here.
+
+  Default: ``[]``
+- ``DBT_PROFILE_*``: variables used in the Aspects ``dbt/profiles.yml`` file, including several Clickhouse connection settings.
+
+Once your package is configured in Tutor, you can run dbt commands directly on your deployment; run ``tutor [dev|local] do dbt --help`` for details.
+
+References
+**********
+
+* `Building dbt packages`_: dbt's guide to building packages
+* `Writing data tests`_: dbt's guide to writing package tests
+* `aspects-dbt`_: Aspects' dbt package
+* `eduNEXT/dbt-aspects-unidigital`_: a custom dbt packages running in production Aspects
+
+.. _aspects-dbt: https://github.com/openedx/aspects-dbt
+.. _dbt init: https://docs.getdbt.com/reference/commands/init
+.. _eduNEXT/dbt-aspects-unidigital: https://github.com/eduNEXT/dbt-aspects-unidigital
+.. _Building dbt packages: https://docs.getdbt.com/guides/building-packages
+.. _Writing data tests: https://docs.getdbt.com/best-practices/writing-custom-generic-tests
+.. _tutor-contrib-aspects plugin.py: https://github.com/openedx/tutor-contrib-aspects/blob/main/tutoraspects/plugin.py
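
For illustration only, the Tutor variables listed under "Step 3" of the how-to might be set in a Tutor ``config.yml`` roughly as follows. This is a sketch under stated assumptions, not part of the patch: the repository URL, branch name, and directory name are hypothetical placeholders, and only the variable names and the ``[]`` default come from the documentation above.

.. code-block:: yaml

   # Hypothetical values: replace with your own custom dbt package details.
   DBT_REPOSITORY: "https://github.com/example-org/my-aspects-dbt"  # assumed URL
   DBT_BRANCH: "main"                  # assumed hash/branch/tag of the custom package
   DBT_PROJECT_DIR: "my-aspects-dbt"   # assumed name of the dbt package repository
   EXTRA_DBT_PACKAGES: []              # documented default; list extra Python packages here

With settings like these saved, ``tutor [dev|local] do dbt --help`` (as described in the how-to) lists the dbt commands that can be run against the deployment.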