From 375079d0535dcab569a29587bab64cb8cd40abc3 Mon Sep 17 00:00:00 2001 From: Jillian Vogel Date: Thu, 15 Feb 2024 17:23:43 +1030 Subject: [PATCH 1/2] docs: enhancing the "How to extend dbt" page --- docs/how-tos/dbt_extensions.rst | 187 ++++++++++++++++++++++++++++---- 1 file changed, 168 insertions(+), 19 deletions(-) diff --git a/docs/how-tos/dbt_extensions.rst b/docs/how-tos/dbt_extensions.rst index eb250b0..22a6649 100644 --- a/docs/how-tos/dbt_extensions.rst +++ b/docs/how-tos/dbt_extensions.rst @@ -6,9 +6,49 @@ Extending dbt As noted in :ref:`dbt`, you can install your own custom dbt package to apply your own transforms to the event data in Aspects. -**Step 1. Create your dbt package** +This guide demonstrates how to create and use a custom dbt package in Aspects by building the `aspects-dbt-example`_ +repo. See `aspects-dbt-example#1`_ to follow along with each of these steps. -Create a new dbt package using `dbt init`_. +See `Building dbt packages`_ and `Best practice guides`_ for dbt's official documentation. + +Step 0. Install dbt core +======================== + +The easiest way to install dbt core is to use pip in a python3 virtual environment. + +See `aspects/requirements.txt`_ for the specific package versions used by Aspects. + +.. code-block:: bash + + # Create and activate a python3 virtual environment + python3 -m venv venv + source venv/bin/activate + pip install --upgrade pip + + # Install the dbt package versions used by Aspects + pip install dbt-clickhouse==1.7.1 dbt-core==1.7.0 + +See `Install dbt`_ for more ways to install dbt. + +Step 1. Initialize your dbt package +=================================== + +Create a new dbt package by following the prompts given by the ``dbt init`` tool. + +.. code-block:: bash + + dbt init + # Enter a name for your project (letters, digits, underscore): aspects_dbt_example + # Which database would you like to use? [1] clickhouse + + ls aspects_dbt_example + # analyses/ dbt_project.yml macros/ models/ README.md seeds/ snapshots/ tests/ + + +See `About dbt init`_ for more options. + +Step 2. Use the Aspects profile +=============================== Update the generated ``dbt_project.yml`` to use the ``aspects`` profile: @@ -17,14 +57,13 @@ Update the generated ``dbt_project.yml`` to use the ``aspects`` profile: # This setting configures which "profile" dbt uses for this project. profile: 'aspects' -See `Building dbt packages`_ for more details, and `Writing data tests`_ for how to validate your transformations. - -**Step 2. Link to aspects-dbt** +Step 3. Link to aspects-dbt +=========================== Aspects charts depend on the transforms in `aspects-dbt`_, so it's important that your dbt package also installs -the same version of `aspects-dbt`_ as your Aspects Tutor plugin. +the same version of `aspects-dbt`_ as your version of the Aspects Tutor plugin. -To do this, add a ``packages.yml`` file to your dbt package at the top level, where: +To do this, add a ``packages.yml`` file to your dbt package at the top level where: * ``git`` url matches the default value of ``DBT_REPOSITORY`` in `tutor-contrib-aspects plugin.py`_ * ``revision`` matches the default value of ``DBT_BRANCH`` in `tutor-contrib-aspects plugin.py`_ @@ -33,11 +72,87 @@ To do this, add a ``packages.yml`` file to your dbt package at the top level, wh packages: - git: "https://github.com/openedx/aspects-dbt.git" - revision: v2.2 + revision: v3.4.1 + +Step 4. Add your custom transforms +================================== + +Here is where you will need an understanding of dbt, Clickhouse, Aspects' data schemas, and the specific transforms you +want to create. + +.. note:: You can use Aspects to debug your custom SQL: + + #. Login to Superset as an Open edX superuser. + #. Using the menus at the top of the page, navigate to the "SQL -> SQL Lab" UI. + #. Browse the schemas and run read-only SQL queries on your data. + +For this tutorial, we added a new model which will be materialized by dbt into a view in Clickhouse. +Our new model calculates the average number of attempts made by users on each problem by referencing the +`int_problem_results` model created by the base aspects-dbt package (see `dbt ref`_): + +.. code-block:: + + select + problem_id, AVG(attempts) as average_attempts + from + ( + select + problem_id, + max(attempts) as attempts + from + {{ ref('int_problem_results') }} + group by + actor_id, + problem_id + ) + group by problem_id + +Next, make sure your model is configured in the ``db_project.yml``. If you forget this step, dbt will warn you when +running your package. + +.. code-block:: yaml -**Step 3. Install and run your dbt package** + models: + problem_responses: + # Config indicated by + and applies to all files under models/problem_responses/ + +materialized: view -Update the following Tutor variables to use your package instead of the Aspects default. + +See `About dbt models`_ to learn more. + +Step 5. Add tests +================= + +Writing tests for your transforms is important: not only can tests validate and document your intended changes, they +can be used to guard against data edge cases and regressions from future code changes. + +dbt generic tests are defined as SQL files, where the goal of the SQL statement is to return zero records. + +Because our new `average_attempts` model aggregates on `actor_id` and `problem_id`, it should only have 1 entry for each +`problem_id`. So our test can be: + +.. code-block: sql + + -- average_attempts should only have one record for each problem_id. + select + count(*) as num_rows + from + {{ ref('average_attempts') }} + group by + problem_id + having num_rows > 1 + + +See `Writing data tests`_ for more examples. + + +Step 6. Install and use your dbt package +======================================== + +Once you've pushed all the changes to your custom dbt package repo, now we're ready to use it. + +Use ``tutor config save`` to update the following Tutor variables to use your custom package instead of the Aspects +default. - ``DBT_REPOSITORY``: A git repository URL to clone and use as the dbt project. @@ -46,14 +161,9 @@ Update the following Tutor variables to use your package instead of the Aspects Default: ``https://github.com/openedx/aspects-dbt`` - ``DBT_BRANCH``: The branch to use when cloning the dbt project. - Set this to the hash/branch/tag of your custom dbt package that you wish to use. + Set this to the hash/branch/tag of the custom dbt package that you wish to use. Default: varies between versions of Aspects. -- ``DBT_PROJECT_DIR``: The directory to use as the dbt project. - - Set this to the name of your dbt package repository. - - Default: ``aspects-dbt`` - ``EXTRA_DBT_PACKAGES``: Add any python packages that your dbt project requires here. Default: ``[]`` @@ -61,19 +171,58 @@ Update the following Tutor variables to use your package instead of the Aspects - ``DBT_SSH_KEY``: The private SSH key to use when cloning the dbt project. Only necessary if you are using a private repository. -Once your package is configured in Tutor, you can run dbt commands directly on your deployment; run ``tutor [dev|local] do dbt --help`` for details. +Once your package is configured in Tutor, you can run dbt commands directly on your deployment. + +See `dbt commands`_ for a full list of available commands. + +.. code-block:: bash + + # Build and test your package + tutor dev do dbt -c "build" + + # Deploy your customizations + tutor dev do dbt -c "run" + + # Run tests on the data + tutor dev do dbt -c "test" + + +Step 7. Troubleshooting +======================= + +You may need to repeat steps 4-6 a few times to resolve any warnings or errors that dbt reports with your package. + +Don't forget to push your changes to your repo before running the tutor dbt command: it fetches a clean copy of your +configured package repo + branch each time it runs. + +See `dbt debugging`_ for more information on how to debug issues with your package. + References ########## * `Building dbt packages`_: dbt's guide to building packages +* `Best practice guides`_: dbt's guidelines on project structure, style, and setup. +* `About dbt models`_: dbt's guide to creating SQL or Python model transforms * `Writing data tests`_: dbt's guide to writing package tests +* `dbt commands`_: list of all dbt commands +* `dbt debugging`_: guide for debugging issues with dbt * `aspects-dbt`_: Aspects' dbt package -* `eduNEXT/dbt-aspects-unidigital`_: a custom dbt packages running in production Aspects +* `aspects-dbt-example`_: the demo custom dbt package used in this tutorial. +* `eduNEXT/dbt-aspects-unidigital`_: a real custom dbt package running in production Aspects .. _aspects-dbt: https://github.com/openedx/aspects-dbt -.. _dbt init: https://docs.getdbt.com/reference/commands/init +.. _aspects-dbt-example: https://github.com/open-craft/aspects-dbt-example +.. _aspects-dbt-example#1: https://github.com/open-craft/aspects-dbt-example/pull/1 +.. _aspects/requirements.txt: https://github.com/openedx/tutor-contrib-aspects/blob/main/tutoraspects/templates/aspects/build/aspects/requirements.txt +.. _About dbt init: https://docs.getdbt.com/reference/commands/init +.. _About dbt models: https://docs.getdbt.com/docs/build/models +.. _Best practice guides: https://docs.getdbt.com/best-practices +.. _dbt commands: https://docs.getdbt.com/reference/dbt-commands +.. _dbt debugging: https://docs.getdbt.com/guides/debug-errors +.. _dbt ref: https://docs.getdbt.com/reference/dbt-jinja-functions/ref .. _eduNEXT/dbt-aspects-unidigital: https://github.com/eduNEXT/dbt-aspects-unidigital .. _Building dbt packages: https://docs.getdbt.com/guides/building-packages +.. _Install dbt: https://docs.getdbt.com/docs/core/installation-overview .. _Writing data tests: https://docs.getdbt.com/best-practices/writing-custom-generic-tests .. _tutor-contrib-aspects plugin.py: https://github.com/openedx/tutor-contrib-aspects/blob/main/tutoraspects/plugin.py From 186d5ae51697b54a82d9b35704f6741485f5dfd9 Mon Sep 17 00:00:00 2001 From: Jillian Vogel Date: Sun, 18 Feb 2024 22:06:45 +1030 Subject: [PATCH 2/2] fix: bumps version of dbt-clickhouse to match Aspects' --- docs/how-tos/dbt_extensions.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/how-tos/dbt_extensions.rst b/docs/how-tos/dbt_extensions.rst index 22a6649..470b178 100644 --- a/docs/how-tos/dbt_extensions.rst +++ b/docs/how-tos/dbt_extensions.rst @@ -26,7 +26,7 @@ See `aspects/requirements.txt`_ for the specific package versions used by Aspect pip install --upgrade pip # Install the dbt package versions used by Aspects - pip install dbt-clickhouse==1.7.1 dbt-core==1.7.0 + pip install dbt-clickhouse==1.7.2 dbt-core==1.7.0 See `Install dbt`_ for more ways to install dbt.