Skip to content

Commit

Permalink
Merge pull request #190 from openedx/jill/howto-extend-dbt
Browse files Browse the repository at this point in the history
docs: enhancing the "How to extend dbt" page
  • Loading branch information
Ian2012 authored Feb 20, 2024
2 parents 413a29e + 186d5ae commit 2569287
Showing 1 changed file with 168 additions and 19 deletions.
187 changes: 168 additions & 19 deletions docs/how-tos/dbt_extensions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,49 @@ Extending dbt
As noted in :ref:`dbt`, you can install your own custom dbt package to apply your own transforms to the event data
in Aspects.

**Step 1. Create your dbt package**
This guide demonstrates how to create and use a custom dbt package in Aspects by building the `aspects-dbt-example`_
repo. See `aspects-dbt-example#1`_ to follow along with each of these steps.

Create a new dbt package using `dbt init`_.
See `Building dbt packages`_ and `Best practice guides`_ for dbt's official documentation.

Step 0. Install dbt core
========================

The easiest way to install dbt core is to use pip in a python3 virtual environment.

See `aspects/requirements.txt`_ for the specific package versions used by Aspects.

.. code-block:: bash
# Create and activate a python3 virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
# Install the dbt package versions used by Aspects
pip install dbt-clickhouse==1.7.2 dbt-core==1.7.0
See `Install dbt`_ for more ways to install dbt.

Step 1. Initialize your dbt package
===================================

Create a new dbt package by following the prompts given by the ``dbt init`` tool.

.. code-block:: bash
dbt init
# Enter a name for your project (letters, digits, underscore): aspects_dbt_example
# Which database would you like to use? [1] clickhouse
ls aspects_dbt_example
# analyses/ dbt_project.yml macros/ models/ README.md seeds/ snapshots/ tests/
See `About dbt init`_ for more options.

Step 2. Use the Aspects profile
===============================

Update the generated ``dbt_project.yml`` to use the ``aspects`` profile:

Expand All @@ -17,14 +57,13 @@ Update the generated ``dbt_project.yml`` to use the ``aspects`` profile:
# This setting configures which "profile" dbt uses for this project.
profile: 'aspects'
See `Building dbt packages`_ for more details, and `Writing data tests`_ for how to validate your transformations.

**Step 2. Link to aspects-dbt**
Step 3. Link to aspects-dbt
===========================

Aspects charts depend on the transforms in `aspects-dbt`_, so it's important that your dbt package also installs
the same version of `aspects-dbt`_ as your Aspects Tutor plugin.
the same version of `aspects-dbt`_ as your version of the Aspects Tutor plugin.

To do this, add a ``packages.yml`` file to your dbt package at the top level, where:
To do this, add a ``packages.yml`` file to your dbt package at the top level where:

* ``git`` url matches the default value of ``DBT_REPOSITORY`` in `tutor-contrib-aspects plugin.py`_
* ``revision`` matches the default value of ``DBT_BRANCH`` in `tutor-contrib-aspects plugin.py`_
Expand All @@ -33,11 +72,87 @@ To do this, add a ``packages.yml`` file to your dbt package at the top level, wh
packages:
- git: "https://github.com/openedx/aspects-dbt.git"
revision: v2.2
revision: v3.4.1
Step 4. Add your custom transforms
==================================

Here is where you will need an understanding of dbt, Clickhouse, Aspects' data schemas, and the specific transforms you
want to create.

.. note:: You can use Aspects to debug your custom SQL:

#. Login to Superset as an Open edX superuser.
#. Using the menus at the top of the page, navigate to the "SQL -> SQL Lab" UI.
#. Browse the schemas and run read-only SQL queries on your data.

For this tutorial, we added a new model which will be materialized by dbt into a view in Clickhouse.
Our new model calculates the average number of attempts made by users on each problem by referencing the
`int_problem_results` model created by the base aspects-dbt package (see `dbt ref`_):

.. code-block::
select
problem_id, AVG(attempts) as average_attempts
from
(
select
problem_id,
max(attempts) as attempts
from
{{ ref('int_problem_results') }}
group by
actor_id,
problem_id
)
group by problem_id
Next, make sure your model is configured in the ``db_project.yml``. If you forget this step, dbt will warn you when
running your package.

.. code-block:: yaml
**Step 3. Install and run your dbt package**
models:
problem_responses:
# Config indicated by + and applies to all files under models/problem_responses/
+materialized: view
Update the following Tutor variables to use your package instead of the Aspects default.
See `About dbt models`_ to learn more.

Step 5. Add tests
=================

Writing tests for your transforms is important: not only can tests validate and document your intended changes, they
can be used to guard against data edge cases and regressions from future code changes.

dbt generic tests are defined as SQL files, where the goal of the SQL statement is to return zero records.

Because our new `average_attempts` model aggregates on `actor_id` and `problem_id`, it should only have 1 entry for each
`problem_id`. So our test can be:

.. code-block: sql
-- average_attempts should only have one record for each problem_id.
select
count(*) as num_rows
from
{{ ref('average_attempts') }}
group by
problem_id
having num_rows > 1
See `Writing data tests`_ for more examples.


Step 6. Install and use your dbt package
========================================

Once you've pushed all the changes to your custom dbt package repo, now we're ready to use it.

Use ``tutor config save`` to update the following Tutor variables to use your custom package instead of the Aspects
default.

- ``DBT_REPOSITORY``: A git repository URL to clone and use as the dbt project.

Expand All @@ -46,34 +161,68 @@ Update the following Tutor variables to use your package instead of the Aspects
Default: ``https://github.com/openedx/aspects-dbt``
- ``DBT_BRANCH``: The branch to use when cloning the dbt project.

Set this to the hash/branch/tag of your custom dbt package that you wish to use.
Set this to the hash/branch/tag of the custom dbt package that you wish to use.

Default: varies between versions of Aspects.
- ``DBT_PROJECT_DIR``: The directory to use as the dbt project.

Set this to the name of your dbt package repository.

Default: ``aspects-dbt``
- ``EXTRA_DBT_PACKAGES``: Add any python packages that your dbt project requires here.

Default: ``[]``
- ``DBT_PROFILE_*``: variables used in the Aspects ``dbt/profiles.yml`` file, including several Clickhouse connection settings.

- ``DBT_SSH_KEY``: The private SSH key to use when cloning the dbt project. Only necessary if you are using a private repository.

Once your package is configured in Tutor, you can run dbt commands directly on your deployment; run ``tutor [dev|local] do dbt --help`` for details.
Once your package is configured in Tutor, you can run dbt commands directly on your deployment.

See `dbt commands`_ for a full list of available commands.

.. code-block:: bash
# Build and test your package
tutor dev do dbt -c "build"
# Deploy your customizations
tutor dev do dbt -c "run"
# Run tests on the data
tutor dev do dbt -c "test"
Step 7. Troubleshooting
=======================

You may need to repeat steps 4-6 a few times to resolve any warnings or errors that dbt reports with your package.

Don't forget to push your changes to your repo before running the tutor dbt command: it fetches a clean copy of your
configured package repo + branch each time it runs.

See `dbt debugging`_ for more information on how to debug issues with your package.


References
##########

* `Building dbt packages`_: dbt's guide to building packages
* `Best practice guides`_: dbt's guidelines on project structure, style, and setup.
* `About dbt models`_: dbt's guide to creating SQL or Python model transforms
* `Writing data tests`_: dbt's guide to writing package tests
* `dbt commands`_: list of all dbt commands
* `dbt debugging`_: guide for debugging issues with dbt
* `aspects-dbt`_: Aspects' dbt package
* `eduNEXT/dbt-aspects-unidigital`_: a custom dbt packages running in production Aspects
* `aspects-dbt-example`_: the demo custom dbt package used in this tutorial.
* `eduNEXT/dbt-aspects-unidigital`_: a real custom dbt package running in production Aspects

.. _aspects-dbt: https://github.com/openedx/aspects-dbt
.. _dbt init: https://docs.getdbt.com/reference/commands/init
.. _aspects-dbt-example: https://github.com/open-craft/aspects-dbt-example
.. _aspects-dbt-example#1: https://github.com/open-craft/aspects-dbt-example/pull/1
.. _aspects/requirements.txt: https://github.com/openedx/tutor-contrib-aspects/blob/main/tutoraspects/templates/aspects/build/aspects/requirements.txt
.. _About dbt init: https://docs.getdbt.com/reference/commands/init
.. _About dbt models: https://docs.getdbt.com/docs/build/models
.. _Best practice guides: https://docs.getdbt.com/best-practices
.. _dbt commands: https://docs.getdbt.com/reference/dbt-commands
.. _dbt debugging: https://docs.getdbt.com/guides/debug-errors
.. _dbt ref: https://docs.getdbt.com/reference/dbt-jinja-functions/ref
.. _eduNEXT/dbt-aspects-unidigital: https://github.com/eduNEXT/dbt-aspects-unidigital
.. _Building dbt packages: https://docs.getdbt.com/guides/building-packages
.. _Install dbt: https://docs.getdbt.com/docs/core/installation-overview
.. _Writing data tests: https://docs.getdbt.com/best-practices/writing-custom-generic-tests
.. _tutor-contrib-aspects plugin.py: https://github.com/openedx/tutor-contrib-aspects/blob/main/tutoraspects/plugin.py

0 comments on commit 2569287

Please sign in to comment.