dbt-getting-started: materialized tests (#31)
Add continuous testing example to demo.
ahelium authored Jun 18, 2022
1 parent 019cbea commit 0fda761
Showing 12 changed files with 113 additions and 32 deletions.
2 changes: 1 addition & 1 deletion .github/tests/dbt-get-started.sh
@@ -13,5 +13,5 @@ docker-compose exec -T dbt dbt run
sleep 5

# Check that there's data making its way to the avg_bid materialized view
record_count=$(docker-compose run -T mzcli -Atc 'SELECT COUNT(*) FROM avg_bid')
record_count=$(docker-compose run -T cli -Atc 'SELECT COUNT(*) FROM avg_bid')
[[ "$record_count" -gt 0 ]]
8 changes: 8 additions & 0 deletions dbt-get-started/Dockerfile
@@ -0,0 +1,8 @@
FROM python:3.9.9-bullseye

WORKDIR /usr/app/dbt

RUN set -ex; \
pip install --no-cache-dir dbt-materialize==1.1.2

ENTRYPOINT ["/bin/bash"]
76 changes: 69 additions & 7 deletions dbt-get-started/README.md
@@ -2,7 +2,7 @@

[dbt](https://docs.getdbt.com/docs/introduction) has become the standard for data transformation (“the T in ELT”). It combines the accessibility of SQL with software engineering best practices, allowing you to not only build reliable data pipelines, but also document, test and version-control them.

While dbt is a great fit for **batch** transformations, it can only **approximate** transforming streaming data. This demo recreates the Materialize [getting started guide](https://materialize.com/docs/get-started/) using dbt as the transformation layer.
This demo recreates the Materialize [getting started guide](https://materialize.com/docs/get-started/) using dbt as the transformation layer.

## Docker

@@ -38,15 +38,15 @@ dbt --version

We've created a few core models that take care of defining the building blocks of a dbt+Materialize project, including a streaming [source](https://materialize.com/docs/overview/api-components/#sources):

- `market_orders_raw.sql`
- `sources/market_orders_raw.sql`

, as well as a staging [view](https://materialize.com/docs/overview/api-components/#non-materialized-views) to transform the source data:

- `market_orders.sql`
- `staging/stg_market_orders.sql`

and a [materialized view](https://materialize.com/docs/overview/api-components/#materialized-views) that continuously updates as the underlying data changes:
, and a [materialized view](https://materialize.com/docs/overview/api-components/#materialized-views) that continuously updates as the underlying data changes:

- `avg_bid.sql`
- `marts/avg_bid.sql`

To run the models:

@@ -56,12 +56,50 @@ dbt run

> :crab: As an exercise, you can add models for the queries demonstrating [joins](https://materialize.com/docs/get-started/#joins) and [temporal filters](https://materialize.com/docs/get-started/#temporal-filters).
### Test the project

To help demonstrate how `dbt test` works with Materialize for **continuous testing**, we've added some [generic tests](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) to the [`avg_bid` model](dbt/models/marts/avg_bid.sql):

```yaml
models:
- name: avg_bid
description: 'Computes the average bid price'
columns:
- name: symbol
description: 'The stock ticker'
tests:
- not_null
- unique
```
, and configured testing in the [project file](dbt/dbt_project.yml):
```yaml
tests:
mz_get_started:
marts:
+store_failures: true
+schema: 'etl_failure'
```
Note that tests are configured to [`store_failures`](https://docs.getdbt.com/reference/resource-configs/store_failures), which instructs dbt to create a materialized view for each test using the respective `SELECT` statements.

To run the tests:

```bash
dbt test
```

This creates two materialized views in a dedicated schema (`public_etl_failure`): `not_null_avg_bid_symbol` and `unique_avg_bid_symbol`. dbt takes care of naming the views based on the type of test (`not_null`, `unique`) and the column being tested (`symbol`).
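
The schema and view names follow from how dbt composes them. As a rough sketch, assuming dbt's default schema naming (target schema and the custom `+schema` value joined with an underscore) and the simple `<test>_<model>_<column>` pattern for generic tests without arguments, the fully qualified name of a failure view can be derived as below. The `failure_relation` helper is hypothetical and only illustrates the convention; it does not cover how dbt handles long test names or a customized schema macro.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of dbt): compose the name of the relation
# that stores failures for a generic test on a single column.
# Assumes dbt's default schema naming: <target_schema>_<custom_schema>.
failure_relation() {
  local target_schema=$1 custom_schema=$2 test_type=$3 model=$4 column=$5
  printf '%s_%s.%s_%s_%s\n' \
    "$target_schema" "$custom_schema" "$test_type" "$model" "$column"
}

# The two tests configured on avg_bid.symbol in this demo:
failure_relation public etl_failure not_null avg_bid symbol
failure_relation public etl_failure unique avg_bid symbol
```

For this project the two calls print `public_etl_failure.not_null_avg_bid_symbol` and `public_etl_failure.unique_avg_bid_symbol`.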

These views are continuously updated as new data streams in, and allow you to monitor failing rows **as soon as** an assertion fails. You can use this feature for unit testing during the development of your dbt models, and later in production to trigger real-time alerts downstream.

## Materialize

To connect to the running Materialize service, you can use `mzcli`, which is included in the setup:
To connect to the running Materialize service, you can use a PostgreSQL-compatible client like `psql`, which is bundled in the `materialize/cli` image:

```bash
docker-compose run mzcli
docker-compose run cli
```

and run a few commands to check the objects created through dbt:
@@ -99,6 +137,30 @@ SHOW MATERIALIZED VIEWS;

You'll notice that you're only able to `SELECT` from `avg_bid` — this is because it is the only materialized view! This view is incrementally updated as new data streams in, so you get fresh and correct results with low latency. Behind the scenes, Materialize is indexing the results of the embedded query in memory.

### Continuous testing

To validate that the schema storing the tests was created:

```sql
SHOW SCHEMAS;
name
--------------------
public
public_etl_failure
```

, and that the materialized views that continuously test the `avg_bid` view for failures are up and running:

```sql
SHOW VIEWS FROM public_etl_failure;
name
-------------------------
not_null_avg_bid_symbol
unique_avg_bid_symbol
```
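
Because each of these views holds the rows that currently violate an assertion, a row count greater than zero means the test is failing right now. Here is a minimal sketch of a check script, assuming the `cli` service from `compose.yaml` and the `public_etl_failure` schema; `SQL_CLIENT`, `count_failures`, and `check_view` are hypothetical names introduced for illustration, not part of dbt or Materialize.

```bash
#!/usr/bin/env bash

# Assumption: psql is reachable through the `cli` compose service, as in
# this demo's test script. Override SQL_CLIENT to use a different client.
SQL_CLIENT=${SQL_CLIENT:-"docker-compose run -T cli"}

# Print the number of rows currently stored in one failure view.
count_failures() {
  local view=$1
  # SQL_CLIENT is intentionally unquoted so the command string splits into words.
  $SQL_CLIENT -Atc "SELECT COUNT(*) FROM public_etl_failure.${view}"
}

# Report the state of one failure view.
# Returns 0 if the view is empty, 1 if it has failing rows,
# and 2 if the count could not be read.
check_view() {
  local view=$1 count
  count=$(count_failures "$view") || return 2
  [[ "$count" =~ ^[0-9]+$ ]] || return 2
  if [[ "$count" -gt 0 ]]; then
    echo "FAIL ${view}: ${count} failing rows"
    return 1
  fi
  echo "OK ${view}"
}
```

Calling `check_view not_null_avg_bid_symbol` and `check_view unique_avg_bid_symbol` from a loop or a scheduler would give a simple polling alert. A production setup would more likely subscribe to changes than poll.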

## Local installation

To set up dbt and Materialize in your local environment instead of using Docker, follow the instructions in the [documentation](https://materialize.com/docs/guides/dbt/).
6 changes: 3 additions & 3 deletions dbt-get-started/compose.yaml
@@ -6,11 +6,11 @@ services:
ports:
- 6875:6875
healthcheck: {test: curl -f localhost:6875, interval: 1s, start_period: 30s}
mzcli:
cli:
image: materialize/cli:v0.26.0
container_name: mzcli
container_name: cli
dbt:
image: materialize/dbt-materialize:v0.26.0
build: ./
container_name: dbt
init: true
entrypoint: /bin/bash
6 changes: 6 additions & 0 deletions dbt-get-started/dbt/dbt_project.yml
@@ -14,3 +14,9 @@ target-path: 'target' # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- 'target'
- 'dbt_modules'

tests:
mz_get_started:
marts:
+store_failures: true
+schema: 'etl_failure'
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{{ config(materialized='materializedview') }}

SELECT symbol,
AVG(bid_price) AS avg
FROM {{ ref('market_orders') }}
AVG(bid_price) AS avg_bid
FROM {{ ref('stg_market_orders') }}
GROUP BY symbol
11 changes: 11 additions & 0 deletions dbt-get-started/dbt/models/marts/models.yml
@@ -0,0 +1,11 @@
version: 2

models:
- name: avg_bid
description: 'Computes the average bid price'
columns:
- name: symbol
description: 'The stock ticker'
tests:
- not_null
- unique
18 changes: 0 additions & 18 deletions dbt-get-started/dbt/models/schema.yml

This file was deleted.

7 changes: 7 additions & 0 deletions dbt-get-started/dbt/models/sources/sources.yml
@@ -0,0 +1,7 @@
version: 2

sources:
- name: market_orders
schema: public
tables:
- name: market_orders_raw
5 changes: 5 additions & 0 deletions dbt-get-started/dbt/models/staging/models.yml
@@ -0,0 +1,5 @@
version: 2

models:
- name: stg_market_orders
description: 'Converts market order data to proper data types'
Original file line number Diff line number Diff line change
@@ -6,4 +6,4 @@ SELECT
(text::jsonb)->>'symbol' AS symbol,
(text::jsonb)->>'trade_type' AS trade_type,
to_timestamp(((text::jsonb)->'timestamp')::bigint) AS ts
FROM {{ ref('market_orders_raw') }}
FROM {{ source('market_orders', 'market_orders_raw') }}
