Docs improvements #115

Merged 32 commits on Sep 5, 2023

Commits
28f256b
update readme
graciegoheen Aug 16, 2023
f27587b
update main docs page
graciegoheen Aug 16, 2023
1fdda5e
add template examples page
graciegoheen Aug 16, 2023
db35ab7
initial create-group example
graciegoheen Aug 17, 2023
2899aa7
add version example
graciegoheen Aug 17, 2023
f524125
add add-contract example
graciegoheen Aug 17, 2023
0aed2ab
add note on connection multiple pre-existing projects
graciegoheen Aug 17, 2023
e64ee54
finish create-group example
graciegoheen Aug 17, 2023
4d32714
add group example
graciegoheen Aug 17, 2023
ffd9717
started on split example
graciegoheen Aug 18, 2023
60fefce
add split example
graciegoheen Aug 18, 2023
228ea6d
start connect example
graciegoheen Aug 18, 2023
8cee281
first pass at connect docs
graciegoheen Aug 18, 2023
ece1a72
Merge branch 'main' into docs_improvements
graciegoheen Aug 29, 2023
89a3133
fix merge conflict
graciegoheen Aug 29, 2023
f103d78
add note about leaf nodes to split command example
graciegoheen Aug 29, 2023
8fc78b8
updated group commands to "protected" access
graciegoheen Aug 29, 2023
ab70e42
Update public -> protected for group command
graciegoheen Aug 29, 2023
df25d96
docs updates
graciegoheen Aug 29, 2023
2be23c0
Merge branch 'docs_improvements' of github.com:dbt-labs/dbt-meshify i…
graciegoheen Aug 29, 2023
17ded0b
Update selector syntax flags
graciegoheen Aug 29, 2023
85d9d02
add "What dbt-meshify does not handle" section
graciegoheen Aug 29, 2023
f44b65b
update index and readme
graciegoheen Sep 5, 2023
21b6b3d
Update docs/examples.md
graciegoheen Sep 5, 2023
390c0af
add group clarification
graciegoheen Sep 5, 2023
798e25f
add create-path flag callout
graciegoheen Sep 5, 2023
d3adf95
Update docs/examples.md
graciegoheen Sep 5, 2023
b40458a
Update docs/index.md
graciegoheen Sep 5, 2023
32ac6b4
Merge branch 'main' into docs_improvements
graciegoheen Sep 5, 2023
743358e
add missing screenshots
graciegoheen Sep 5, 2023
410dedd
add example page, apply formatting fixes
dave-connors-3 Sep 5, 2023
3fe8f5a
Merge branch 'main' into docs_improvements
dave-connors-3 Sep 5, 2023
16 changes: 6 additions & 10 deletions README.md
@@ -17,20 +17,16 @@ These dbt-core features include:
3. __[Access](https://docs.getdbt.com/docs/collaborate/govern/model-access)__ - control the `access` level of models within groups
4. __[Versions](https://docs.getdbt.com/docs/collaborate/govern/model-versions)__ - create and increment versions of particular models.

Additionally, `dbt-meshify` automates the code development required to split a monolithic dbt project into component projects, or connect multiple pre-existing dbt projects using cross-project `ref`.

## Installation

To install dbt-meshify, run:
```bash
pip install dbt-meshify
```

To upgrade dbt-meshify, run:
```bash
pip install --upgrade dbt-meshify
```

## Basic Usage

```bash
# create a group of all models tagged with "finance"
# leaf nodes and nodes with cross-group dependencies will be `public`
# public nodes will also have contracts added to them
dbt-meshify group finance --owner-name "Monopoly Man" -s +tag:finance

# optionally use the add-version operation to add a new version to a model
dbt-meshify operation add-version -s fct_orders
```
145 changes: 145 additions & 0 deletions docs/examples.md
@@ -0,0 +1,145 @@
# Examples

To keep the following examples consistent and clear, we're going to use a simplified dbt project. In practice, the model governance features described here are _most_ beneficial for large dbt projects that are struggling to scale.

We will give a basic example for each command, but to see the full list of additional flags you can add to a given command, check out the [commands page](commands.md).

!!! note
    One helpful flag that you can add to all of the commands is `--read-catalog`, which skips the `dbt docs generate` step and reads the local `catalog.json` file instead. This makes the `dbt-meshify` commands run faster, but it relies on your local `catalog.json` file being up to date.
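
Here is a minimal sketch of one way to use this flag in a script. The helper name `read_catalog_flag` is hypothetical (it is not part of `dbt-meshify`), and the default path assumes dbt's standard `target/` directory. Checking that the file exists says nothing about whether it is up to date, so treat this as a convenience rather than a guarantee.

```bash
# Hypothetical helper (not part of dbt-meshify): print --read-catalog only
# when a catalog file already exists at the given path.
# Assumes dbt's default artifact location, target/catalog.json.
read_catalog_flag() {
  local catalog_path="${1:-target/catalog.json}"
  if [ -f "$catalog_path" ]; then
    printf '%s\n' "--read-catalog"
  fi
}

# Usage (requires dbt-meshify and a dbt project, so it is left commented out):
# dbt-meshify operation add-contract --select stores $(read_catalog_flag)
```

If no catalog file is found, the helper prints nothing and the command falls back to its default behavior.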

Let's imagine a dbt project with the following models:
![dbt dag of models](https://github.com/dave-connors-3/barnold-corp/assets/53586774/3775c540-ddc1-4eae-8587-8a0a9fb48c79)

## Component commands

### Create a new group

Let's say you want to create a new group for your sales analytics models.
![create a new group for sales analytics models](https://github.com/dave-connors-3/barnold-corp/assets/53586774/0f2b03a2-c5da-4e70-81c7-e83084ee9ba1)

You can run the following command:
```bash
dbt-meshify operation create-group sales_analytics --owner-name Ralphie --select +int_sales__unioned +int_returns__unioned transactions
```

This will create a new group named "sales_analytics" with the owner "Ralphie" and add all selected models to that group with the appropriate `access` configuration:
- create a new group definition in a `_groups.yml` file
![yml file with group definition](https://github.com/dave-connors-3/barnold-corp/assets/53586774/b3fa812a-157f-41b3-842d-c67e59f77298)
- add all selected models to that group with the appropriate `access` config
- all models that are only referenced by other models in their _same group_ will have `access: private`
![int_sales__unioned access set to private](https://github.com/dave-connors-3/barnold-corp/assets/53586774/481010bb-ceed-4feb-a46e-05c185fac4e4)
- all other models (those that are referenced by models _outside their group_ or are leaf nodes) will have `access: public`
![transactions access set to public](https://github.com/dave-connors-3/barnold-corp/assets/53586774/4c8665ac-d14c-424d-81e3-51c0bf12c701)
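
The access rule above can be sketched as a small script. This illustrates the rule as described here, not `dbt-meshify`'s actual implementation. The function name `classify_access`, the input file layout, and the edges listed are assumptions made for the example, and only three of the selected models are included to keep it short.

```bash
# Sketch of the access rule described above; not dbt-meshify's actual code.
# members file: one group member per line
# edges file:   "parent child" pairs, one ref relationship per line
classify_access() {
  awk '
    NR == FNR { members[$1] = 1; next }            # first file: group members
    ($1 in members) {                              # second file: parent -> child edges
      has_child[$1] = 1
      if (!($2 in members)) referenced_outside[$1] = 1
    }
    END {
      for (m in members) {
        if ((m in has_child) && !(m in referenced_outside)) print m, "private"
        else print m, "public"                     # leaf node or referenced outside the group
      }
    }
  ' "$1" "$2" | LC_ALL=C sort
}

demo_dir="$(mktemp -d)"
printf '%s\n' int_sales__unioned int_returns__unioned transactions > "$demo_dir/members.txt"
# Assumed edges for illustration only.
printf '%s\n' \
  'int_sales__unioned transactions' \
  'int_returns__unioned transactions' \
  'customers transactions' > "$demo_dir/edges.txt"

classify_access "$demo_dir/members.txt" "$demo_dir/edges.txt"
```

With these assumed edges, `int_sales__unioned` and `int_returns__unioned` come out `private` because they are only referenced by `transactions`, while `transactions` comes out `public` because nothing in the example references it.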

### Add/increment model versions

Let's say you want to add a new version to the customers model, which is currently un-versioned.
![add a version to the customers model](https://github.com/dave-connors-3/barnold-corp/assets/53586774/e4097ca4-b6fa-4af4-b238-384a090573a7)

You can run the following command:
```bash
dbt-meshify operation add-version --select customers
```

This will add a version to the `customers` model:
- the `customers.sql` file will be renamed to `customers_v1.sql`
- the necessary version configurations will be created (or added to a pre-existing `yml` file)
![yml file updated with version configs](https://github.com/dave-connors-3/barnold-corp/assets/53586774/c0b12ab7-904e-4590-84aa-7b602a91f53f)
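
For readers without the screenshot, here is a rough sketch of the end state. `dbt-meshify` does this work for you; the commands below only imitate it in a scratch directory. The YAML file name `_models.yml` is an assumption, while `latest_version` and `versions` are standard dbt model version properties.

```bash
# Illustration only: dbt-meshify performs these steps for you.
demo_dir="$(mktemp -d)"
touch "$demo_dir/customers.sql"                 # stand-in for the existing model file

# 1. The model file gains a version suffix.
mv "$demo_dir/customers.sql" "$demo_dir/customers_v1.sql"

# 2. The model's YAML entry gains version properties (file name assumed).
cat > "$demo_dir/_models.yml" <<'EOF'
models:
  - name: customers
    latest_version: 1
    versions:
      - v: 1
EOF

ls "$demo_dir"
```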

### Add contract(s)

Let's say you want to add a new contract to the `stores` model, which is currently un-contracted.
![add a contract to the stores model](https://github.com/dave-connors-3/barnold-corp/assets/53586774/9eb48ce4-d6c2-4c79-a09f-0ff85cfccdcc)

You can run the following command:
```bash
dbt-meshify operation add-contract --select stores
```

This will add an enforced contract to the `stores` model:
- add a `contract` config and set `enforced: true`
![yml file updated with added contract config](https://github.com/dave-connors-3/barnold-corp/assets/53586774/bf1ba4e2-76a1-4a65-a0a9-7614487b7d6f)
- add every column's `name` and `data_type` if not already defined
![yml file updated with added column names and data_types](https://github.com/dave-connors-3/barnold-corp/assets/53586774/1d989396-2b07-48c5-bcf6-de7eaf02b928)
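
As a sketch of what the resulting YAML might look like, the snippet below writes an example entry to a scratch directory and prints it. The file name, column names, and data types are hypothetical; the real ones come from your project's metadata. The `contract` and `data_type` keys follow dbt's model contract configuration.

```bash
# Illustration only: dbt-meshify writes this configuration for you.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/_models.yml" <<'EOF'
models:
  - name: stores
    config:
      contract:
        enforced: true
    columns:
      - name: store_id        # hypothetical column
        data_type: integer
      - name: store_name      # hypothetical column
        data_type: varchar
EOF
cat "$demo_dir/_models.yml"
```

With `enforced: true`, dbt checks at build time that the model returns exactly these columns with these types.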

## Global commands

### Group together a subset of models

Let's say you want to group together your sales analytics models.
![group together sales analytics models](https://github.com/dave-connors-3/barnold-corp/assets/53586774/b192bf70-e854-46f6-be40-915eb48adbb3)

You can run the following command:
```bash
dbt-meshify group sales_analytics --owner-name Ralphie --select +int_sales__unioned +int_returns__unioned transactions
```

This will create a new group named "sales_analytics" with the owner "Ralphie", add all selected models to that group with the appropriate `access` configuration, _and add contracts to the public-identified models_:
- create a new group definition in a `_groups.yml` file
![yml file with group definition](https://github.com/dave-connors-3/barnold-corp/assets/53586774/b3fa812a-157f-41b3-842d-c67e59f77298)
- add all selected models to that group with the appropriate `access` config
- all models that are only referenced by other models in their _same group_ will have `access: private`
![int_sales__unioned access set to private](https://github.com/dave-connors-3/barnold-corp/assets/53586774/481010bb-ceed-4feb-a46e-05c185fac4e4)
- all other models (those that are referenced by models _outside their group_ or are leaf nodes) will have `access: public`
![transactions access set to public](https://github.com/dave-connors-3/barnold-corp/assets/53586774/4c8665ac-d14c-424d-81e3-51c0bf12c701)
- for all `public` models:
- add a `contract` config and set `enforced: true`
![yml file updated with added contract config for transactions model](https://github.com/dave-connors-3/barnold-corp/assets/53586774/d40cef1b-fbb8-4cc3-9be6-f782378164cf)
- add every column's `name` and `data_type` if not already defined
![yml file updated with added column names and data_types for transactions model](https://github.com/dave-connors-3/barnold-corp/assets/53586774/f6402db9-95f0-4dc3-bc17-5966e79811a4)
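
The group definition and the per-model `group` and `access` properties might look roughly like the files written below. This is a sketch in a scratch directory; the model YAML file name and the key order are assumptions, and the contract configuration added to public models is omitted for brevity.

```bash
# Illustration only: dbt-meshify writes these files for you.
demo_dir="$(mktemp -d)"

cat > "$demo_dir/_groups.yml" <<'EOF'
groups:
  - name: sales_analytics
    owner:
      name: Ralphie
EOF

cat > "$demo_dir/_models.yml" <<'EOF'
models:
  - name: int_sales__unioned
    group: sales_analytics
    access: private     # only referenced inside the group
  - name: transactions
    group: sales_analytics
    access: public      # public models also receive an enforced contract
EOF

cat "$demo_dir/_groups.yml" "$demo_dir/_models.yml"
```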

### Split out a new subproject

Let's say you want to split out your sales analytics models into a new subproject.
![split sales analytics models into a new subproject](https://github.com/dave-connors-3/barnold-corp/assets/53586774/402a5637-800e-4945-b2e0-5271f2bf2c25)

You can run the following command:
```bash
dbt-meshify split sales_analytics --select +int_sales__unioned +int_returns__unioned transactions
```

This will create a new subproject that contains the selected sales analytics models, configure the "edge" models to be `public` and contracted, and replace all dependencies in the downstream project on the upstream's models with cross-project `ref`s:
- create a new subproject that contains the selected sales analytics models
![selected models moved to a subproject](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/e638d83e-eb24-4f1e-852d-2c058bfedb4f)
- add a `dependencies.yml` to the _downstream_ project (in our case, our new subproject is downstream of the original project because the `transactions` model depends on some of the models that remain in the original project - `stores` and `customers`)
![add dependencies.yml](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/65e47b65-30ca-475f-bfa7-fffb26d85e11)
- add `access: public` to all models in the upstream project that are referenced by models in the downstream project
![customers access set to public](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/9e110ca4-40c5-4013-ab89-773b59638320)
- for all `public` models:
- add a `contract` config and set `enforced: true`
![yml file updated with added contract config for stores model](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/800fc871-ce56-4e80-b746-8bd84aa05574)
- add every column's `name` and `data_type` if not already defined
![yml file updated with added column names and data_types for stores model](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/48d41e17-0ad3-4a31-863b-1a8646d1d7c9)
- replace any dependencies in the downstream project on the upstream's models with a cross-project `ref`
![refs to customers and stores replaced with cross-project ref](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/33de63e1-0579-4ac0-9ff4-22099d701b99)
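
To make the last two file changes concrete, here is a sketch of a `dependencies.yml` and of the `ref` rewrite, run in a scratch directory. The project name `original_project`, the join column, and the `sed` expression are assumptions for illustration; `dbt-meshify` performs the real edits itself and may do so differently.

```bash
demo_dir="$(mktemp -d)"

# dependencies.yml in the downstream project; "original_project" is a placeholder name.
cat > "$demo_dir/dependencies.yml" <<'EOF'
projects:
  - name: original_project
EOF

# A stand-in for the downstream model before the split.
cat > "$demo_dir/transactions.sql" <<'EOF'
select * from {{ ref('customers') }}
join {{ ref('stores') }} using (store_id)
EOF

# Rewrite single-argument refs to upstream models as two-argument, cross-project refs.
rewritten="$(sed -E "s/ref\('(customers|stores)'\)/ref('original_project', '\1')/g" "$demo_dir/transactions.sql")"
printf '%s\n' "$rewritten"
```

The two-argument form of `ref` names the upstream project first and the model second, which is how dbt resolves a model that lives in another project.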

### Connect multiple dbt projects

Let's look at a slightly modified version of the example we've been working with. Instead of a single dbt project, let's imagine you're starting with two separate dbt projects connected via the "source hack":
- project A contains the following models
![project A's dag of models](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/75771c9c-1fa4-4cc5-b9b9-380f39091031)
- project B contains the following models
![project B's dag of models](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/a94657e5-c9bc-4b8b-ada5-63887bfd0ba3)

We call this type of multi-project configuration the "source hack" because there are models generated by project A (`stores` and `customers`) that are defined as sources in project B.

Let's say we want to connect these two projects using model governance best practices and cross-project `ref`s.

You can run the following command:
```bash
dbt-meshify connect --project-paths path/to/project_a path/to/project_b
```

This will make the upstream project a dependency for the downstream project, configure the "edge" models to be `public` and contracted, and replace all dependencies in the downstream project on the upstream's models with cross-project `ref`s:
- add a `dependencies.yml` to the _downstream_ project (in our case, project B is downstream of project A because the `transactions` model depends on some of the models that are generated by project A - `stores` and `customers`)
TO DO: ADD SCREENSHOT ONCE BUG IS FIXED
- add `access: public` to all models in the upstream project that are referenced by models in the downstream project
![customers access set to public](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/9e110ca4-40c5-4013-ab89-773b59638320)
- for all `public` models:
- add a `contract` config and set `enforced: true`
![yml file updated with added contract config for stores model](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/800fc871-ce56-4e80-b746-8bd84aa05574)
- add every column's `name` and `data_type` if not already defined
![yml file updated with added column names and data_types for stores model](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/48d41e17-0ad3-4a31-863b-1a8646d1d7c9)
- replace any dependencies in the downstream project on the upstream's models with a cross-project `ref`
![customers and stores sources replaced with cross-project ref](https://github.com/dave-connors-3/mega-corp-big-co-inc/assets/53586774/24d72b99-fbf1-489d-bda8-ccaea267981b)
- remove unnecessary sources
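
The "source hack" replacement can be sketched the same way: a `source()` call in project B that points at one of project A's models becomes a cross-project `ref`. The source name `project_a_tables` and the `sed` expression are hypothetical, and this is not how `dbt-meshify` itself is implemented.

```bash
demo_dir="$(mktemp -d)"

# Project B model before connecting; the source name "project_a_tables" is hypothetical.
cat > "$demo_dir/transactions.sql" <<'EOF'
select * from {{ source('project_a_tables', 'customers') }}
EOF

# Replace source() calls that point at project A's models with cross-project refs.
connected="$(sed -E "s/source\('[a-z_]+', *'(customers|stores)'\)/ref('project_a', '\1')/g" "$demo_dir/transactions.sql")"
printf '%s\n' "$connected"
```

Once every reference to a source has been replaced this way, the source definition in project B is no longer used and can be removed.
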
44 changes: 25 additions & 19 deletions docs/index.md
@@ -1,6 +1,6 @@
# dbt_meshify

`dbt-meshify` is a dbt-core plugin that automates the management and creation of dbt-core model governance features introduced in dbt-core v1.5. Each command in the package will leverage your dbt project metadata to create and/or edit the files in your project to properly configure the models in your project with these governance features.
`dbt-meshify` is a dbt-core plugin that automates the creation of dbt-core model governance features introduced in dbt-core v1.5. This package will leverage your dbt project metadata to create and/or edit the files in your project to properly configure the models in your project with these governance features.

These dbt-core features include:

@@ -9,31 +9,37 @@ These dbt-core features include:
3. __[Access](https://docs.getdbt.com/docs/collaborate/govern/model-access)__ - control the `access` level of models within groups
4. __[Versions](https://docs.getdbt.com/docs/collaborate/govern/model-versions)__ - create and increment versions of particular models.

Additionally, `dbt-meshify` automates the code development required to split a monolithic dbt project into component projects, or connect multiple pre-existing dbt projects using cross-project `ref`.

This package leverages the dbt-core Python API to allow users to use standard dbt selection syntax for each of the commands in this package (unless otherwise noted). See details on each of the specific commands available on the [commands page](commands.md).

## Basic Usage
## Getting Started

This package helps automate the code development required for adding the dbt-core model governance features mentioned above.

The first question to ask yourself is "which of these features do I want to add to my project"? Do you want to add contracts, create a new group, split your monolithic dbt project in two? Your answer to this question will establish which `dbt-meshify` command is right for you!

This package consists of **component** and **global** commands - so you can decide how to best break apart your work.

Each of the available [commands page](commands.md) allows you to add one (or many) of the above features to a set of models specified by the selection syntax in the command.
The **component** commands allow you to do a single step at a time and begin with `dbt-meshify operation`. For example, if you wanted to add a new version to a model, you would run something like `dbt-meshify operation add-version --select fct_orders`. This command would:
1. add a new version to `fct_orders`

The goal of this package is to make it more straightforward to apply to your project so that splitting apart a monolithic project into component projects is a more automated, dbt-tonic experience.
and that's it!

The process of splitting a dbt monolith apart roughly requires you to:
The **global** commands combine _multiple_ **component** commands to complete a larger set of work and begin with `dbt-meshify`. For example, if you wanted to define a group for a subset of your models, you would run something like `dbt-meshify group finance --owner-name "Monopoly Man" --select +tag:finance`. This command would:
1. define a new group named "finance" in your dbt project, setting the owner name to "Monopoly Man"
2. add all models tagged with "finance" to that new group
3. set `access` to public for all leaf nodes and nodes with cross-group dependencies
4. add contracts to all public nodes

1. Determine what parts of your project should be grouped together into subprojects
2. Determine the access-level for the members of that group
3. Add model contracts to the elements that are public and accessed by members outside the group specified in (1)
4. (Optional) Add model versions to the public models to allow for development without impacting downstream stakeholders.
all at once!

Here's how that might look for the process of creating a separate `finance` subproject in your dbt monolith.
The next question to ask yourself is "which of my models do I want to add these feature(s) to?". This informs the selection syntax you provide to the `dbt-meshify` command of choice. `dbt-meshify` uses the same selection syntax as `dbt`, so you can `--select` based on model names, tags, and so on!

```bash
# create a group of all models tagged with "finance"
# leaf nodes and nodes with cross-group dependencies will be `public`
# public nodes will also have contracts added to them
dbt-meshify group finance --owner-name "Monopoly Man" -s +tag:finance

# optionally use the add-version operation to add a new version to a model
dbt-meshify operation add-version -s fct_orders
```

Once you've decided:
1. which governance feature(s) you want to add to your dbt project
2. which subset of models you want to add those feature(s) to

you're ready to use `dbt-meshify`!

Future releases of this package may also include features that allow users to fully split off groups of models into entirely new dbt projects.
For further information, check out the available [commands](commands.md) or read through some [examples](examples.md).