This repository has been archived by the owner on Jul 2, 2024. It is now read-only.

Commit

initial commit
Lucy Sheppard committed Jul 2, 2024
1 parent 53bf111 commit 9671094
Showing 16 changed files with 156,973 additions and 156,013 deletions.
66 changes: 61 additions & 5 deletions README.md
@@ -1,5 +1,61 @@
# 🚧 This repo has been archived
## 👇🏻 Please use one of the following options instead
- I want a [dbt-focused Jaffle Shop project](https://jaffle.sh/) that works with dbt Cloud or dbt Core with any adapter or setup.
- I want a [fork of the repo that was here](https://github.com/meltano/jaffle-shop-template) maintained by Meltano.
- I want a [community-maintained DuckDB + dbt + Evidence focused project](https://github.com/gwenwindflower/octocatalog) stewarded by the original author of this repo [@gwenwindflower](https://github.com/gwenwindflower).
# 🥪 The Jaffle Shop 🦘

This is a template for creating a fully functional dbt project for teaching, learning, writing, demoing, or any other scenarios where you need a basic project with a synthesized jaffle shop business.

## How to use

### 1. Click the big green 'Use this template' button and 'Create a new repository'.

![Click use template](.github/static/use-template.gif)

This will create a new repository exactly like this one, and navigate you there. Make sure to execute the next instructions in that repo.

### 2. Click 'Code', then 'Codespaces', then 'Create codespace on main'.

![Create codespace on main](.github/static/open-codespace.gif)

This will create a new `codespace`, a sandboxed devcontainer with everything you need for a dbt project. Once the codespace is finished setting up, you'll be ready to run a `dbt build`.

### 3. Make sure to wait until the codespace is finished setting up.

![Codespaces setup screen at postCreateCommand](.github/static/codespaces-setup-screen.png)

After the container is built and connected, VSCode will run a few cleanup commands and then a `postCreateCommand`, a set of commands that runs after the container is set up. This is where we install our dependencies (such as dbt, the DuckDB adapter, and other necessities) and run `dbt deps` to install the dbt packages we want to use. That screen will look something like the above; when it's completed, it will close and leave you at a fresh terminal prompt. From there you're ready to do some analytics engineering!
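The `postCreateCommand` itself lives in the repository's devcontainer configuration. Here is a minimal sketch of what such a file might look like; the image and package pins are illustrative assumptions, not the exact contents of this repo's `.devcontainer`:

```json
// .devcontainer/devcontainer.json (illustrative sketch, not this repo's actual file)
{
  "name": "jaffle-shop",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  // Runs once after the container is created: install dbt and the DuckDB
  // adapter, then pull in the dbt packages declared in packages.yml.
  "postCreateCommand": "pip install dbt-core dbt-duckdb && dbt deps"
}
```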

## Using with Meltano

This project is preconfigured with a Meltano configuration file, `meltano.yml`. Meltano can be used as follows:

One-time workstation setup:

```console
> meltano install # Install the plugins declared by the project
```

Sample usage for end-to-end development:

```console
> meltano run el # Run the job titled 'el' to extract and load data
> meltano run t # Run the job titled 't' to transform data
> meltano run bi # Build and serve the Evidence BI reports
```
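Per the comments in `meltano.yml`, these jobs can also be chained or combined in a single invocation, for example:

```console
> meltano run el t  # Extract and load, then transform, in one invocation
> meltano run elt   # Or run the combined 'elt' job
```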

Dynamically build and serve the Evidence BI reports:

```console
> meltano invoke evidence:dev
```

Do a full end-to-end build on "prod":

```console
> meltano --environment=prod run elt evidence:build
```

## Contributing

We welcome issues and PRs requesting or adding new features. The package that generates the synthetic data, [`jafgen`](https://pypi.org/project/jafgen/), is also under active development and will add more types of source data to model as we go along. If you have tests, descriptions, new models, metrics, materialization types, or techniques you use this repo to demonstrate, and you feel they would make for a more expansive baseline experience, we encourage you to contribute them back so that this project becomes an even better collective tool for exploring and learning dbt over time.

## Anything else?

That's it! We jaff'd, we cried, we learned about life. If you have any questions or spot missing documentation, contributing back via an issue or PR is also super helpful.
7 changes: 0 additions & 7 deletions Taskfile.yml

This file was deleted.

1,881 changes: 942 additions & 939 deletions jaffle-data/raw_customers.csv

Large diffs are not rendered by default.

191,435 changes: 96,067 additions & 95,368 deletions jaffle-data/raw_items.csv

Large diffs are not rendered by default.

118,851 changes: 59,199 additions & 59,652 deletions jaffle-data/raw_orders.csv

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions jaffle-data/raw_stores.csv
@@ -1,6 +1,6 @@
id,name,opened_at,tax_rate
7f790ed7-0fc4-4de2-a1b0-cce72e657fc4,Philadelphia,2016-09-01T00:00:00,0.06
08d44615-06d3-4086-a5d7-21395a1d975e,Brooklyn,2017-03-12T00:00:00,0.04
f6f2bd97-becb-4e1c-a611-20c7cf579841,Chicago,2018-04-29T00:00:00,0.0625
48b0172c-4490-4f05-b290-e69f418d0575,San Francisco,2018-05-09T00:00:00,0.075
ed2af26d-35a1-4a31-ac65-7aedcaa7b7a7,New Orleans,2019-03-10T00:00:00,0.04
74d66a05-2e08-41d6-b743-e5fe5ba754b4,Philadelphia,2016-09-01T00:00:00,0.06
de4cda82-d821-4a4f-85f1-d88a6ea32fc2,Brooklyn,2017-03-12T00:00:00,0.04
38e46ab3-e2c4-4453-81e6-fc4c4f4bfb11,Chicago,2018-04-29T00:00:00,0.0625
f62a04a4-0237-45bf-bdc2-ac78ff46c962,San Francisco,2018-05-09T00:00:00,0.075
73648058-5335-44b2-9438-65fb3533829d,New Orleans,2019-03-10T00:00:00,0.04
124 changes: 91 additions & 33 deletions meltano.yml
@@ -1,46 +1,104 @@
# Meltano Configuration File
#
# One-time workstation setup:
# > meltano install # Install the plugins declared by the project
#
# Sample usage:
# > meltano run tap-jaffle-shop target-duckdb
# Sample usage for end-to-end development:
# > meltano run el # Run the job titled 'el' to extract and load data
# > meltano run t # Run the job titled 't' to transform data
# > meltano run bi # Build and serve the Evidence BI reports
#
# Or equivalently:
# > meltano run el # Run the job named 'el' to extract and load data
# Repeat the same actions as above on "prod":
# > meltano --environment=prod run elt evidence:build

version: 1
project_id: Jaffle Shop Template Project

env:
JAFFLE_DB_PATH: ./reports/jaffle_shop.duckdb
JAFFLE_DB_NAME: jaffle_shop
JAFFLE_RAW_SCHEMA: jaffle_raw
jobs:
# Sample usage: `meltano run el`, `meltano run t`, `meltano run el t`, `meltano run elt`
- name: el # Extract and load the raw data
tasks:
- tap-jaffle-shop target-duckdb
- name: t # Transform the raw data
tasks:
- dbt-duckdb:run
- dbt-duckdb:test
- name: elt # Extract, Load, and Transform
tasks:
- tap-jaffle-shop target-duckdb
- dbt-duckdb:run
- dbt-duckdb:test
- name: bi # Launch the Evidence BI dev environment
tasks:
- evidence:dev
- name: bi-compile # Build BI reports and test for breakages
tasks:
- evidence:build-strict
- name: full-build # End-to-end build and test
tasks:
- tap-jaffle-shop target-duckdb
- dbt-duckdb:run
- dbt-duckdb:test
- evidence:build-strict

default_environment: dev
environments:
- name: dev
- name: dev
env:
JAFFLE_DB_PATH: ${MELTANO_PROJECT_ROOT}/reports/jaffle_shop.${MELTANO_ENVIRONMENT}-duckdb
JAFFLE_DB_NAME: jaffle_shop
JAFFLE_RAW_SCHEMA: tap_jaffle_shop
TAP_JAFFLE_SHOP_YEARS: '1'
- name: staging
env:
JAFFLE_DB_PATH: ${MELTANO_PROJECT_ROOT}/reports/jaffle_shop.${MELTANO_ENVIRONMENT}-duckdb
JAFFLE_DB_NAME: jaffle_shop
JAFFLE_RAW_SCHEMA: tap_jaffle_shop
TAP_JAFFLE_SHOP_YEARS: '3'
- name: prod
env:
JAFFLE_DB_PATH: ${MELTANO_PROJECT_ROOT}/reports/jaffle_shop.${MELTANO_ENVIRONMENT}-duckdb
JAFFLE_DB_NAME: jaffle_shop
JAFFLE_RAW_SCHEMA: tap_jaffle_shop
TAP_JAFFLE_SHOP_YEARS: '5'

plugins:
extractors:
- name: tap-jaffle-shop
namespace: tap_jaffle_shop
variant: meltanolabs
pip_url: git+https://github.com/MeltanoLabs/tap-jaffle-shop.git@v0.3.0
capabilities:
- catalog
- discover
config:
years: 2
stream_name_prefix: ${JAFFLE_RAW_SCHEMA}-raw_
- name: tap-jaffle-shop
namespace: tap_jaffle_shop
variant: meltanolabs
pip_url: git+https://github.com/MeltanoLabs/tap-jaffle-shop.git@v0.2.1
capabilities:
- catalog
- discover
config:
years: 1
stream_name_prefix: ${JAFFLE_RAW_SCHEMA}-raw_
loaders:
- name: target-duckdb
variant: jwills
pip_url: target-duckdb~=0.4
config:
filepath: ${JAFFLE_DB_PATH}
default_target_schema: $JAFFLE_RAW_SCHEMA

jobs:
# Sample usage: `meltano run el`
# Equivalent to: `meltano run tap-jaffle-shop target-duckdb`
- name: el # Extract and load the raw data
tasks:
- tap-jaffle-shop target-duckdb
- name: target-duckdb
variant: jwills
pip_url: target-duckdb~=0.4
config:
filepath: ${JAFFLE_DB_PATH}
- name: target-parquet
variant: estrategiahq
pip_url: git+https://github.com/estrategiahq/target-parquet.git
utilities:
- name: dbt-duckdb
variant: jwills
pip_url: dbt-core~=1.4.5 dbt-duckdb~=1.4.0 git+https://github.com/meltano/[email protected]
config:
project_dir: ${MELTANO_PROJECT_ROOT}
profiles_dir: ${MELTANO_PROJECT_ROOT}
path: ${JAFFLE_DB_PATH}
- name: evidence
variant: meltanolabs
pip_url: evidence-ext>=0.5
commands:
dev: dev
config:
home_dir: ${MELTANO_PROJECT_ROOT}/reports
settings:
duckdb:
# filename: ${MELTANO_PROJECT_ROOT}/reports/${JAFFLE_DB_NAME}.${MELTANO_ENVIRONMENT}.duckdb
filename: ${JAFFLE_DB_NAME}.${MELTANO_ENVIRONMENT}-duckdb
project_id: ff061732-bd27-4021-916f-e8f8b55fcf9d
5 changes: 4 additions & 1 deletion models/staging/__sources.yml
@@ -5,7 +5,10 @@ sources:
schema: "{{ env_var('JAFFLE_RAW_SCHEMA', 'jaffle_raw') }}"
description: E-commerce data
meta:
external_location: "read_csv_auto('./jaffle-data/{name}.csv', header=1)"
# If `$JAFFLE_RAW_SCHEMA` is specified, use the provided raw data. Otherwise, use the csv seed data from the repo.
external_location: >-
{{ '' if env_var('JAFFLE_RAW_SCHEMA', '') else 'read_csv_auto("./jaffle-data/{name}.csv", header=1)' }}
tables:
- name: raw_customers
description: One record per person who has purchased one or more items
2 changes: 1 addition & 1 deletion packages.yml
@@ -1,5 +1,5 @@
packages:
- package: dbt-labs/metrics
version: 1.5.0
version: 1.4.0
- package: dbt-labs/dbt_utils
version: 1.0.0
95 changes: 95 additions & 0 deletions plugins/loaders/target-duckdb--jwills.lock
@@ -0,0 +1,95 @@
{
"plugin_type": "loaders",
"name": "target-duckdb",
"namespace": "target_duckdb",
"variant": "jwills",
"label": "DuckDB",
"docs": "https://hub.meltano.com/loaders/target-duckdb--jwills",
"repo": "https://github.com/jwills/target-duckdb",
"pip_url": "target-duckdb~=0.4",
"description": "DuckDB loader",
"logo_url": "https://hub.meltano.com/assets/logos/loaders/duckdb.png",
"settings_group_validation": [
[
"filepath",
"default_target_schema"
]
],
"settings": [
{
"name": "filepath",
"kind": "string",
"label": "File Path",
"description": "Path to the local DuckDB file.",
"placeholder": "/path/to/local/duckdb.file"
},
{
"name": "batch_size_rows",
"kind": "integer",
"value": 100000,
"label": "Batch Size Rows",
"description": "Maximum number of rows in each batch. At the end of each batch, the rows in the batch are loaded into DuckDB."
},
{
"name": "flush_all_streams",
"kind": "boolean",
"value": false,
"label": "Flush All Streams",
"description": "Flush and load every stream into DuckDB when one batch is full. Warning - This may trigger the COPY command to use files with low number of records."
},
{
"name": "default_target_schema",
"kind": "string",
"value": "$MELTANO_EXTRACT__LOAD_SCHEMA",
"label": "Default Target Schema",
"description": "Name of the schema where the tables will be created. If schema_mapping is not defined then every stream sent by the tap is loaded into this schema."
},
{
"name": "schema_mapping",
"kind": "object",
"label": "schema_mapping",
"description": "Useful if you want to load multiple streams from one tap to multiple DuckDB schemas.\n\nIf the tap sends the stream_id in <schema_name>-<table_name> format then this option overwrites the default_target_schema value.\n"
},
{
"name": "add_metadata_columns",
"kind": "boolean",
"value": false,
"label": "Add Metadata Columns",
"description": "Metadata columns add extra row level information about data ingestions, (i.e. when was the row read in source, when was inserted or deleted in postgres etc.) Metadata columns are creating automatically by adding extra columns to the tables with a column prefix _SDC_. The column names are following the stitch naming conventions documented at https://www.stitchdata.com/docs/data-structure/integration-schemas#sdc-columns. Enabling metadata columns will flag the deleted rows by setting the _SDC_DELETED_AT metadata column. Without the add_metadata_columns option the deleted rows from singer taps will not be recognisable in DuckDB."
},
{
"name": "hard_delete",
"kind": "boolean",
"value": false,
"label": "Hard Delete",
"description": "When hard_delete option is true then DELETE SQL commands will be performed in DuckDB to delete rows in tables. It's achieved by continuously checking the _SDC_DELETED_AT metadata column sent by the singer tap. Due to deleting rows requires metadata columns, hard_delete option automatically enables the add_metadata_columns option as well."
},
{
"name": "data_flattening_max_level",
"kind": "integer",
"value": 0,
"label": "Data Flattening Max Level",
"description": "Object type RECORD items from taps can be transformed to flattened columns by creating columns automatically.\n\nWhen value is 0 (default) then flattening functionality is turned off.\n"
},
{
"name": "primary_key_required",
"kind": "boolean",
"value": true,
"label": "Primary Key Required",
"description": "Log based and Incremental replications on tables with no Primary Key cause duplicates when merging UPDATE events. When set to true, stop loading data if no Primary Key is defined."
},
{
"name": "validate_records",
"kind": "boolean",
"value": false,
"label": "Validate Records",
"description": "Validate every single record message to the corresponding JSON schema. This option is disabled by default and invalid RECORD messages will fail only at load time by DuckDB. Enabling this option will detect invalid records earlier but could cause performance degradation."
},
{
"name": "temp_dir",
"kind": "string",
"label": "Temporary Directory",
"description": "Directory of temporary CSV files with RECORD messages."
}
]
}
47 changes: 47 additions & 0 deletions plugins/loaders/target-parquet--estrategiahq.lock
@@ -0,0 +1,47 @@
{
"plugin_type": "loaders",
"name": "target-parquet",
"namespace": "target_parquet",
"variant": "estrategiahq",
"label": "Parquet",
"docs": "https://hub.meltano.com/loaders/target-parquet--estrategiahq",
"repo": "https://github.com/estrategiahq/target-parquet",
"pip_url": "git+https://github.com/estrategiahq/target-parquet.git",
"description": "Columnar Storage Format",
"logo_url": "https://hub.meltano.com/assets/logos/loaders/parquet.png",
"settings": [
{
"name": "disable_collection",
"kind": "boolean",
"label": "Disable Collection",
"description": "A boolean of whether to disable Singer anonymous tracking."
},
{
"name": "logging_level",
"label": "Logging Level",
"description": "(Default - INFO) The log level. Can also be set using environment variables."
},
{
"name": "destination_path",
"label": "Destination Path",
"description": "(Default - '.') The path to write files out to."
},
{
"name": "compression_method",
"label": "Compression Method",
"description": "Compression methods have to be supported by Pyarrow, and currently the compression modes available are - snappy (recommended), zstd, brotli and gzip."
},
{
"name": "streams_in_separate_folder",
"kind": "boolean",
"label": "Streams In Separate Folder",
"description": "(Default - False) The option to create each stream in a different folder, as these are expected to come in different schema."
},
{
"name": "file_size",
"kind": "integer",
"label": "File Size",
"description": "The number of rows to write per file. The default is to write to a single file."
}
]
}
