generated from brabster/dbt_bigquery_template
0 parents
commit 0e5c38c
Showing 31 changed files with 713 additions and 0 deletions.
@@ -0,0 +1,42 @@
#!/bin/bash

set -euo pipefail

export PIP_REQUIRE_VIRTUALENV=true # exported so that pip sees it and aborts if we try to install outside a venv
PROJECT_DIR="$(dirname "$0")/.." # project root (parent of the script directory)
VENV_PATH="${PROJECT_DIR}/.venv"
IS_RUNNING_IN_VENV="$(python -c 'import sys; print(sys.prefix != sys.base_prefix)')"

if [ "${IS_RUNNING_IN_VENV}" == 'False' ]; then
    echo 'Not in virtualenv, setting up'
    python -m venv "${VENV_PATH}"
    source "${VENV_PATH}/bin/activate"
fi

echo "install or upgrade system packages"
pip install --upgrade pip setuptools

echo "install safety for vulnerability check; it prints its own messages about noncommercial use"
pip install --upgrade safety

echo "install or upgrade project-specific dependencies"
pip install -U -r "${PROJECT_DIR}/requirements.txt"

echo "install or upgrade dbt dependencies"
dbt deps

echo "check for vulnerabilities"
safety check

echo "load user environment, if present"
ENV_PATH="${PROJECT_DIR}/.env"
if [ -f "${ENV_PATH}" ]; then
    source "${ENV_PATH}"
    echo "check dbt setup"
    dbt debug
else
    echo "Unable to check dbt setup until .env file is set up and suitable data warehouse credentials are available"
fi
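
The script above decides whether to create a virtualenv by asking Python whether `sys.prefix` differs from `sys.base_prefix`. As a rough shell-only alternative (an assumption for illustration, not what the template does), the `VIRTUAL_ENV` variable exported by a venv's `activate` script can be checked instead. This only detects an activated venv: calling a venv's `python` directly without activation would not set it. The function name `in_activated_venv` is hypothetical.

```shell
# Hypothetical helper: succeeds only when a venv has been activated in this shell.
in_activated_venv() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if in_activated_venv; then
  echo "virtualenv active at ${VIRTUAL_ENV}"
else
  echo "no activated virtualenv detected"
fi
```

The `${VIRTUAL_ENV:-}` form keeps the check safe under `set -u`, which the script enables.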
@@ -0,0 +1,5 @@
# copy this file to gitignored `.env` and set the environment for your personal workspace

export DBT_DATASET=sandbox_your_name
export DBT_LOCATION=US # example dataset is in US location
export DBT_PROJECT=some-project-id # must be the GCP project id, not the project name!
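
Because everything downstream depends on these three variables, it may help to fail early when one is missing. The following is a minimal sketch under that assumption; `check_dbt_env` is a hypothetical helper and is not part of the template.

```shell
# Hypothetical helper, not part of the template: list any required
# variable that is unset or empty, and return non-zero if one is found.
check_dbt_env() {
  missing=0
  for name in DBT_DATASET DBT_LOCATION DBT_PROJECT; do
    # eval-based indirect lookup keeps this usable in plain POSIX sh
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "${missing}"
}
```

It could be called right after sourcing `.env`, for example `check_dbt_env || exit 1`.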
@@ -0,0 +1 @@
This directory contains environment-specific configurations for use in pipeline deployment.
@@ -0,0 +1,4 @@
# these values are used to deploy the example environment for the template repository. Update for your target environment
export DBT_DATASET=dbt_bigquery_template_example
export DBT_LOCATION=US # because the source example data is in the US location
export DBT_PROJECT=pypi-408816
@@ -0,0 +1,30 @@
DBT does not directly manage datasets/schemas and their permissions.

If you want to manage your dataset ACL as part of the build,
you can provide a JSON document describing the permissions you want as `dataset_acl.json`
and uncomment the commented-out `bq update` command in the "ensure prod dataset exists" step of the workflow file.

See https://cloud.google.com/bigquery/docs/control-access-to-resources-iam#grant_access_to_a_dataset

```json
{
  "access": [
    {
      "role": "READER",
      "specialGroup": "projectReaders"
    },
    {
      "role": "WRITER",
      "specialGroup": "projectWriters"
    },
    {
      "role": "OWNER",
      "specialGroup": "projectOwners"
    }
  ]
}
```

Terraform is the other obvious option for managing datasets, but it adds complexity and a new toolset/supply chain.
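
As a rough illustration of the optional ACL step, the sketch below only prints the `bq update` command when an ACL file is present, so it can be tried without touching a dataset. `dataset_acl_command` is a hypothetical dry-run helper; the command form is copied from the commented-out line in the workflow, and `DBT_PROJECT` and `DBT_DATASET` are assumed to be set.

```shell
# Hypothetical dry-run helper: prints the bq command instead of running it.
dataset_acl_command() {
  acl_file="$1"
  if [ -f "${acl_file}" ]; then
    echo "bq update --source ${acl_file} ${DBT_PROJECT}:${DBT_DATASET}"
  else
    echo "no ACL file at ${acl_file}; leaving dataset permissions unchanged"
  fi
}
```

For example, `dataset_acl_command .envs/prod/dataset_acl.json` shows what would be applied for the prod environment.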
@@ -0,0 +1,99 @@
Based on https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions

Setting up Workload Identity Federation (WIF) for GitHub Actions.
Assumes $DBT_PROJECT is set to the project you want the pool/provider in.

# Set up WIF in-project

I'm unsure whether setting up a WIF pool/provider for each project is the best way, but it seems like the least risky.

## Gather some info

```console
export WIF_PROJECT_NUMBER=$(gcloud projects describe "${DBT_PROJECT}" --format="value(projectNumber)")
export WIF_POOL=dbt-pool
export WIF_PROVIDER=dbt-provider
# assumes an SSH-style remote (git@github.com:owner/repo.git) and no dots in the repo name
export WIF_GITHUB_REPO=$(git remote get-url origin|cut -d: -f2|cut -d. -f1)
export WIF_SERVICE_ACCOUNT=pypi-vulnerabilities
```
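
The `cut` pipeline for `WIF_GITHUB_REPO` splits on `:` and `.`, so it works for SSH-style remotes but not for HTTPS remotes or repository names containing dots. A minimal sketch of a more tolerant extraction, using only shell parameter expansion, might look like this; `github_repo_from_url` is a hypothetical helper, and it assumes the remote is hosted on github.com.

```shell
# Hypothetical helper: extract "owner/repo" from a github.com remote URL.
# The body runs in a subshell, so it does not leak variables to the caller.
github_repo_from_url() (
  repo="${1%.git}"                 # drop a trailing .git, if present
  repo="${repo#*github.com[/:]}"   # drop everything up to "github.com:" or "github.com/"
  printf '%s\n' "${repo}"
)

github_repo_from_url "git@github.com:some-owner/some-repo.git"
github_repo_from_url "https://github.com/some-owner/some.repo"
```

With that in place, the export could become `export WIF_GITHUB_REPO=$(github_repo_from_url "$(git remote get-url origin)")`.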

## Ensure IAM APIs enabled

```console
gcloud services enable iamcredentials.googleapis.com --project "${DBT_PROJECT}"
```

## Set up Service Account

```console
gcloud iam service-accounts create "${WIF_SERVICE_ACCOUNT}" \
  --project="${DBT_PROJECT}" \
  --description="DBT service account" \
  --display-name="${WIF_SERVICE_ACCOUNT}"
```

## Set up Workload Identity Provider

```console
gcloud iam workload-identity-pools create "${WIF_POOL}" \
  --project="${DBT_PROJECT}" \
  --location="global" \
  --display-name="DBT Pool"
```

```console
gcloud iam workload-identity-pools providers create-oidc "${WIF_PROVIDER}" \
  --project="${DBT_PROJECT}" \
  --location="global" \
  --workload-identity-pool="${WIF_POOL}" \
  --display-name="DBT provider" \
  --attribute-mapping="google.subject=assertion.sub,attribute.actor=assertion.actor,attribute.repository=assertion.repository" \
  --issuer-uri="https://token.actions.githubusercontent.com"
```

## Collect up IDs of the Workload Identity Pool and Provider

```console
export WIF_POOL_PROVIDER_ID=$(gcloud iam workload-identity-pools providers describe "${WIF_PROVIDER}" --location=global --project "${DBT_PROJECT}" --workload-identity-pool "${WIF_POOL}" --format="value(name)")
export WIF_POOL_ID=$(gcloud iam workload-identity-pools describe "${WIF_POOL}" --location=global --project "${DBT_PROJECT}" --format="value(name)")
```

## Set up IAM to allow GitHub to assume role

```console
gcloud iam service-accounts add-iam-policy-binding "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com" \
  --project="${DBT_PROJECT}" \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/${WIF_POOL_ID}/attribute.repository/${WIF_GITHUB_REPO}"
```

```console
gcloud iam service-accounts add-iam-policy-binding "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com" \
  --project="${DBT_PROJECT}" \
  --role="roles/iam.serviceAccountTokenCreator" \
  --member="serviceAccount:${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```
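
To make the `--member` value in the first binding easier to check by eye, the sketch below assembles it from placeholder values and prints it. The pool ID and repository shown are made up for illustration; the pool ID format (`projects/<number>/locations/global/workloadIdentityPools/<pool>`) is an assumption about what the `describe` command returns and should be verified against your own output.

```shell
# Placeholder values for illustration only; real values come from the
# gcloud and git commands in the earlier sections.
WIF_POOL_ID="projects/123456789/locations/global/workloadIdentityPools/dbt-pool"
WIF_GITHUB_REPO="some-owner/some-repo"

# Same template as the --member flag of the workloadIdentityUser binding.
WIF_MEMBER="principalSet://iam.googleapis.com/${WIF_POOL_ID}/attribute.repository/${WIF_GITHUB_REPO}"
echo "${WIF_MEMBER}"
```

The `attribute.repository` segment matches the attribute mapped from `assertion.repository` when the provider was created, so only workflows running in that repository fall inside the principal set.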

## Grant Service Account BigQuery admin in the project

(You may need to make this policy more specific!)

```console
gcloud projects add-iam-policy-binding "${DBT_PROJECT}" \
  --role="roles/bigquery.admin" \
  --member="serviceAccount:${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```

## Recover Secrets for GitHub

Populate the GitHub secrets for this build with the values printed by the commands below.

```console
echo "GitHub Secret: GCP_WORKLOAD_IDENTITY_PROVIDER"
gcloud iam workload-identity-pools providers describe "${WIF_PROVIDER}" --location=global --project "${DBT_PROJECT}" --workload-identity-pool "${WIF_POOL}" --format="value(name)"
```

```console
echo "GitHub Secret: GCP_SERVICE_ACCOUNT"
echo "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```
@@ -0,0 +1,31 @@

name: dbt build in venv
description: Runs dbt build from venv
inputs:
  env:
    required: true
    description: Environment file to source
runs:
  using: composite
  steps:
    - name: dbt build for ${{ inputs.env }}
      shell: bash
      run: |
        source .venv/bin/activate
        source .envs/${{ inputs.env }}/.env
        rm -rf logs
        dbt clean
        dbt deps
        dbt debug
        dbt build
        dbt docs generate
    - name: upload target artifacts
      uses: actions/upload-artifact@v3
      with:
        name: dbt_artifacts_${{ inputs.env }}
        path: |
          target
          logs
@@ -0,0 +1,24 @@

name: Setup DBT in virtualenv
description: Sets up environment suitable for DBT
runs:
  using: composite
  steps:
    - uses: actions/setup-python@v5
      with:
        python-version: '3.11' # dbt does not support 3.12 yet
        check-latest: true
    - name: setup-python-venv
      shell: bash
      run: |
        python --version
        python -m venv .venv
        source .venv/bin/activate
        pip install -U setuptools pip safety
        pip install -U -r requirements.txt
    - name: safety-check
      shell: bash
      run: |
        source .venv/bin/activate
        safety check
@@ -0,0 +1,39 @@
name: deploy-to-gcp
on:
  push: {}
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      PIP_REQUIRE_VIRTUALENV: true
    permissions:
      contents: read
      id-token: write
      actions: read
      pages: write
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup_dbt
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}
      - uses: google-github-actions/setup-gcloud@v2
        with:
          version: '>= 363.0.0'
      - name: ensure prod dataset exists
        run: |
          source .venv/bin/activate
          source .envs/prod/.env
          dbt deps
          dbt run-operation ensure_target_dataset_exists
          # bq update --source .envs/prod/dataset_acl.json "${DBT_PROJECT}:${DBT_DATASET}"
      - name: prod dbt build
        uses: ./.github/actions/dbt_build
        with:
          env: prod
      - uses: actions/upload-pages-artifact@v3
        with:
          path: target
      - uses: actions/deploy-pages@v4
@@ -0,0 +1,19 @@
# Python virtualenv files
/.venv/

# User's environment settings
/.env

# DBT logs
/logs/

# DBT target dir
/target/

# DBT packages
/dbt_packages/
/package-lock.yml

# files that we don't want committed
/uncommitted/*
!/uncommitted/README.md
@@ -0,0 +1,15 @@
{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "2.0.0",
    "tasks": [
        {
            "label": "init_and_update",
            "type": "shell",
            "command": "${workspaceFolder}/.dev_scripts/init_and_update.sh",
            "runOptions": {
                "runOn": "folderOpen"
            }
        }
    ]
}