Commit

Initial commit

brabster authored Feb 17, 2024
0 parents · commit 0e5c38c
Showing 31 changed files with 713 additions and 0 deletions.
42 changes: 42 additions & 0 deletions .dev_scripts/init_and_update.sh
@@ -0,0 +1,42 @@
#!/bin/bash

set -euo pipefail

export PIP_REQUIRE_VIRTUALENV=true # have pip abort if we try to install outside a venv
PROJECT_DIR=$(dirname "$0")/.. # project root (parent of the script directory)
VENV_PATH=${PROJECT_DIR}/.venv
IS_RUNNING_IN_VENV="$(python -c 'import sys; print(sys.prefix != sys.base_prefix)')"

if [ "${IS_RUNNING_IN_VENV}" == 'False' ]; then
echo 'Not in virtualenv, setting up';
python -m venv ${VENV_PATH}
source ${VENV_PATH}/bin/activate
fi

echo "install or upgrade system packages"
pip install --upgrade pip setuptools

echo "install safety for vulnerability check; it prints its own messages about noncommercial use"
pip install --upgrade safety

echo "install or upgrade project-specific dependencies"
pip install -U -r ${PROJECT_DIR}/requirements.txt

echo "install or upgrade dbt dependencies"
dbt deps

echo "check for vulnerabilities"
safety check

echo "load user environment, if present"
ENV_PATH=${PROJECT_DIR}/.env
if [ -f "${ENV_PATH}" ]; then
source ${ENV_PATH}
echo "check dbt setup"
dbt debug
else
echo "Unable to check dbt setup until .env file is set up and suitable data warehouse credentials are available"
fi



5 changes: 5 additions & 0 deletions .env_template
@@ -0,0 +1,5 @@
# copy this file to gitignored `.env` and set the environment for your personal workspace

export DBT_DATASET=sandbox_your_name
export DBT_LOCATION=US # example dataset is in US location
export DBT_PROJECT=some-project-id # must be the GCP project id, not the project name!
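
For example, a minimal sketch of preparing a personal workspace (the values above are placeholders to replace):

```console
cp .env_template .env
# edit .env with your own dataset name, location and GCP project id, then:
source .env
```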
1 change: 1 addition & 0 deletions .envs/README.md
@@ -0,0 +1 @@
This directory contains environment-specific configurations for use in pipeline deployment.
4 changes: 4 additions & 0 deletions .envs/prod/.env
@@ -0,0 +1,4 @@
# These values are used to deploy the example environment for the template repository. Update for your target environment.
export DBT_DATASET=dbt_bigquery_template_example
export DBT_LOCATION=US # because the source example data is in the US location
export DBT_PROJECT=pypi-408816
30 changes: 30 additions & 0 deletions .envs/prod/README.md
@@ -0,0 +1,30 @@
DBT does not directly manage datasets/schemas and their permissions.

If you want to manage your dataset ACL as part of the build, you can provide a JSON document describing the permissions you want as `.envs/prod/dataset_acl.json` and uncomment the `bq update` command in the `ensure prod dataset exists` step of the deploy workflow (a sketch of that command follows the example below).

See https://cloud.google.com/bigquery/docs/control-access-to-resources-iam#grant_access_to_a_dataset

```json
{
  "access": [
    {
      "role": "READER",
      "specialGroup": "projectReaders"
    },
    {
      "role": "WRITER",
      "specialGroup": "projectWriters"
    },
    {
      "role": "OWNER",
      "specialGroup": "projectOwners"
    }
  ]
}
```
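
If you take this route, the `ensure prod dataset exists` step in the deploy workflow applies the document with `bq update`; a sketch, assuming the file is committed as `.envs/prod/dataset_acl.json` and the prod `.env` has been sourced:

```console
bq update --source .envs/prod/dataset_acl.json "${DBT_PROJECT}:${DBT_DATASET}"
```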

Terraform is the other obvious option for managing datasets, but it adds complexity and another toolset/supply chain to maintain.

99 changes: 99 additions & 0 deletions .github/GCP_WIF_SETUP.md
@@ -0,0 +1,99 @@
Based on https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions

Setting up Workload Identity Federation for GitHub Actions.
Assumes `$DBT_PROJECT` is set to the GCP project you want the pool and provider created in.

# Setup WIF in-project

It's unclear whether setting up a WIF pool/provider in each project is the best approach, but it seems like the least risky option.

## Gather some info

```console
export WIF_PROJECT_NUMBER=$(gcloud projects describe "${DBT_PROJECT}" --format="value(projectNumber)")
export WIF_POOL=dbt-pool
export WIF_PROVIDER=dbt-provider
# extracts "owner/repo"; assumes an SSH-style remote (git@github.com:owner/repo.git)
export WIF_GITHUB_REPO=$(git remote get-url origin|cut -d: -f2|cut -d. -f1)
export WIF_SERVICE_ACCOUNT=pypi-vulnerabilities
```
## Ensure IAM APIs enabled

```console
gcloud services enable iamcredentials.googleapis.com --project "${DBT_PROJECT}"
```

## Setup Service Account

```console
gcloud iam service-accounts create "${WIF_SERVICE_ACCOUNT}" \
--project="${DBT_PROJECT}" \
--description="DBT service account" \
--display-name="${WIF_SERVICE_ACCOUNT}"
```

## Setup Workload Identity Provider

```console
gcloud iam workload-identity-pools create "${WIF_POOL}" \
--project="${DBT_PROJECT}" \
--location="global" \
--display-name="DBT Pool"
```

```console
gcloud iam workload-identity-pools providers create-oidc "${WIF_PROVIDER}" \
--project="${DBT_PROJECT}" \
--location="global" \
--workload-identity-pool="${WIF_POOL}" \
--display-name="DBT provider" \
--attribute-mapping="google.subject=assertion.sub,attribute.actor=assertion.actor,attribute.repository=assertion.repository" \
--issuer-uri="https://token.actions.githubusercontent.com"
```

## Collect the IDs of the Workload Identity Pool and Provider

```console
export WIF_POOL_PROVIDER_ID=$(gcloud iam workload-identity-pools providers describe "${WIF_PROVIDER}" --location=global --project "${DBT_PROJECT}" --workload-identity-pool "${WIF_POOL}" --format="value(name)")
export WIF_POOL_ID=$(gcloud iam workload-identity-pools describe "${WIF_POOL}" --location=global --project "${DBT_PROJECT}" --format="value(name)")
```

## Setup IAM to allow GitHub to assume role

```console
gcloud iam service-accounts add-iam-policy-binding "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com" \
--project="${DBT_PROJECT}" \
--role="roles/iam.workloadIdentityUser" \
--member="principalSet://iam.googleapis.com/${WIF_POOL_ID}/attribute.repository/${WIF_GITHUB_REPO}"
```

```console
gcloud iam service-accounts add-iam-policy-binding "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com" \
--project="${DBT_PROJECT}" \
--role="roles/iam.serviceAccountTokenCreator" \
--member="serviceAccount:${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```

## Grant Service Account BigQuery admin in the project

(You may need to make this policy more specific!)

```console
gcloud projects add-iam-policy-binding "${DBT_PROJECT}" \
--role="roles/bigquery.admin" \
--member="serviceAccount:${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```

## Recover Secrets for GitHub

Populate the GitHub Actions secrets for this build with the values printed below.

```console
echo "GitHub Secret: GCP_WORKLOAD_IDENTITY_PROVIDER"
gcloud iam workload-identity-pools providers describe "${WIF_PROVIDER}" --location=global --project "${DBT_PROJECT}" --workload-identity-pool "${WIF_POOL}" --format="value(name)"
```

```console
echo "GitHub Secret: GCP_SERVICE_ACCOUNT"
echo "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```
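
If you prefer the GitHub CLI to the web UI, the same two secrets can be set from the shell used above (a sketch; assumes `gh` is installed and authenticated for this repository):

```console
gh secret set GCP_WORKLOAD_IDENTITY_PROVIDER --body "${WIF_POOL_PROVIDER_ID}"
gh secret set GCP_SERVICE_ACCOUNT --body "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```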

31 changes: 31 additions & 0 deletions .github/actions/dbt_build/action.yml
@@ -0,0 +1,31 @@

name: dbt build in venv
description: Runs dbt build from venv
inputs:
  env:
    required: true
    description: Environment file to source
runs:
  using: composite
  steps:
    - name: dbt build for ${{ inputs.env }}
      shell: bash
      run: |
        source .venv/bin/activate
        source .envs/${{ inputs.env }}/.env
        rm -rf logs
        dbt clean
        dbt deps
        dbt debug
        dbt build
        dbt docs generate
    - name: upload target artifacts
      uses: actions/upload-artifact@v3
      with:
        name: dbt_artifacts_${{ inputs.env }}
        path: |
          target
          logs
24 changes: 24 additions & 0 deletions .github/actions/setup_dbt/action.yml
@@ -0,0 +1,24 @@

name: Setup DBT in virtualenv
description: Sets up environment suitable for DBT
runs:
  using: composite
  steps:
    - uses: actions/setup-python@v5
      with:
        python-version: '3.11' # dbt does not support 3.12 yet
        check-latest: true
    - name: setup-python-venv
      shell: bash
      run: |
        python --version
        python -m venv .venv
        source .venv/bin/activate
        pip install -U setuptools pip safety
        pip install -U -r requirements.txt
    - name: safety-check
      shell: bash
      run: |
        source .venv/bin/activate
        safety check
39 changes: 39 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,39 @@
name: deploy-to-gcp
on:
  push: {}
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      PIP_REQUIRE_VIRTUALENV: true
    permissions:
      contents: read
      id-token: write
      actions: read
      pages: write
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup_dbt
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}
      - uses: google-github-actions/setup-gcloud@v2
        with:
          version: '>= 363.0.0'
      - name: ensure prod dataset exists
        run: |
          source .venv/bin/activate
          source .envs/prod/.env
          dbt deps
          dbt run-operation ensure_target_dataset_exists
          # bq update --source .envs/prod/dataset_acl.json "${DBT_PROJECT}:${DBT_DATASET}"
      - name: prod dbt build
        uses: ./.github/actions/dbt_build
        with:
          env: prod
      - uses: actions/upload-pages-artifact@v3
        with:
          path: target
      - uses: actions/deploy-pages@v4

19 changes: 19 additions & 0 deletions .gitignore
@@ -0,0 +1,19 @@
# Python virtualenv files
/.venv/

# User's environment settings
/.env

# DBT logs
/logs/

# DBT target dir
/target/

# DBT packages
/dbt_packages/
/package-lock.yml

# files that we don't want committed
/uncommitted/*
!/uncommitted/README.md
15 changes: 15 additions & 0 deletions .vscode/tasks.json
@@ -0,0 +1,15 @@
{
  // See https://go.microsoft.com/fwlink/?LinkId=733558
  // for the documentation about the tasks.json format
  "version": "2.0.0",
  "tasks": [
    {
      "label": "init_and_update",
      "type": "shell",
      "command": "${workspaceFolder}/.dev_scripts/init_and_update.sh",
      "runOptions": {
        "runOn": "folderOpen"
      }
    }
  ]
}
0 comments on commit 0e5c38c