WIP - workflow to run a CaDeT ingestion #378

Closed
40 changes: 40 additions & 0 deletions .github/workflows/ingest-cadet.yml
@@ -0,0 +1,40 @@
name: "Ingest DBT metadata from Create a Derived Table"

permissions:
  id-token: write
  contents: read

on:
  workflow_call:
    inputs:
      env:
        description: "which environment to deploy to"
        required: true
        type: string
      ecr_region:
        description: "ecr region to connect to"
        required: false
        type: string
        default: eu-west-1

jobs:
  main:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11.1"
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.CADET_METADATA_ROLE_TO_ASSUME }}
          aws-region: ${{ inputs.ecr_region }}
      - name: install reqs
        run: pip install acryl-datahub
      - name: push metadata to datahub
        env:
          DATAHUB_GMS_TOKEN: ${{ secrets.CATALOGUE_TOKEN }}
          DATAHUB_GMS_URL: ${{ vars.CATALOGUE_URL }}
        run: |
          datahub init
          datahub ingest -c ingestion/cadet.yaml
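The final step relies on `DATAHUB_GMS_URL` and `DATAHUB_GMS_TOKEN` being populated from the repository's variables and secrets. A minimal sketch of a pre-flight check that could run before `datahub ingest`, assuming an empty or missing value should be treated as a configuration error; the helper name `missing_vars` and the example URL are hypothetical and not part of this PR:

```python
import os
from typing import Mapping

# Environment variables set by the "push metadata to datahub" step.
REQUIRED_VARS = ("DATAHUB_GMS_URL", "DATAHUB_GMS_TOKEN")


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Hypothetical environment with only the URL set.
example_env = {"DATAHUB_GMS_URL": "https://catalogue.example"}
print(missing_vars(example_env))  # prints ['DATAHUB_GMS_TOKEN']
```

Failing early on a non-empty result would surface a missing secret as a clear message rather than as an authentication error partway through ingestion.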
27 changes: 27 additions & 0 deletions ingestion/cadet.yaml
@@ -0,0 +1,27 @@
source:
  type: dbt
  config:
    manifest_path: "s3://mojap-derived-tables/prod/run_artefacts/latest/target/manifest.json"
    catalog_path: "s3://mojap-derived-tables/prod/run_artefacts/latest/target/catalog.json"
    test_results_path: "s3://mojap-derived-tables/prod/run_artefacts/latest/target/run_results.json"
    platform_instance: cadet
    target_platform: athena
    target_platform_instance: athena_cadet
    infer_dbt_schemas: true
    entities_enabled:
      test_results: true
      seeds: false
      snapshots: true
      models: true
      sources: true
      test_definitions: true
    stateful_ingestion:
      remove_stale_metadata: true
transformers:
  - type: pattern_add_dataset_domain
    config:
      semantics: OVERWRITE
      domain_pattern:
        rules:
          'urn:li:dataset:\(urn:li:dataPlatform:dbt,cadet\.awsdatacatalog\.courts.*':
            [courts]
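The `domain_pattern` rule key is a regular expression over dataset URNs, so it is worth checking which URNs it would catch before relying on it. A minimal sketch, assuming the rule is applied with Python's `re.match` (anchored at the start of the URN); the sample URNs, including the database and model names, are hypothetical and only illustrate the shape the pattern expects:

```python
import re

# Rule key copied from ingestion/cadet.yaml.
COURTS_RULE = r"urn:li:dataset:\(urn:li:dataPlatform:dbt,cadet\.awsdatacatalog\.courts.*"


def matches_courts_rule(urn: str) -> bool:
    """Return True if the URN matches the rule from its first character."""
    return re.match(COURTS_RULE, urn) is not None


# Hypothetical URNs; real names come from the dbt manifest.
sample_urns = [
    "urn:li:dataset:(urn:li:dataPlatform:dbt,cadet.awsdatacatalog.courts_example.some_model,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:dbt,cadet.awsdatacatalog.prisons_example.some_model,PROD)",
]

for urn in sample_urns:
    print(matches_courts_rule(urn), urn)
```

Because the pattern ends in `courts.*`, any database whose name merely starts with `courts` is assigned to the `courts` domain, which may or may not be the intended scope.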