Assign domains to entities in CaDeT #343

Closed
seanprivett opened this issue May 16, 2024 · 7 comments
@seanprivett
Contributor

seanprivett commented May 16, 2024

We would like the domains in the CaDeT metadata to be assigned to actual DataHub domains, rather than stored as custom properties.

https://datahubproject.io/docs/generated/ingestion/sources/dbt/#dbt-meta-automated-mappings

@seanprivett seanprivett converted this from a draft issue May 16, 2024
@murdo-moj
Contributor

murdo-moj commented May 16, 2024

A thread in Slack suggests using transformers to achieve this, as the native dbt mappings don't have an add_domain utility.

https://datahubspace.slack.com/archives/CUMUWQU66/p1674149180727029

https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer/#domain-mapping-based-on-tags

@murdo-moj
Contributor

murdo-moj commented May 16, 2024

There's an active feature request for "Add ability to specify domain when Ingest DBT metadata".

@seanprivett seanprivett changed the title Spike: how does domain information get assigned to ingested CaDeT assets Ingest domain information from CaDeT May 23, 2024
@seanprivett seanprivett changed the title Ingest domain information from CaDeT Assign domains to entities in CaDeT May 23, 2024
@murdo-moj murdo-moj self-assigned this May 24, 2024
@murdo-moj murdo-moj moved this from Todo to In Progress in Data Catalogue May 24, 2024
@murdo-moj
Contributor

I have this working by using table names to assign domains.

@murdo-moj
Contributor

source:
    type: dbt
    config:
        manifest_path: 's3://mojap-derived-tables/prod/run_artefacts/latest/target/manifest.json'
        catalog_path: 's3://mojap-derived-tables/prod/run_artefacts/latest/target/catalog.json'
        test_results_path: 's3://mojap-derived-tables/prod/run_artefacts/latest/target/run_results.json'
        target_platform: athena
        infer_dbt_schemas: true
        aws_connection:
            aws_region: eu-west-1
        node_name_pattern:
            allow:
                - '.*bold_sm_spells.*'
                - '.*common_platform.*'
                - '.*sirius.*'
        entities_enabled:
            test_results: 'YES'
            seeds: 'YES'
            snapshots: 'YES'
            models: 'YES'
            sources: 'YES'
            test_definitions: 'YES'
        stateful_ingestion:
            remove_stale_metadata: true

transformers:
    # Assign domains by matching each dataset URN against the regexes below.
    - type: "pattern_add_dataset_domain"
      config:
        # OVERWRITE replaces domains already set on the dataset in DataHub.
        semantics: OVERWRITE
        domain_pattern:
          rules:
            'urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*common_platform.*': ["HMCTS"]
            'urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*prison.*': ["HMPPS"]
            'urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*sirius.*': ["OPG"]
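
To sanity-check the rules in the recipe above before running an ingestion, the matching can be roughly reproduced in plain Python. This is only a sketch: it assumes each rule is a regex matched from the start of the dataset URN and that the first matching rule wins, which may not be exactly how DataHub evaluates `domain_pattern`. The `domains_for` helper and the example URNs are made up for illustration and are not part of DataHub or of the real catalogue.

```python
import re

# Rules copied from the recipe: regex on the dataset URN -> list of domains.
DOMAIN_RULES = {
    r"urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*common_platform.*": ["HMCTS"],
    r"urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*prison.*": ["HMPPS"],
    r"urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*sirius.*": ["OPG"],
}


def domains_for(urn, rules=DOMAIN_RULES):
    """Return the domains of the first rule whose regex matches the start of urn.

    Assumption: prefix-anchored matching (re.match), first match wins.
    """
    for pattern, domains in rules.items():
        if re.match(pattern, urn):
            return domains
    return []


# Hypothetical URNs, invented for this example only.
examples = [
    "urn:li:dataset:(urn:li:dataPlatform:dbt,awsdatacatalog.common_platform_derived.cases,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:dbt,awsdatacatalog.sirius_derived.orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:dbt,awsdatacatalog.bold_sm_spells.spells,PROD)",
]

for urn in examples:
    print(urn, "->", domains_for(urn))
```

In this sketch the third URN gets no domain: `bold_sm_spells` is allowed by `node_name_pattern` but matches none of the three rules, so it may be worth checking whether those tables need a rule of their own.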

@murdo-moj
Contributor

murdo-moj commented May 29, 2024

This recipe is included in ministryofjustice/data-catalogue#123

@murdo-moj
Contributor

murdo-moj commented May 29, 2024

Matt did a spike (#108) on picking up domains from CaDeT, which we'd then map to our own domain model.

@murdo-moj murdo-moj moved this from In Progress to Review in Data Catalogue Jun 5, 2024
@murdo-moj murdo-moj moved this from Review to Done in Data Catalogue Jun 6, 2024
@murdo-moj murdo-moj closed this as completed by moving to Done in Data Catalogue Jun 6, 2024