Create and assign CaDeT domains #138

Merged: 14 commits, Jun 6, 2024

@murdo-moj murdo-moj commented Jun 4, 2024

  • Added a new source that creates domains from the CaDeT manifest.
  • Added a transformer that builds on the dbt ingestion and assigns those domains to CaDeT models.
    • The transformer is adapted from DataHub's existing PatternAddDatasetDomain transformer.
    • A URN is formed for every dataset in the manifest that maps to a domain.
  • Domains in CaDeT are assigned only to models, not to sources, so of the 505 datasets currently ingested, only ~310 have domains (the rest are sources).
  • Changed dependency management to Poetry, in line with our other projects.
  • Adjusted the dbt ingestion workflows to use the new source and transformer.
    • The CaDeT ingestion source (dbt) can't create domains, so domains are created before dbt ingestion runs.
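
To illustrate the URN-to-domain mapping described above, here is a minimal, standard-library-only sketch. It is not the code from this PR. The function names, the example manifest, and the rule that the domain is taken from the second element of a model's `fqn` are all assumptions made for illustration. The URN string formats follow DataHub's documented conventions for datasets and domains.

```python
from typing import Dict


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub-style dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def domain_urn(domain: str) -> str:
    """Build a DataHub-style domain URN string."""
    return f"urn:li:domain:{domain}"


def map_models_to_domains(manifest: dict, platform: str = "dbt") -> Dict[str, str]:
    """Return {dataset URN: domain URN} for every model in a dbt manifest.

    Assumption (hypothetical): the domain is the second element of the
    node's `fqn`. Sources live under the manifest's separate `sources`
    key, so they never receive a domain here.
    """
    mapping: Dict[str, str] = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, and other non-model nodes
        fqn = node.get("fqn", [])
        if len(fqn) < 2:
            continue  # no domain can be derived for this node
        name = f"{node['database']}.{node['schema']}.{node['name']}"
        mapping[dataset_urn(platform, name)] = domain_urn(fqn[1])
    return mapping


# Hypothetical manifest fragment, shaped like dbt's manifest.json.
example_manifest = {
    "nodes": {
        "model.proj.prison_population": {
            "resource_type": "model",
            "database": "awsdatacatalog",
            "schema": "prison",
            "name": "prison_population",
            "fqn": ["proj", "prison", "prison_population"],
        },
        "test.proj.not_null_check": {
            "resource_type": "test",
            "database": "awsdatacatalog",
            "schema": "prison",
            "name": "not_null_check",
            "fqn": ["proj", "prison", "not_null_check"],
        },
    },
    "sources": {},
}

model_domains = map_models_to_domains(example_manifest)
```

In a sketch like this, the set of distinct domain URNs in `model_domains.values()` would be what needs creating before dbt ingestion runs, which matches the ordering constraint noted in the last bullet.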

@murdo-moj murdo-moj changed the title stash Create and assign CaDeT domains Jun 5, 2024
@murdo-moj murdo-moj marked this pull request as ready for review June 5, 2024 09:29
@MatMoore MatMoore left a comment

Haven't finished reviewing yet, but have left some comments inline

MatMoore previously approved these changes Jun 5, 2024
@MatMoore MatMoore left a comment

Overall the approach looks good to me, nice work 👍🏻

There's a lot going on here, so anything we can do to make it easier to maintain and debug would be great. But the new GitHub workflow makes sense, and the tests look good, so I think it's good to go once the other comments are resolved.

@murdo-moj murdo-moj merged commit 1cbc0bd into main Jun 6, 2024
2 checks passed
@murdo-moj murdo-moj deleted the fmd-343-cadet-custom-transformer branch June 6, 2024 09:02