Create and assign CaDeT domains #138
murdo-moj
commented
Jun 4, 2024
- Added a new source which creates domains from the CaDeT manifest
- Added a transformer which builds on the dbt ingestion and assigns those domains to CaDeT models
- The transformer is adapted from the existing PatternAddDatasetDomain transformer written by DataHub.
- A URN is formed for every dataset in the manifest which maps to a domain.
- Domains in CaDeT are assigned only to models, not to sources, so of the 505 datasets currently ingested, only ~310 have domains (the rest are sources).
- Changed dependency management to Poetry for alignment with our other projects
- Adjusted the dbt ingestion workflows to use the new source and transformer
- The CaDeT ingestion source (dbt) can't create domains, so domains are created before dbt ingestion runs
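
For reviewers who want a picture of the dataset-to-domain mapping described above, here is a minimal sketch. It is not the code in this PR. The assumptions are: the domain is taken from the second element of each model's `fqn` in the dbt manifest, the dataset name is `database.schema.name`, and the helper names (`make_dataset_urn`, `make_domain_urn`, `map_models_to_domains`) are hypothetical. The URN strings follow DataHub's `urn:li:dataset:(...)` and `urn:li:domain:...` shapes.

```python
from typing import Dict


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Hypothetical helper: builds a DataHub-style dataset URN string.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def make_domain_urn(domain: str) -> str:
    # Hypothetical helper: builds a DataHub-style domain URN string.
    return f"urn:li:domain:{domain}"


def map_models_to_domains(manifest: dict) -> Dict[str, str]:
    """Return {dataset_urn: domain_urn} for every model in the manifest.

    Sources live under manifest["sources"], not manifest["nodes"], so they
    are never visited here and end up without a domain.
    """
    mapping: Dict[str, str] = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip seeds, tests, snapshots, etc.
        fqn = node.get("fqn", [])
        if len(fqn) < 2:
            continue  # no domain can be derived under this assumption
        domain = fqn[1]  # ASSUMPTION: fqn is [project, domain, ...]
        dataset_name = f"{node['database']}.{node['schema']}.{node['name']}"
        mapping[make_dataset_urn("dbt", dataset_name)] = make_domain_urn(domain)
    return mapping


# Made-up manifest fragment, for illustration only.
example_manifest = {
    "nodes": {
        "model.cadet.population": {
            "resource_type": "model",
            "database": "awsdatacatalog",
            "schema": "prison",
            "name": "population",
            "fqn": ["cadet", "prison", "population"],
        },
        "seed.cadet.lookup": {
            "resource_type": "seed",
            "database": "awsdatacatalog",
            "schema": "common",
            "name": "lookup",
            "fqn": ["cadet", "common", "lookup"],
        },
    },
    "sources": {
        "source.cadet.raw.events": {"resource_type": "source", "name": "events"},
    },
}

example_mapping = map_models_to_domains(example_manifest)
```

The distinct domain URNs in the mapping's values are what would need to exist before dbt ingestion runs, since the dbt source itself cannot create them.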
Haven't finished reviewing yet, but have left some comments inline
Overall the approach looks good to me, nice work 👍🏻
There's a lot going on here, so anything we can do to make it easier to maintain and debug would be great. But the new GitHub workflow makes sense and the tests look good, so I think it's good to go once the other comments are resolved.