refactor(ingest): Minor cleanup of File, CsvEnricher, BusinessGlossary, and FileLineage sources #7718
@@ -102,6 +103,9 @@ class CSVEnricherSource(Source):
be applied at the entity field. If a subresource IS populated (as it is for the second and third rows), glossary
terms and tags will be applied on the subresource. Every row MUST have a resource. Also note that owners can only
be applied at the resource level and will be ignored if populated for a row with a subresource.

:::note
This source will not work on very large csv files that do not fit in memory.
I think you need a trailing ::: as well.
T = TypeVar("T", bound=WorkUnit)


def auto_workunit_reporter(report: SourceReport, stream: Iterable[T]) -> Iterable[T]:
I would love to get to a point where sources can also generate MCPs and MCEs directly, and the helper wraps them into WorkUnits as necessary
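For readers unfamiliar with the pattern, here is a minimal sketch of how a helper with the signature shown in the diff could behave: it passes each work unit through unchanged while recording it on the report. The `WorkUnit` and `SourceReport` classes below are simplified, hypothetical stand-ins written for this example, not DataHub's actual classes, and the body of the function is an assumption since the diff only shows its signature.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, TypeVar


@dataclass
class WorkUnit:
    """Hypothetical stand-in for the real WorkUnit base class."""

    id: str


@dataclass
class SourceReport:
    """Hypothetical stand-in report that only records emitted work unit ids."""

    workunit_ids: List[str] = field(default_factory=list)

    def report_workunit(self, wu: WorkUnit) -> None:
        self.workunit_ids.append(wu.id)


T = TypeVar("T", bound=WorkUnit)


def auto_workunit_reporter(report: SourceReport, stream: Iterable[T]) -> Iterable[T]:
    # Assumed behavior: report each work unit, then yield it unchanged.
    # Because this is a generator, nothing is reported until it is consumed.
    for wu in stream:
        report.report_workunit(wu)
        yield wu


report = SourceReport()
units = [WorkUnit("a"), WorkUnit("b")]
emitted = list(auto_workunit_reporter(report, units))
```

Under these assumptions, a source would no longer need to call the report itself for every work unit it yields; wrapping its stream once is enough.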
LGTM!
…y, and FileLineage sources (#7718)
- Adds auto_workunit_reporter to each source
- Standardizes comments around remote paths
- Adds back AuditStamp to FileLineage source
- Some generic refactoring
Fixes a regression from datahub-project#7718.
A few followups to #7552, plus some modernization of the sources with auto_workunit_reporter. Refactoring got the most complicated on the file lineage source, but the logic should still be the same. Sorry if it's a difficult review.