[OpenLineage] Fix datasets in GCSTimeSpanFileTransformOperator #39064

kacpermuda · 2024-04-16T13:18:14Z

Currently we include every file as a separate dataset. This can inflate the size of the OpenLineage event and makes matching datasets between jobs harder.

With this change, we use the user-supplied prefixes as dataset names instead of full file paths. Users can then easily control the size of the event, and datasets match correctly when the same prefixes are passed to different operators. I am also removing the list of files that was stored only to build lineage datasets, introduced in #35838.
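The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the operator's actual code. It assumes a simplified dataset model made of a `(namespace, name)` pair and follows OpenLineage's `gs://{bucket}` namespace convention for GCS. `Dataset`, `datasets_from_files` and `dataset_from_prefix` are hypothetical helpers, and the bucket and prefix values are made up. Mapping an empty prefix to `"/"` is also an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for an OpenLineage dataset: a (namespace, name) pair."""

    namespace: str
    name: str


def datasets_from_files(bucket: str, files: list[str]) -> list[Dataset]:
    # Old behaviour (sketched): one dataset per object, so the event
    # grows with the number of files in the time span.
    return [Dataset(namespace=f"gs://{bucket}", name=path) for path in files]


def dataset_from_prefix(bucket: str, prefix: str | None) -> Dataset:
    # New behaviour (sketched): one dataset per user-supplied prefix.
    # Treating an empty prefix as the bucket root "/" is an assumption here.
    return Dataset(namespace=f"gs://{bucket}", name=prefix or "/")


files = [f"events/2024/04/16/part-{i:04d}.csv" for i in range(500)]
old = datasets_from_files("my-source-bucket", files)
new = [dataset_from_prefix("my-source-bucket", "events/")]
print(len(old), len(new))  # 500 1

# Two operators given the same bucket and prefix produce equal datasets,
# which is what makes matching between jobs straightforward.
producer_output = dataset_from_prefix("shared-bucket", "transformed/")
consumer_input = dataset_from_prefix("shared-bucket", "transformed/")
print(producer_output == consumer_input)  # True
```

With per-file datasets, the event size depends on how many objects fall into the time span. With prefix-based datasets, it depends only on what the user configured.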


Signed-off-by: Kacper Muda <[email protected]>

fix: OpenLineage datasets in GCSTimeSpanFileTransformOperator

3c2d8dd


boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Apr 16, 2024

mobuchowski approved these changes Apr 17, 2024


mobuchowski merged commit 0667083 into apache:main Apr 17, 2024
41 checks passed

kacpermuda deleted the ol-fix-gcs-timespan branch April 18, 2024 06:17

eladkal mentioned this pull request May 1, 2024

Status of testing Providers that were prepared on May 01, 2024 #39346

Closed

eladkal mentioned this pull request May 12, 2024

Status of testing Providers that were prepared on May 12, 2024 #39578

Closed

