[OpenLineage] Fix datasets in GCSDeleteObjectsOperator #39059
Merged
Currently, the OpenLineage method includes every deleted file as a dataset, which can inflate the size of the event and make it harder to match datasets between jobs.
With this change, when prefixes are used, the prefixes themselves become the dataset names instead of the full file paths. This lets the user control the size of the event and ensures proper matching when the same prefixes are passed to different operators. I am also removing the list of files that was saved for the purpose of lineage datasets, introduced in #35838.
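To illustrate the idea, here is a minimal sketch of how dataset names could be chosen. It is not the code from this PR: `lineage_dataset_names` is a hypothetical helper, and reporting explicit object paths unchanged is an assumption. The actual operator may normalize prefixes further, so the test cases remain the reference for exact behavior.

```python
from __future__ import annotations


def lineage_dataset_names(
    objects: list[str] | None = None,
    prefix: str | list[str] | None = None,
) -> list[str]:
    """Hypothetical helper: choose lineage dataset names for a delete."""
    if objects is not None:
        # Assumption: explicitly listed objects are reported as given.
        return list(objects)
    if prefix is None:
        return []
    # A single prefix and a list of prefixes are handled the same way.
    prefixes = [prefix] if isinstance(prefix, str) else list(prefix)
    # Each prefix yields one dataset name, however many files it matches.
    return prefixes


# Two prefixes give two dataset names, regardless of the number of files.
names = lineage_dataset_names(prefix=["logs/2024/", "tmp/"])
```

Because the names depend only on the operator arguments, two operators given the same prefix would report the same dataset name, which is what makes matching between jobs possible.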
When reviewing, please take a look at the test cases to see how the code will behave now.
I am also adjusting the typing of prefix (hook.list accepts a list of prefixes) and adding error raising that was, in my opinion, missing.
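The typing and validation change could look roughly like the sketch below. The description does not say which error was added, so the check shown here (rejecting a call where neither argument is set) is an assumption, as are the helper name `normalize_delete_args` and the error message.

```python
from __future__ import annotations


def normalize_delete_args(
    objects: list[str] | None = None,
    prefix: str | list[str] | None = None,
) -> tuple[list[str] | None, list[str] | None]:
    """Hypothetical helper: validate arguments and normalize prefix."""
    if objects is None and prefix is None:
        # Assumed check: at least one of the two arguments is required.
        raise ValueError("Either objects or prefix should be set.")
    if prefix is None:
        return objects, None
    # Accept both a single string and a list of strings for prefix.
    prefixes = [prefix] if isinstance(prefix, str) else list(prefix)
    return objects, prefixes


_, single = normalize_delete_args(prefix="data/")
_, several = normalize_delete_args(prefix=["data/", "tmp/"])
```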
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.