-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-1707] Enhance IcebergDataset
to detect when files already at dest then proceed with only delta
#3575
Merged
ZihanLi58
merged 18 commits into
apache:master
from
phet:iceberg_distcp__scan_icebergs_delta_calc2
Oct 14, 2022
Merged
Changes from all commits
Commits
Show all changes
18 commits
Select commit
Hold shift + click to select a range
55c1d2c
Merge remote-tracking branch 'upstream/master'
phet 6930a69
Merge remote-tracking branch 'upstream/master'
phet 8219430
Merge remote-tracking branch 'upstream/master'
phet 6c27a41
Merge remote-tracking branch 'upstream/master'
phet a78d414
Enhance `IcebergDataset` to detect when files already at dest and pro…
phet 499f47f
fixup: minor doc/comment mods
phet 0071181
Allow not finding a file mentioned by iceberg metadata to be a non-fa…
phet 63146e9
Have `IcebergDataset` check more thoroughly at dest for previously-co…
phet fd028cf
Add short-circuiting of `IcebergDataset` file scan when (root) metada…
phet a91f681
Abreviate `IcebergDataset` logging when source file not found, refact…
phet 89ca3c8
Improve javadoc as suggested during review
phet 42444cf
Streamline `IcebergDataset` logging when source path not found
phet 1b3cbb0
Skip `IcebergDataset` per-data-file check, falling back to per-manife…
phet 964a553
minor comment change
phet 05dddcf
Add `IcebergDataset` logging to indicate volume of filepaths accumulated
phet 4650b7f
Extend and refactor `IcebergDataset` logging to indicate volume of fi…
phet 5cbc405
improve comments
phet 72b5662
Add logging for running count of source paths not found
phet File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add comment here to mention that only oldest snapshot which most likely already been copied contains the all old files, the mfi from newer snapshot only contain new added file in those manifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those are definitely the semantics of using
IcebergTable.getIncrementalSnapshotInfosIterator()
. it's in the javadoc there. where do you want me to repeat it over here? maybe should I note those semantics above here when I call that method?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we can add more clarity in IcebergTable.getIncrementalSnapshotInfosIterator(), at least I was missing the assumption that it return the snapshot from oldest to latest. and files only appear once in earliest snapshot that contains it. Might be just me though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing. I added this: