-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retain table statistics during orphan files removal #5795
Conversation
fb784e8
to
658bd27
Compare
6d4dfc2
to
30a48af
Compare
30a48af
to
379b9ff
Compare
ByteBuffer.wrap("blob content".getBytes(StandardCharsets.UTF_8)))); | ||
puffinWriter.finish(); | ||
statisticsFile = | ||
new GenericStatisticsFile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can PuffinWriter
expose toStatisticsFile
instead of relying on the caller to do this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it currently does not have the information for org.apache.iceberg.StatisticsFile#snapshotId
field.
...k/v3.3/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java
Outdated
Show resolved
Hide resolved
...k/v3.3/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java
Show resolved
Hide resolved
*/ | ||
public static List<String> statisticsFilesLocations(Table table) { | ||
List<String> statisticsFilesLocations = Lists.newArrayList(); | ||
TableOperations ops = ((HasTableOperations) table).operations(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this depends on the public API. Let's get that one in first so we don't need to go around the API here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue here I am following existing code in the ReachableFileUtil
class
iceberg/core/src/main/java/org/apache/iceberg/ReachableFileUtil.java
Lines 62 to 64 in c07f2aa
TableOperations ops = ((HasTableOperations) table).operations(); | |
TableMetadata tableMetadata = ops.current(); | |
metadataFileLocations.add(tableMetadata.metadataFileLocation()); |
since this class already casts to HasTableOperations
, I assumed it's OK to do that and therefore removed public API changes from this PR.
If you want first merge Table API changes, the #4741 is ready for your review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine, but it requires the public API to return statisticsFiles from Table so we should get that one in first.
please see my response: #5795 (comment) |
379b9ff
to
8dffa41
Compare
Applied or responded to comments. @rdblue please take another look. |
8dffa41
to
623b11a
Compare
(just rebased after #4741 merged, no changes yet) |
Do not delete table statistics files when running remove_orphan_files.
623b11a
to
2c09948
Compare
Done now. @rdblue please take another look |
Thanks, @findepi! I merged this. |
thank you @rdblue for the merge! |
Do not delete table statistics files when running remove_orphan_files.
Extracted from #4741 and based on that PR, and also on #5794 and #5799.