Expose table statistics in Table API #4741
Force-pushed from e258b4f to e35cf57.
@findepi, I think this is too big to go in on its own. Can you split it into separate PRs for different parts?
core/src/main/java/org/apache/iceberg/GenericStatisticsFile.java (outdated, resolved)
Force-pushed from 2a9585d to 290152b.
We need to remember to keep
Thanks for the reminder here. It's on my to-do list; see, e.g., trinodb/trino#12317 (comment).
Force-pushed from 24b1a32 to 3e94f20.
Fixed and test added.
Force-pushed from 486c731 to ac219c0.
Force-pushed from 8a5b261 to 765de2e.
Updated to the current state of #5021.
Force-pushed from 765de2e to 9179106.
Rebased after #5021 was merged, to resolve the conflicts.
@rdblue please let me know if you have any comments.
   *
   * @return the current statistics files for the table
   */
  List<StatisticsFile> statisticsFiles();
Do we want to expose these directly? What about instead adding an API for finding statistics? That way consumers don't need to decide which one to use themselves.
Either way, this is for consuming stats and we should add the implementation of the update stats API in its own PR.
> Do we want to expose these directly?

Yes, these should be exposed, just like we expose other information about the table (files, their min/max values, etc.) and provide convenience functionality on top of them (like `planFiles`).

> What about instead adding an API for finding statistics?

Good idea. What would you like to see here?
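One possible shape for the lookup API suggested above, as a minimal sketch: instead of every consumer scanning the raw list, a helper returns the statistics file recorded for a given snapshot and, optionally, its blobs of one type. The `BlobInfo` and `StatsFile` records are simplified stand-ins for `BlobMetadata` and `StatisticsFile`; the helper names `findStatistics` and `findBlobs`, and the assumption of at most one statistics file per snapshot, are hypothetical and not part of this PR.

```java
import java.util.List;
import java.util.Optional;

// Simplified stand-in for table-level blob metadata (hypothetical field subset).
record BlobInfo(String type, long sourceSnapshotId) {}

// Simplified stand-in for StatisticsFile: one Puffin file associated with one snapshot.
record StatsFile(long snapshotId, String path, List<BlobInfo> blobMetadata) {}

final class StatisticsLookup {
  private StatisticsLookup() {}

  // Hypothetical helper: returns the statistics file for the given snapshot,
  // assuming at most one such file exists per snapshot.
  static Optional<StatsFile> findStatistics(List<StatsFile> statisticsFiles, long snapshotId) {
    return statisticsFiles.stream()
        .filter(file -> file.snapshotId() == snapshotId)
        .findFirst();
  }

  // Hypothetical helper: returns the blobs of one type from the snapshot's
  // statistics file, or an empty list when the snapshot has no statistics.
  static List<BlobInfo> findBlobs(List<StatsFile> statisticsFiles, long snapshotId, String blobType) {
    return findStatistics(statisticsFiles, snapshotId)
        .map(file -> file.blobMetadata().stream()
            .filter(blob -> blob.type().equals(blobType))
            .toList())
        .orElse(List.of());
  }
}
```

A caller would pass the list from `statisticsFiles()` together with the current snapshot ID, so the choice of which file applies lives in one place rather than in every engine.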
public class GenericBlobMetadata implements BlobMetadata {
public class GenericBlobMetadata implements BlobMetadata, Serializable {

  public static BlobMetadata from(org.apache.iceberg.puffin.BlobMetadata puffinMetadata) {
Why don't we make it so that the Puffin `BlobMetadata` implements `BlobMetadata`? Then this would be unnecessary.
Puffin `BlobMetadata` contains all the blob metadata information from the Puffin footer.
In particular, it contains more fields, like offset and length, which are not carried over to the table metadata.
We can make Puffin `BlobMetadata` implement `BlobMetadata`, but:
- we don't want to serialize (as in `Serializable`) the Puffin `BlobMetadata`, as we're not interested in sending over those additional fields (I know these are small today, and maybe will never be big in the future)
- there is a challenge with equality semantics: `GenericBlobMetadata` implements `equals`, but with a more complicated class hierarchy it wouldn't be useful anymore.

Let me know what you think.
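The equality concern raised above can be illustrated with a minimal sketch. Assume, hypothetically, that a Puffin-side class extended a generic metadata class and added `offset` and `length`: an `equals` that also compares those fields stops being symmetric with the parent's `equals`. `GenericMeta` and `PuffinMeta` are simplified stand-ins, not the actual Iceberg classes.

```java
import java.util.Objects;

// Simplified stand-in for table-level blob metadata with value-based equality.
class GenericMeta {
  final String type;
  final long sourceSnapshotId;

  GenericMeta(String type, long sourceSnapshotId) {
    this.type = type;
    this.sourceSnapshotId = sourceSnapshotId;
  }

  @Override
  public boolean equals(Object o) {
    // Accepts any GenericMeta, including subclass instances.
    if (!(o instanceof GenericMeta)) {
      return false;
    }
    GenericMeta other = (GenericMeta) o;
    return type.equals(other.type) && sourceSnapshotId == other.sourceSnapshotId;
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, sourceSnapshotId);
  }
}

// Hypothetical Puffin-side subclass that adds footer-only fields.
class PuffinMeta extends GenericMeta {
  final long offset;
  final long length;

  PuffinMeta(String type, long sourceSnapshotId, long offset, long length) {
    super(type, sourceSnapshotId);
    this.offset = offset;
    this.length = length;
  }

  @Override
  public boolean equals(Object o) {
    // Also compares offset and length, so it rejects plain GenericMeta instances.
    if (!(o instanceof PuffinMeta)) {
      return false;
    }
    PuffinMeta other = (PuffinMeta) o;
    return super.equals(other) && offset == other.offset && length == other.length;
  }

  @Override
  public int hashCode() {
    return Objects.hash(super.hashCode(), offset, length);
  }
}
```

With one instance of each holding the same type and snapshot ID, `generic.equals(puffin)` is true while `puffin.equals(generic)` is false, and their hash codes differ, which breaks the `Object.equals`/`hashCode` contract. Keeping the Puffin class outside the hierarchy avoids the problem.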
> we don't want to serialize (as in `Serializable`) the Puffin BlobMetadata

I see #4741 (comment) now.
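A minimal sketch of the approach kept in this PR: the Puffin footer entry stays a separate type, and a `from`-style factory copies only the table-level fields into a `Serializable` value type, so `offset` and `length` are never serialized. The field set (`type`, `sourceSnapshotId`, `fields`, `properties`) is an assumption for illustration, and `PuffinFooterBlob`, `TableBlobMetadata`, and `SerializationCheck` are hypothetical stand-ins rather than Iceberg's actual `GenericBlobMetadata` and Puffin `BlobMetadata`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

// Stand-in for a Puffin footer entry: carries file-layout fields and is not Serializable.
record PuffinFooterBlob(
    String type, long sourceSnapshotId, List<Integer> fields,
    Map<String, String> properties, long offset, long length) {}

// Stand-in for table-level blob metadata: Serializable, without offset/length.
record TableBlobMetadata(
    String type, long sourceSnapshotId, List<Integer> fields, Map<String, String> properties)
    implements Serializable {

  // Copies only the fields carried over to table metadata; immutable copies
  // from List.copyOf/Map.copyOf are themselves serializable.
  static TableBlobMetadata from(PuffinFooterBlob puffin) {
    return new TableBlobMetadata(
        puffin.type(), puffin.sourceSnapshotId(),
        List.copyOf(puffin.fields()), Map.copyOf(puffin.properties()));
  }
}

final class SerializationCheck {
  private SerializationCheck() {}

  // Java-serializes a value and returns the number of bytes written;
  // throws UncheckedIOException if serialization fails.
  static int serializedSize(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }
}
```

Because the table-level type is a plain value with its own `equals`, converting through the factory also sidesteps the asymmetric-equality issue that a shared class hierarchy would introduce.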
core/src/main/java/org/apache/iceberg/GenericStatisticsFile.java (outdated, resolved)
Force-pushed from b8da141 to 0ce95d9.
Per #4741 (comment), I've reduced the scope of this PR.
Thanks, @findepi! I'll give this another look.
Force-pushed from 0ce95d9 to 2444114.
This adds support for table statistics to the `Table` API.
Force-pushed from 2444114 to 11fab99.
@rdblue this is ready for review.
Thanks, @findepi!
Thank you for the merge!
Follows #4945 and #5021