-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement predicate push down for parquet dereference column in Iceberg #17133
Implement predicate push down for parquet dereference column in Iceberg #17133
Conversation
d296d47
to
b9ae609
Compare
b9ae609
to
bf77929
Compare
Fixing product test: "2023-04-28T05:52:04.9369644Z tests | 2023-04-28 11:37:04 INFO: FAILURE / io.trino.tests.product.iceberg.TestIcebergSparkCompatibility.testIdBasedFieldMapping [PARQUET, 2] (Groups: iceberg_jdbc, profile_specific_tests, iceberg, iceberg_rest) took 3.7 seconds |
bf77929
to
2323605
Compare
I fixed the bug and submitted the PR for another check |
1194880
to
234a076
Compare
...rg/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetComplexTypePredicatePushDown.java
Outdated
Show resolved
Hide resolved
// 21 is a value between [2, 20000] but is an odd number, so won't be discarded by Iceberg table's statistics. | ||
// At the meantime, 21 is not within the bound of any row group. So can be discarded by Parquet's row group statistics. | ||
assertNoDataRead("SELECT * FROM " + tableName + " WHERE col1Row.a = 21"); | ||
assertNoDataRead("SELECT * FROM " + tableName + " WHERE col1Row.a IS NULL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertNoDataRead
is currently a bit misleading because it builds on processedInputDataSize
and not physicalInputDataSize
However this change relieves the engine of dealing with additional computations when the data leaves the parquet reader. Very good catch @leetcode-1533
No change requested in this PR
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
👋 @leetcode-1533 @findinpath @findepi - this PR has become inactive. If you're still interested in working on it, please let us know. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
234a076
to
77e5049
Compare
...g/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetComplexTypesPredicatePushDown.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java
Outdated
Show resolved
Hide resolved
...g/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetComplexTypesPredicatePushDown.java
Outdated
Show resolved
Hide resolved
// predicate domain | ||
IcebergColumnHandle projectedColumn = new IcebergColumnHandle( | ||
new ColumnIdentity( | ||
5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 5 ?
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java
Outdated
Show resolved
Hide resolved
77e5049
to
38c58a3
Compare
Thank you @leetcode-1533 for this contribution. |
Thanks @raunaqmorarka and @findinpath for finishing it up |
Description
From https://trino.io/blog/2020/08/14/dereference-pushdown.html: "Another future improvement will be the pushdown of predicates on subfields for data stored in Parquet format. Although the pruning of nested fields occurs with Parquet, the predicates are not yet pushed down into the reader."
This PR enables Parquet page source to use statistics for nested fields in the iceberg connector.
Additional context and related issues
Related ORC commit: 5069a55
Fixes #9928
Hive change PR: #15163
Release notes
() This is not user-visible or docs only and no release notes are required.
() Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: