Implement predicate push down for parquet dereference column #15163
@raunaqmorarka and @phd3, could you please take a look?
ef5a57b to 2732f5a
I have been waiting 10 days for a code review.
2732f5a to aa29769
Hi, the commit message has been reorganized; please take another look.
@phd3 does this add nested field predicate pushdown for Iceberg (potentially answering #8759 in a better way)? cc @alexjo2144
@leetcode-1533 PTAL at #15388
@leetcode-1533 can you please rebase this on the latest master?
aa29769 to c6dd75a
Hi, I rebased and have changed the unit test in #15388.
c6dd75a to 3746df6
3746df6 to cb71c5d
Hi, it has been rebased on the latest master branch.
For dereferenceSubFieldTypes(), I am actually leaning toward @alexjo2144's suggestion, i.e. respecting the useColumnName option when dereferencing subtypes as well (even though it is a subtype's name rather than a 'column' name). I would like to know the community's opinion on this.
Current status: after chatting with @raunaqmorarka, I did not implement respecting the "useColumnName" option when dereferencing nested columns. We agreed that "this change is not necessary for the nested columns predicate pushdown. Unless it is a pre-requisite for your change, I would prefer to avoid changing that". dereferenceSubFieldTypes() is extracted from the getColumnType() method, and its behavior is currently unchanged. My change just builds on top of dereferenceSubFieldTypes().
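For readers following the dereferenceSubFieldTypes() discussion, here is a minimal sketch of what dereferencing a nested type by a path of field names can look like. It is not the code from this PR: `DereferenceSketch`, `TypeNode`, and `dereference` are hypothetical names, the schema is modeled as a plain map rather than Parquet's own type classes, and the lookup is by exact name only (it does not model the useColumnName option).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DereferenceSketch {
    // Hypothetical schema node: a leaf has no child fields,
    // a group maps field names to child nodes.
    record TypeNode(String typeName, Map<String, TypeNode> fields) {
        static TypeNode leaf(String typeName) {
            return new TypeNode(typeName, Map.of());
        }

        static TypeNode group(Map<String, TypeNode> fields) {
            return new TypeNode("GROUP", fields);
        }
    }

    // Walks from the base column type down through the given field names.
    // Returns empty if any name along the path is missing in the file schema.
    static Optional<TypeNode> dereference(TypeNode base, List<String> names) {
        TypeNode current = base;
        for (String name : names) {
            TypeNode next = current.fields().get(name);
            if (next == null) {
                return Optional.empty();
            }
            current = next;
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        TypeNode column = TypeNode.group(Map.of(
                "address", TypeNode.group(Map.of("zip", TypeNode.leaf("INT64")))));
        // Resolves the leaf type that a predicate on column.address.zip refers to.
        System.out.println(dereference(column, List.of("address", "zip")).map(TypeNode::typeName));
    }
}
```

Returning an empty Optional for a missing subfield lets the caller fall back to reading without pushdown instead of failing.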
To highlight that the variable is after dereferencing.
b26c5a0 to 58af313
58af313 to fd1e9b0
f669a2a to dabc54d
lgtm % minor comment
dabc54d to 151118d
350dc56 to ba5bffe
ba5bffe to 8e5da5e
Hello, could you help merge this so that I can rebase the Iceberg-related changes?
Description
From https://trino.io/blog/2020/08/14/dereference-pushdown.html: "Another future improvement will be the pushdown of predicates on subfields for data stored in Parquet format. Although the pruning of nested fields occurs with Parquet, the predicates are not yet pushed down into the reader."
This PR enables the Parquet page source to use statistics for nested fields.
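To illustrate the idea, the following is a minimal sketch of how min/max statistics on a nested leaf column can be used to skip row groups for an equality predicate. It is an illustration only, not the implementation in this PR: `NestedStatsPruningSketch`, `LongStats`, `RowGroup`, and the dotted path `address.zip` are all hypothetical, and real Parquet statistics are read from file metadata rather than built by hand.

```java
import java.util.List;
import java.util.Map;

public class NestedStatsPruningSketch {
    // Hypothetical min/max statistics for one leaf column in one row group.
    record LongStats(long min, long max) {}

    // Hypothetical row group: statistics keyed by dotted leaf path, e.g. "address.zip".
    record RowGroup(String name, Map<String, LongStats> statsByPath) {}

    // Returns true when the row group may contain rows where path = value.
    static boolean mayMatchEquals(RowGroup rowGroup, String path, long value) {
        LongStats stats = rowGroup.statsByPath().get(path);
        if (stats == null) {
            // No statistics for this subfield: the row group cannot be pruned.
            return true;
        }
        return stats.min() <= value && value <= stats.max();
    }

    // Keeps only the row groups that cannot be ruled out by statistics.
    static List<String> rowGroupsToRead(List<RowGroup> rowGroups, String path, long value) {
        return rowGroups.stream()
                .filter(rowGroup -> mayMatchEquals(rowGroup, path, value))
                .map(RowGroup::name)
                .toList();
    }

    public static void main(String[] args) {
        List<RowGroup> rowGroups = List.of(
                new RowGroup("rg0", Map.of("address.zip", new LongStats(10000, 19999))),
                new RowGroup("rg1", Map.of("address.zip", new LongStats(90000, 99999))),
                new RowGroup("rg2", Map.of()));
        // Predicate: address.zip = 94105
        System.out.println(rowGroupsToRead(rowGroups, "address.zip", 94105)); // prints [rg1, rg2]
    }
}
```

Here rg0 is skipped because 94105 falls outside its min/max range, while rg2 is still read because it has no statistics for the subfield.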
Additional context and related issues
Related ORC commit: 5069a55
#9928
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: