Showcase dereference pushdown limited effectiveness in Iceberg #17145
Conversation
Dereference pushdown is effective at filtering the data handed from the connector to the engine, so that only the relevant nested fields are provided. However, the whole content corresponding to a nested column is currently still read from the data files.
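To illustrate with a hypothetical table (not one from this PR): for a column declared as a ROW type, dereference pushdown means the connector provides only the selected nested field to the engine, while the file reader today still decodes the whole group.

// Hypothetical example; table and column names are made up for illustration.
String createTable = "CREATE TABLE example_table (id BIGINT, payload ROW(a VARCHAR, b VARCHAR))";
// With dereference pushdown, the connector provides only payload.a to the engine;
// without it, the full payload row reaches the engine and is projected there.
String selectQuery = "SELECT payload.a FROM example_table";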
DataSize processedDataSizeWithPushdown = statsWithPushdown.getProcessedInputDataSize();
assertQueryStats(
        sessionWithoutPushdown,
        selectQuery,
        statsWithoutPushdown -> {
            // TODO (https://github.com/trinodb/trino/issues/17156) add dereference pushdown on the physical layer
            assertThat(statsWithoutPushdown.getProcessedInputDataSize()).isGreaterThan(processedDataSizeWithPushdown);
        });
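The session without pushdown is not shown in the excerpt above; a minimal sketch of how it could be built, assuming the Iceberg catalog is named "iceberg" and the catalog session property is "projection_pushdown_enabled" (both are assumptions for illustration):

// Sketch only: disable projection (dereference) pushdown for the comparison run.
Session sessionWithoutPushdown = Session.builder(getSession())
        .setCatalogSessionProperty("iceberg", "projection_pushdown_enabled", "false")
        .build();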
We may need more test data to show that data pruning happens on read, and/or to disable Parquet compression on write.
3000 'Z' characters will compress very well, and the coalescing of adjacent reads should kick in.
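A minimal sketch of disabling compression on write, assuming an Iceberg "compression_codec" catalog session property that accepts NONE (the property name and value are my assumption, for illustration only):

// Sketch only: write the test data uncompressed so that Parquet compression of
// highly repetitive values (such as 3000 'Z' characters) does not mask the
// difference in bytes actually read.
Session uncompressedWrites = Session.builder(getSession())
        .setCatalogSessionProperty("iceberg", "compression_codec", "NONE")
        .build();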
@findepi @findinpath I wanted to understand the difference between physical input size and input size. My initial understanding was that the physical size is the compressed data and will always be smaller, while the input size is the uncompressed data that Trino acts upon and will always be larger; but after looking at multiple EXPLAIN plans, I don't think that understanding is correct. I have documented some of my explanation in this discussion: https://trinodb.slack.com/archives/CJ6UC075E/p1685710163200809
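For reference, both statistics are exposed as getters on io.trino.execution.QueryStats; a hedged sketch of reading them side by side (the helper method is mine, not from this PR):

// Sketch only: physical input is what was read from storage; processed input is
// what the connector handed to the engine. As the snippets below show, processed
// input can be far smaller than physical input when pruning happens above the reader.
static void printInputSizes(QueryStats stats)
{
    DataSize physicalInput = stats.getPhysicalInputDataSize();
    DataSize processedInput = stats.getProcessedInputDataSize();
    System.out.printf("physical=%s processed=%s%n", physicalInput, processedInput);
}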
Description
As can be seen in the snippets below:

Physical input: 1.20kB
Input: 1 row (5B)

which is much less than the query doing the selection of the whole row:

Input: 1 row (50B)
This PR proves that the dereference pushdown in Iceberg has limited effectiveness.
The logic for building a stripped file schema (MessageType) used in the Hive connector (see trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java, lines 316 to 358 at a524709) is missing from the Iceberg page source factory (see trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java, lines 905 to 918 at a524709). For this reason, the ParquetReader always reads all the primitive fields of the nested columns selected in the Iceberg connector.

tl;dr: In Iceberg, the Parquet file schema is built from the base columns, while in Hive it is built as an adaptation of the original base columns, containing Optional.empty() for whatever is not being read.
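For illustration, a minimal sketch (my own, not the referenced Trino code) of what stripping a file schema down to a single dereferenced field amounts to, using the Parquet schema API:

import org.apache.parquet.schema.GroupType;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.Type;

// Sketch only: rebuild the requested column's group with just the one nested
// field, so the reader never materializes the sibling primitive fields.
static MessageType pruneToNestedField(MessageType fileSchema, String column, String nestedField)
{
    GroupType base = fileSchema.getType(column).asGroupType();
    Type kept = base.getType(nestedField);
    GroupType stripped = new GroupType(base.getRepetition(), base.getName(), kept);
    return new MessageType(fileSchema.getName(), stripped);
}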
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: