-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Dereference pushdown for the Delta Lake connector #17085
Conversation
ef81ebc
to
b0b04f0
Compare
import static java.lang.String.format; | ||
import static org.assertj.core.api.Assertions.assertThat; | ||
|
||
public class TestDeltaLakeProjectionPushdownPlans |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: Possible to make it common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a product test reading from Delta Lake tables (created either by Databricks/Delta OSS) with column mapping mode |
.../trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnProjectionInfo.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeConfig.java
Show resolved
Hide resolved
...n/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/MetadataEntry.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
b0b04f0
to
9fe1849
Compare
Added some more tests. |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
...rino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeMinioConnectorTest.java
Outdated
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/TestingConnectorBehavior.java
Outdated
Show resolved
Hide resolved
b1c173b
to
3e3167b
Compare
3e3167b
to
46dc82e
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
@Test | ||
public void testHighlyNestedData() | ||
{ | ||
// TODO consider moving this in BaseConnectorTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could put these tests in a base class in io.trino.testing and then run it for the relevant connectors and file formats by adding derived classes (e.g. BaseOrcWithBloomFiltersTest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am considering moving tests in follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great finding Raunaq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...rino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeMinioConnectorTest.java
Outdated
Show resolved
Hide resolved
41659e2
to
7e7acaa
Compare
4eee057
to
dd60e5c
Compare
rebased with master and resolved conflicts |
.../trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnProjectionInfo.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnHandle.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
dd60e5c
to
b6be925
Compare
Addressed rest of the comments: will raise follow-up PR for below comments. |
@@ -102,7 +103,7 @@ public TableStatistics getTableStatistics(ConnectorSession session, DeltaLakeTab | |||
} | |||
|
|||
Set<String> predicatedColumnNames = tableHandle.getNonPartitionConstraint().getDomains().orElseThrow().keySet().stream() | |||
.map(DeltaLakeColumnHandle::getName) | |||
.map(DeltaLakeColumnHandle::getBaseColumnName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use projected column name here if it dereference field?
I see this as part of #17164
return value.flatMap(o -> deserializeStatisticsValue(columnHandle, String.valueOf(o))); | ||
} | ||
|
||
private Optional<Object> deserializeStatisticsValue(DeltaLakeColumnHandle columnHandle, String statValue) | ||
{ | ||
if (!columnHandle.isBaseColumn()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: FYI This seems to be done as well in deserializeColumnValue
one line below.
No change needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deserializeColumnValue
throw a verification exception to verify the column is base column.
verify(column.isBaseColumn(), "Unexpected dereference: %s", column);
@@ -55,20 +55,20 @@ public void testColumnMappingModeNone() | |||
|
|||
onDelta().executeQuery("" + | |||
"CREATE TABLE default." + tableName + | |||
" (a_number INT)" + | |||
" (a_number INT, nested STRUCT<field1: STRING, field2: STRING>)" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have tests for tables that underwent insert/update/delete/merge with onDelta + dereference pushdown (for all column mapping modes)
@findepi what is the rationale for this request in the context of this read oriented feature ? Is this more about being future-proof?
@findepi can you pls run this PR with secrets? |
/test-with-secrets sha=b6be925a0b09044923a17b37133fd58809e00540 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please squash (after the test with secrets run)
return getColumns(table.getMetadataEntry()).stream() | ||
return table.getProjectedColumns() | ||
.map(projectedColumns -> (List<DeltaLakeColumnHandle>) projectedColumns.stream() | ||
.map(DeltaLakeColumnHandle.class::cast) // TODO DeltaLakeTableHandle.projectedColumns should be a collection of DeltaLakeColumnHandle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to remember to solve this in #17365
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeSessionProperties.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
test-with-secrets run is 🟢 https://github.com/trinodb/trino/actions/runs/4912972173 |
b6be925
to
1b9b988
Compare
Addressed comments |
@krvikash thank you so much! |
Thank you so much, @findepi | @findinpath | @raunaqmorarka | @alexjo2144 for the reviews and constant support. |
Description
This PR implements dereference pushdown for Delta Lake connector(similar to #8129).
This adds significant performance improvements for queries accessing nested fields inside struct/row columns. They have been optimized through the pushdown of dereference expressions. With this feature, the query execution prunes structural data eagerly, extracting the necessary fields.
For Example:
I have a table having a nested field
col
. When perform selectingcol.a
, we can see the difference inInput
andPhysical Input
values in the query plan when running with and without dereference pushdown.Table Schema as below:
Query Plan without Dereference pushdown:
Query Plan with Dereference pushdown:
More Details about dereference pushdown: https://trino.io/blog/2020/08/14/dereference-pushdown.html
Additional context and related issues
The feature is enabled by default.
The feature can be disabled by setting
delta.projection-pushdown-enabled
configuration property ordelta.projection_pushdown_enabled
session property tofalse
.Release notes
(X) Release notes are required, with the following suggested text: