Add projection push down for STRUCT field in big-query connector #23443
Please add a simple description. Take the description from PR #17085 as a reference.
LGTM
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/TestBigQueryMetadata.java
430945f to 7e51bce
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryMetadata.java
columns.stream()
        .map(BigQueryColumnHandle.class::cast)
        .forEach(column -> checkArgument(projectedColumnNames.contains(column.name()), "projected columns should contain all reader columns"));
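For readers following the thread, here is a minimal, self-contained sketch of the check in the hunk above. It assumes `projectedColumnNames` is a `Set<String>` built from the column handles carried by the split, as discussed in the comments that follow. `SketchColumnHandle` and `SplitColumnCheck` are hypothetical stand-ins, not classes from the connector.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for BigQueryColumnHandle; only the name matters here.
record SketchColumnHandle(String name) {}

final class SplitColumnCheck
{
    private SplitColumnCheck() {}

    // Throws if any reader column is not among the columns carried by the split.
    static void verifyReaderColumns(List<SketchColumnHandle> splitColumns, List<SketchColumnHandle> readerColumns)
    {
        // Build the set once so each membership test is a constant-time lookup.
        Set<String> projectedColumnNames = splitColumns.stream()
                .map(SketchColumnHandle::name)
                .collect(Collectors.toSet());
        for (SketchColumnHandle column : readerColumns) {
            if (!projectedColumnNames.contains(column.name())) {
                throw new IllegalArgumentException(
                        "projected columns should contain all reader columns, missing: " + column.name());
            }
        }
    }

    public static void main(String[] args)
    {
        List<SketchColumnHandle> split = List.of(new SketchColumnHandle("root"), new SketchColumnHandle("id"));
        // Passes: "root" is one of the columns carried by the split.
        verifyReaderColumns(split, List.of(new SketchColumnHandle("root")));
    }
}
```

This is a containment check, so it is weaker than requiring the two column lists to be equal: the reader may ask for a subset of the split's columns.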
We could potentially build a Set of the parent column names from the column handles provided by the split, and check it against the columns passed to the PageSource:
Set<String> projectedColumnNames = bigQuerySplit.getColumns().stream().map(BigQueryColumnHandle::name).collect(Collectors.toSet());
checkArgument(bigQuerySplit.getColumns().isEmpty() || bigQuerySplit.getColumns().stream().map(BigQueryColumnHandle::name).collect(Collectors.toSet()).equals(columns),
        "Requested columns %s do not match list in split %s", columns, bigQuerySplit.getColumns());
7e51bce to 70b8046
for (int index : indices) {
    checkArgument(type instanceof RowType, "type should be Row type");
    RowType rowType = (RowType) type;
    RowType.Field field = rowType.getFields().get(index);
Can field be null here?
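To make the question concrete, here is a small sketch of the same loop over simplified types. It assumes the field list never contains null elements, which this thread does not confirm for the real `RowType`. Under that assumption a bad ordinal surfaces as an `IndexOutOfBoundsException` from `List.get`, not as a null field. All `Sketch*` types and `NestedTypeResolver` are hypothetical stand-ins for the Trino SPI classes.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for Trino's Type and RowType; not the real SPI classes.
interface SketchType {}

record SketchScalar(String name) implements SketchType {}

record SketchField(String name, SketchType type) {}

record SketchRow(List<SketchField> fields) implements SketchType
{
    SketchRow
    {
        // List.copyOf rejects null elements, so a field read from this list is never null.
        fields = List.copyOf(fields);
    }
}

final class NestedTypeResolver
{
    private NestedTypeResolver() {}

    // Follows the ordinals through nested row types and returns the type of the innermost field.
    static SketchType resolve(SketchType type, List<Integer> indices)
    {
        for (int index : indices) {
            if (!(type instanceof SketchRow rowType)) {
                throw new IllegalArgumentException("type should be Row type");
            }
            // An out-of-range ordinal throws IndexOutOfBoundsException here.
            SketchField field = rowType.fields().get(index);
            type = field.type();
        }
        return type;
    }
}
```

If the ordinals always come from a dereference chain that the engine has already validated against the column type, neither failure path should be reachable in practice. The checks only guard against a mismatch between the handle and the type.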
I really hope not:
public static ProjectedColumnRepresentation createProjectedColumnRepresentation(ConnectorExpression expression)
{
    ImmutableList.Builder<Integer> ordinals = ImmutableList.builder();

    Variable target;
    while (true) {
        if (expression instanceof Variable variable) {
            target = variable;
            break;
        }
        if (expression instanceof FieldDereference dereference) {
            ordinals.add(dereference.getField());
            expression = dereference.getTarget();
        }
        else {
            throw new IllegalArgumentException("expression is not a valid dereference chain");
        }
    }

    return new ProjectedColumnRepresentation(target, ordinals.build().reverse());
}
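As an illustration of what the helper above produces, here is a standalone sketch of the same walk using simplified types. The `Sketch*` records and `DereferenceChains` are hypothetical stand-ins for `Variable`, `FieldDereference` and `ProjectedColumnRepresentation`, and it uses `Collections.reverse` in place of Guava's `ImmutableList.reverse()`. The walk collects ordinals from the outermost dereference inward, so they are reversed to read from the root column down to the leaf.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for Trino's ConnectorExpression hierarchy.
interface SketchExpression {}

record SketchVariable(String name) implements SketchExpression {}

record SketchDereference(SketchExpression target, int field) implements SketchExpression {}

record SketchProjection(SketchVariable variable, List<Integer> ordinals) {}

final class DereferenceChains
{
    private DereferenceChains() {}

    // Flattens a chain of dereferences into its base variable plus root-to-leaf field ordinals.
    static SketchProjection flatten(SketchExpression expression)
    {
        List<Integer> ordinals = new ArrayList<>();
        // Peel off dereferences from the outside in.
        while (expression instanceof SketchDereference dereference) {
            ordinals.add(dereference.field());
            expression = dereference.target();
        }
        if (!(expression instanceof SketchVariable variable)) {
            throw new IllegalArgumentException("expression is not a valid dereference chain");
        }
        // Ordinals were collected leaf-first; reverse them into root-to-leaf order.
        Collections.reverse(ordinals);
        return new SketchProjection(variable, List.copyOf(ordinals));
    }

    public static void main(String[] args)
    {
        // Models an expression like root.<field 0>.<field 2>
        SketchExpression chain = new SketchDereference(
                new SketchDereference(new SketchVariable("root"), 0), 2);
        System.out.println(flatten(chain).ordinals()); // [0, 2]
    }
}
```

A plain variable with no dereference yields an empty ordinal list, which corresponds to reading the whole column.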
/test-with-secrets sha=70b8046a133f404c46259f00fa37f6caba232de8
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/10931512904
Do we need any doc update here? @ebyhr @vlad-lyutenko
Description
This PR implements dereference projection pushdown for the BigQuery connector (similar to #17085).
This significantly improves performance for queries that access nested fields inside struct/row columns, because dereference expressions are now pushed down to the connector. With this feature, query execution prunes structural data eagerly and extracts only the necessary fields.
For example, take a table with a nested field root. When selecting root.f1, the Input and Physical Input values in the query plan differ between runs with and without dereference pushdown.
Table schema as below:
Query Plan without Dereference pushdown:
Query Plan with Dereference pushdown:
Additional context and related issues
The feature is enabled by default.
The feature can be disabled by setting the bigquery.projection-pushdown-enabled configuration property or the bigquery.projection_pushdown_enabled session property to false.
Release notes
(X) Release notes are required, with the following suggested text: