Implement dereference pushdown for MongoDB connector #17710

Merged

Conversation

krvikash
Contributor

@krvikash krvikash commented May 31, 2023

Description

This PR implements dereference pushdown for the MongoDB connector (similar to #17085).

This significantly improves performance for queries that access nested fields inside struct/row columns by pushing dereference expressions down to the connector. With this feature, query execution prunes structural data eagerly and extracts only the necessary fields.

More Details about dereference pushdown: https://trino.io/blog/2020/08/14/dereference-pushdown.html
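
As a rough illustration of the pruning described above, the sketch below turns dereference paths into a MongoDB-style projection document that uses dotted field names. The class and method names are hypothetical, and a plain Map stands in for the driver's document type; this is not the connector's actual code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the connector.
final class ProjectionDocumentSketch
{
    private ProjectionDocumentSketch() {}

    // Builds a projection of the form {"col0.a": 1, "id": 1}, where each key is
    // a dereference path joined with dots and the value 1 means "include".
    static Map<String, Integer> projectionFor(List<List<String>> paths)
    {
        Map<String, Integer> projection = new LinkedHashMap<>();
        for (List<String> path : paths) {
            projection.put(String.join(".", path), 1);
        }
        return projection;
    }
}
```

With such a projection, only the requested nested fields would need to leave the database, rather than the whole row column.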

Note: This PR merges the work of #14467 and #16790

Additional context and related issues

The feature is enabled by default.

The feature can be disabled by setting the mongodb.projection-pushdown-enabled configuration property or the mongodb.projection_pushdown_enabled session property to false.

Release notes

(X) Release notes are required, with the following suggested text:

# MongoDB
* Improve read performance for tables with row (struct) columns when only a subset of fields is needed by a query. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 31, 2023
@github-actions github-actions bot added delta-lake Delta Lake connector docs hive Hive connector iceberg Iceberg connector mongodb MongoDB connector tests:hive labels May 31, 2023
@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from b38b744 to ad5cd4b Compare May 31, 2023 11:19
@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch 2 times, most recently from 305397a to 83d9bd8 Compare June 1, 2023 13:45
@krvikash krvikash marked this pull request as ready for review June 1, 2023 13:48
@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch 2 times, most recently from 6cc226e to 851554e Compare June 1, 2023 16:31
@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from 851554e to 8baaf45 Compare June 5, 2023 05:56
@krvikash
Contributor Author

krvikash commented Jun 5, 2023

rebased to resolve conflicts

@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from 8baaf45 to 777382a Compare June 5, 2023 06:13
@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from 777382a to 03877e3 Compare June 6, 2023 08:49
@krvikash
Contributor Author

krvikash commented Jun 6, 2023

Addressed comments and added TestMongoComplexTypePredicatePushDown for predicate pushdown.

@ebyhr ebyhr removed their request for review June 6, 2023 08:51
@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from 03877e3 to 7f8da8b Compare June 6, 2023 09:02
Comment on lines +499 to +515
* Creates a set of sufficient columns for the input projected columns. For example,
* if input {@param columns} include columns "a.b" and "a.b.c", then they will be projected from a single column "a.b".
*/
Member

Should the engine perform this optimization? That is, when we are about to project columns such as a.b and a.b.c, there should be a project operator that ensures a plan like:

tableScan(with column a.b) -> project (with two columns a.b and a.b.c)

cc: @kasiafi , @findepi , @hashhar , @martint
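
For readers following the Javadoc quoted above, here is a minimal sketch of reducing projected columns to a sufficient set. It assumes each column is represented as a list of path segments; the class and method names are hypothetical and this is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not the connector's code.
final class SufficientColumnsSketch
{
    private SufficientColumnsSketch() {}

    // Returns a minimal list of paths such that every input path is either
    // kept or has an ancestor (a prefix) that is kept.
    static List<List<String>> sufficientPaths(List<List<String>> paths)
    {
        List<List<String>> sorted = new ArrayList<>(paths);
        // Shorter paths first, so ancestors are considered before descendants.
        sorted.sort(Comparator.comparingInt((List<String> path) -> path.size()));

        List<List<String>> kept = new ArrayList<>();
        for (List<String> path : sorted) {
            boolean covered = kept.stream().anyMatch(parent -> isPrefix(parent, path));
            if (!covered) {
                kept.add(path);
            }
        }
        return kept;
    }

    // Compares whole segments, so "ab" is never mistaken for a child of "a".
    private static boolean isPrefix(List<String> prefix, List<String> path)
    {
        return prefix.size() <= path.size()
                && path.subList(0, prefix.size()).equals(prefix);
    }
}
```

Given the paths a.b and a.b.c, this keeps only a.b, which matches the example in the Javadoc.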

if (mongoColumnHandle.isBaseColumn()) {
return value;
}
if (value instanceof DBRef dbRefValue) {
Member

Is it possible for the engine to send only the DBRef to the underlying MongoDB, so that MongoDB can apply the projection pushdown on its end?

Contributor Author

After the projection pushdown, MongoPageSource will have all dereferenced columns.

I could not find a way to send only DBRef here. Let me know if you have some suggestions.
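
To make the discussion concrete, the sketch below shows one way a page source could resolve a dereferenced column from a fetched document by walking the nested structure. It assumes documents are plain nested Maps and deliberately leaves out DBRef handling; all names are hypothetical and do not come from MongoPageSource.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; DBRef values are not handled here.
final class DereferenceSketch
{
    private DereferenceSketch() {}

    // Starts at the base column and follows dereferenceNames one level at a time.
    // Returns null if a level is missing or is not a nested document.
    static Object resolve(Map<String, Object> document, String baseName, List<String> dereferenceNames)
    {
        Object value = document.get(baseName);
        for (String name : dereferenceNames) {
            if (!(value instanceof Map)) {
                return null;
            }
            value = ((Map<?, ?>) value).get(name);
        }
        return value;
    }
}
```

An empty list of dereference names returns the base column value unchanged, which mirrors the isBaseColumn() shortcut in the snippet under review.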

@@ -0,0 +1,295 @@
/*
Member

Can we have the plan assertion as part of ConnectorTest/ConnectorSmokeTest?

Contributor Author

This is similar to what TestDeltaLakeProjectionPushdownPlans and other connectors' projection pushdown plan tests do.

table -> {
MongoTableHandle mongoTableHandle = (MongoTableHandle) table;
TupleDomain<ColumnHandle> constraint = mongoTableHandle.getConstraint();
return mongoTableHandle.getProjectedColumns().equals(ImmutableSet.of(column0Handle, columnY))
Member

We don't need an exact equals here. We only need to ensure that we project up to the parent column.

@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch 3 times, most recently from e6e59ac to 8866ee9 Compare June 26, 2023 10:59
@Praveen2112
Member

Can we squash the fixup commits?

@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from 8866ee9 to 4d3f73f Compare June 28, 2023 07:44
@krvikash
Contributor Author

Can we squash the fixup commits?

Done

@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from 4d3f73f to 7f7cac3 Compare June 28, 2023 08:08
Member

@Praveen2112 Praveen2112 left a comment

% comments

.setCatalogSessionProperty(CATALOG, "projection_pushdown_enabled", "false")
.build();

getQueryRunner().execute("CREATE TABLE " + tableName + " (col0) AS SELECT CAST(row(5, 6) AS row(a bigint, b bigint)) AS col0 WHERE false");
Member

Can we replace them with TestTable?

Contributor Author

This test class uses LocalQueryRunner, which does not allow DROP TABLE. TestTable executes DROP TABLE on close, so we cannot use TestTable here.


private static boolean isPushDownSupported(ConnectorExpression connectorExpression)
{
if (!(connectorExpression instanceof FieldDereference fieldDereference)) {
Member

This is a bit tricky: as written, it means we also support pushdown of non-FieldDereference expressions.

Contributor Author

Updated the method to return false for any connectorExpression other than Variable and FieldDereference.
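
A minimal sketch of the shape of that check follows. The Sketch* types are hypothetical stand-ins for the SPI expression classes, and the recursive check on the dereference target is an assumption made for illustration; the merged method may apply different or additional conditions.

```java
// Hypothetical stand-ins for the SPI expression types; names and fields are illustrative only.
interface SketchExpression {}

final class SketchVariable implements SketchExpression
{
    final String name;

    SketchVariable(String name) { this.name = name; }
}

final class SketchFieldDereference implements SketchExpression
{
    final SketchExpression target;
    final int field;

    SketchFieldDereference(SketchExpression target, int field)
    {
        this.target = target;
        this.field = field;
    }
}

final class SketchCall implements SketchExpression
{
    final String functionName;

    SketchCall(String functionName) { this.functionName = functionName; }
}

final class PushdownSupportSketch
{
    private PushdownSupportSketch() {}

    // Accepts a plain variable, or a dereference chain rooted at one.
    // Every other expression kind is rejected explicitly (assumed behavior).
    static boolean isPushDownSupported(SketchExpression expression)
    {
        if (expression instanceof SketchVariable) {
            return true;
        }
        if (expression instanceof SketchFieldDereference) {
            return isPushDownSupported(((SketchFieldDereference) expression).target);
        }
        return false;
    }
}
```

Listing the accepted kinds and ending with an explicit return false avoids the earlier problem, where anything that was not a FieldDereference slipped through as supported.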

@krvikash
Contributor Author

Thanks, @Praveen2112, for reviewing. I have addressed the comments.

Member

@Praveen2112 Praveen2112 left a comment

Minor nits.

Co-authored-by: praveenkrishna <[email protected]>
Co-authored-by: Mateusz "Serafin" Gajewski <[email protected]>
@krvikash krvikash force-pushed the krvikash/mongodb-dereference-pushdown branch from 06b3390 to 8a77330 Compare June 30, 2023 07:46
@krvikash
Contributor Author

Thanks, @Praveen2112, for reviewing. Addressed the comments.

@Praveen2112 Praveen2112 merged commit dd4bcb0 into trinodb:master Jul 4, 2023
@github-actions github-actions bot added this to the 421 milestone Jul 4, 2023
@krvikash
Contributor Author

krvikash commented Jul 4, 2023

Thank you all for reviewing the PR and merging it.

@krvikash krvikash deleted the krvikash/mongodb-dereference-pushdown branch July 4, 2023 06:17
@Praveen2112
Member

Thanks for working on this.

Labels
cla-signed delta-lake Delta Lake connector docs hive Hive connector iceberg Iceberg connector mongodb MongoDB connector
6 participants