Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unnecessary DeterminePartitionCount work #17150

Merged

Conversation

pettyjamesm
Copy link
Member

Description

Modifies the DeterminePartitionCount optimizer rule to only attempt stats collection and dynamic partition count selection when the plan contains an eligible remote exchange. Otherwise, plan stats collection would be triggered potentially unnecessarily.

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Apr 20, 2023
@pettyjamesm pettyjamesm requested a review from arhimondr April 20, 2023 17:33
@pettyjamesm pettyjamesm force-pushed the limit-determine-partition-count branch from fa552f9 to 178e1fd Compare April 20, 2023 20:27
@pettyjamesm pettyjamesm requested a review from gaurav8297 April 20, 2023 20:38
@pettyjamesm pettyjamesm force-pushed the limit-determine-partition-count branch from 178e1fd to 56bdc11 Compare April 21, 2023 16:54
Copy link
Member

@gaurav8297 gaurav8297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! @pettyjamesm

@gaurav8297
Copy link
Member

There are some test failures. Can you look into that?

@pettyjamesm pettyjamesm force-pushed the limit-determine-partition-count branch from 56bdc11 to 75fa8b4 Compare April 24, 2023 11:55
@pettyjamesm
Copy link
Member Author

There are some test failures. Can you look into that?

Yep, test failures are related because some tests are asserting the number of metastore accesses which is now lower than before for some queries. I've updated the assertions that I could verify locally with the new numbers, but may still be missing some assertions so we'll have to wait for the next CI run results to see if there are more updates needed.

@github-actions github-actions bot added delta-lake Delta Lake connector hive Hive connector tests:hive labels Apr 24, 2023
@pettyjamesm pettyjamesm force-pushed the limit-determine-partition-count branch from 75fa8b4 to 9ade84c Compare April 24, 2023 23:30
@gaurav8297 gaurav8297 requested a review from sopel39 April 25, 2023 14:26
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

Modifies the DeterminePartitionCount optimizer rule to only attempt
stats collection and dynamic partition count selection when the plan
contains an eligible remote exchange. Otherwise, plan stats collection
would be triggered potentially unnecessarily.
@pettyjamesm pettyjamesm force-pushed the limit-determine-partition-count branch from 9ade84c to 133067d Compare April 25, 2023 19:49
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. thanks!

return false;
}
PartitioningHandle partitioningHandle = exchangeNode.getPartitioningScheme().getPartitioning().getHandle();
return !partitioningHandle.isScaleWriters()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is probably enforced by PlanNodeSearcher.searchFrom(plan).whereIsInstanceOfAny(INSERT_NODES).matches(); check.

@gaurav8297 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was until #17024 enabled automatic partition determination for write queries behind a session property (default: false).

assertDistributedPlan(
query,
Session.builder(getQueryRunner().getDefaultSession())
.setSystemProperty(MAX_HASH_PARTITION_COUNT, "20")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you could move it as default session setup for these tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to explicitly configure the methods separately since many of the tests still use different values to assert different count values selected from the parameters used.

@pettyjamesm pettyjamesm merged commit fb8cf33 into trinodb:master Apr 26, 2023
@pettyjamesm pettyjamesm deleted the limit-determine-partition-count branch April 26, 2023 15:20
@github-actions github-actions bot added this to the 415 milestone Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector
Development

Successfully merging this pull request may close these issues.

3 participants