Optimize join when build side is unique #13747
How is it achieved?
Are you asking because you're curious, or should I rephrase the description?
Both.
Fortunately, the primary objective of a PR description is to be informative to technical people.
The part you asked about is specifically the "non-technical end user" summary. I will add a technical description once batching lands and this PR can actually be merged.
@skrzypo987 do you have benchmark numbers?
@skrzypo987 could you also rebase?
If positions on the build side are not unique, i.e. a single probe-side position matches more than one build-side position, the `positionLinks` data structure holds the additional positions to match. If the build side is unique, the `positionLinks` object is null and only a single null check is needed for every probe position. However, since the lookup source is most likely partitioned, even that null check requires finding the proper partition and delegating the call to it, once per probe position. This is relatively expensive, so this commit skips it altogether when the build side is unique.
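A minimal Java sketch of the idea described above, under a heavily simplified model. `Partition`, `PartitionedLookupSource`, `getJoinPosition`, `getNextJoinPosition`, the 32/32-bit position encoding and the linear key scan are hypothetical stand-ins, not Trino's actual lookup-source classes. The sketch shows where the per-position partition decode and delegation happens, and how checking uniqueness once lets the probe loop skip it.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a partitioned join lookup source.
// Class and method names are illustrative, not Trino's actual implementation.
public class UniqueJoinSketch {
    static final class Partition {
        private final long[] keys;
        // positionLinks[p] = next build position with the same key, or -1 at the end of a chain.
        // Left null when every key occurs once, i.e. the build side is unique.
        final int[] positionLinks;

        Partition(long[] keys) {
            this.keys = keys;
            int[] links = new int[keys.length];
            boolean unique = true;
            for (int i = 0; i < keys.length; i++) {
                links[i] = -1;
                for (int j = i + 1; j < keys.length && links[i] == -1; j++) { // quadratic scan keeps the sketch short
                    if (keys[j] == keys[i]) {
                        links[i] = j;
                        unique = false;
                    }
                }
            }
            this.positionLinks = unique ? null : links;
        }

        int getJoinPosition(long key) { // first build position with this key, or -1
            for (int i = 0; i < keys.length; i++) {
                if (keys[i] == key) return i;
            }
            return -1;
        }

        int getNextJoinPosition(int position) {
            return positionLinks == null ? -1 : positionLinks[position];
        }
    }

    static final class PartitionedLookupSource {
        private final Partition[] partitions;

        PartitionedLookupSource(long[] buildKeys, int partitionCount) {
            partitions = new Partition[partitionCount];
            for (int p = 0; p < partitionCount; p++) {
                final int part = p;
                partitions[p] = new Partition(Arrays.stream(buildKeys).filter(k -> partitionOf(k) == part).toArray());
            }
        }

        private int partitionOf(long key) {
            return (int) Math.floorMod(key, (long) partitions.length);
        }

        // Global position: partition index in the high 32 bits, local position in the low 32 bits.
        long getJoinPosition(long key) {
            int p = partitionOf(key);
            int local = partitions[p].getJoinPosition(key);
            return local < 0 ? -1 : ((long) p << 32) | local;
        }

        // The per-probe-position work being avoided: decode the partition, delegate, re-encode.
        long getNextJoinPosition(long position) {
            int p = (int) (position >>> 32);
            int next = partitions[p].getNextJoinPosition((int) position);
            return next < 0 ? -1 : ((long) p << 32) | next;
        }

        boolean isMappingUnique() {
            for (Partition partition : partitions) {
                if (partition.positionLinks != null) {
                    return false;
                }
            }
            return true;
        }
    }

    // Counts joined rows; when the build side is unique, getNextJoinPosition is never called.
    static int probe(PartitionedLookupSource lookup, long[] probeKeys) {
        boolean unique = lookup.isMappingUnique(); // checked once, not per probe position
        int outputRows = 0;
        for (long key : probeKeys) {
            long position = lookup.getJoinPosition(key);
            while (position >= 0) {
                outputRows++;
                // Unique build side: at most one match, so skip the partition decode and delegation.
                position = unique ? -1 : lookup.getNextJoinPosition(position);
            }
        }
        return outputRows;
    }
}
```

For example, with build keys `{1, 2, 3, 4}` split into two partitions, probing keys `{1, 2, 5}` yields 2 rows without ever calling `getNextJoinPosition`; with build keys `{1, 1, 2, 3}`, the same probe yields 3 rows and walks the `positionLinks` chain for key 1.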
I ran benchmarks after the rebase, and the results range from no change to a slight regression.
lgtm
@@ -257,6 +259,12 @@ public void appendTo(long position, PageBuilder pageBuilder, int outputChannelOf
pagesHashStrategy.appendTo(blockIndex, blockPosition, pageBuilder, outputChannelOffset);
}

@Override
public boolean isMappingUnique()
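The hunk above adds an `isMappingUnique()` override. As a hedged sketch of one way such a flag could be exposed, assume a hypothetical `LookupSource` interface whose default returns `false`, so that implementations that cannot prove uniqueness keep using the general position-link path. `SimpleJoinHash` and `OpaqueLookupSource` are also hypothetical; the former detects duplicate build keys while it is built.

```java
import java.util.HashSet;
import java.util.Set;

public class MappingUniqueSketch {
    // Hypothetical stand-in for a lookup-source interface; not Trino's actual API.
    interface LookupSource {
        // Conservative default: callers must keep walking position links unless uniqueness is proven.
        default boolean isMappingUnique() {
            return false;
        }
    }

    // Hypothetical hash over build-side keys that records whether any key was seen twice.
    static final class SimpleJoinHash implements LookupSource {
        private final boolean unique;

        SimpleJoinHash(long[] buildKeys) {
            Set<Long> seen = new HashSet<>();
            boolean allDistinct = true;
            for (long key : buildKeys) {
                allDistinct &= seen.add(key); // add() returns false for a duplicate key
            }
            this.unique = allDistinct;
        }

        @Override
        public boolean isMappingUnique() {
            return unique;
        }
    }

    // A lookup source that does not track uniqueness inherits the safe default.
    static final class OpaqueLookupSource implements LookupSource {}
}
```

Defaulting to `false` means that forgetting to override the method can only cost performance; it can never drop join rows, which a wrong `true` would.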
Not sure if this is part of this PR, but it would be good to have this available in OperatorStats.
IMO it would be beneficial to merge this if we have JMH benchmarks that show an improvement here, even if TPC-H/TPC-DS do not confirm it, as this can be a good step that opens up further optimizations.
I don't think I agree. These changes may be neutral at this point, and there is no reason to increase complexity unless we actually see reasonable benchmark results. I will run benchmarks again after merging #14493; maybe it will help. One way or another, this PR may serve as a base for further join optimizations, provided someone takes it over.
@skrzypo987 @lukasz-stec is this still in progress, or should we close this PR?
@mosabua this is potentially valuable, but no one is working on it AFAIK. If that means we close it, let's close it.
@sopel39 and @raunaqmorarka can maybe decide with @skrzypo987. I can't assess how valuable it is.
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Rebased on top of #13352.
Only the last two commits are relevant.
Description
improvement
core query engine
Improve performance of joins when the build side is unique
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: