Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize join when build side is unique #13747

Conversation

skrzypo987
Copy link
Member

Rebased on top of #13352.
Only last two commits are relevant

Description

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

Increase performance of join when build side is unique

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Increase performance of join when build side is unique

@findepi
Copy link
Member

findepi commented Aug 23, 2022

How would you describe this change to a non-technical end user or system administrator?

Increase performance of join when build side is unique

How is it achieved?

@skrzypo987
Copy link
Member Author

How would you describe this change to a non-technical end user or system administrator?

Increase performance of join when build side is unique

How is it achieved?

Are you asking because you're curious or I should rephrase the description?
It is rather technical and includes words like hard-coding which non-technical people do not like.

@findepi
Copy link
Member

findepi commented Aug 24, 2022

Are you asking because you're curious or I should rephrase the description?

both

It is rather technical and includes words like hard-coding which non-technical people do not like.

Fortunately the primary objective of a PR description is to be informative to technical people.

@skrzypo987
Copy link
Member Author

The part that you asked about is exactly about "non-technical end user".

I will add some technical description once batching lands and this PR can actually be merged

@skrzypo987 skrzypo987 force-pushed the skrzypo/107-join-unique-positions branch 5 times, most recently from 7726c1e to d7b9397 Compare August 26, 2022 10:04
@skrzypo987 skrzypo987 force-pushed the skrzypo/107-join-unique-positions branch from d7b9397 to 044347c Compare September 7, 2022 06:27
@sopel39
Copy link
Member

sopel39 commented Oct 4, 2022

@skrzypo987 do you have benchmark numbers ?

@sopel39
Copy link
Member

sopel39 commented Oct 4, 2022

@skrzypo987 could you also rebase?

If the positions in build side are not unique, i.e. a single position on the
probe side matches more than one position on the build side, the `positionLinks`
data structure holds the positions to match. If the build side is unique the
`positionLinks` object is null and there is a single null-check every probe
position. However, since lookup source is most likely partitioned, the null-check
needs to go into a proper partition and the partition check and delegation is
carried out. This is a relatively expensive operation, so this commit skips
it altogether if the build side is unique.
@skrzypo987 skrzypo987 force-pushed the skrzypo/107-join-unique-positions branch from 044347c to 6cb873b Compare October 5, 2022 04:25
@skrzypo987
Copy link
Member Author

I ran benchmarks after the rebase and the results are somewhere between no change and a slight regression.
So in this shape this PR does not improve anything. I will post/rebase/benchmark the RLE changes later today and we'll see if it helps.

Copy link
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -257,6 +259,12 @@ public void appendTo(long position, PageBuilder pageBuilder, int outputChannelOf
pagesHashStrategy.appendTo(blockIndex, blockPosition, pageBuilder, outputChannelOffset);
}

@Override
public boolean isMappingUnique()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is part of this PR but would be good to have this available in OperatorStats

@lukasz-stec
Copy link
Member

I ran benchmarks after the rebase and the results are somewhere between no change and a slight regression. So in this shape this PR does not improve anything. I will post/rebase/benchmark the RLE changes later today and we'll see if it helps.

IMO it would beneficial to merge this if we have JMH benchmarks that show improvement here, even if tpch/tpcds do not confirm that as this can be a good step that opens up further optimizations.

@skrzypo987
Copy link
Member Author

I ran benchmarks after the rebase and the results are somewhere between no change and a slight regression. So in this shape this PR does not improve anything. I will post/rebase/benchmark the RLE changes later today and we'll see if it helps.

IMO it would beneficial to merge this if we have JMH benchmarks that show improvement here, even if tpch/tpcds do not confirm that as this can be a good step that opens up further optimizations.

I don't think I agree with you. Those changes may be neutral at that point and there is no reason to increase complexity, unless we actually see any reasonable benchmark results.

I will run benchmarks again after merging #14493. Maybe it will help

One way or another this PR may be a base for some more join optimisations, given someone is going to take over.

@mosabua
Copy link
Member

mosabua commented Jan 12, 2024

@skrzypo987 @lukasz-stec is this still in progress or should we close this PR?

@lukasz-stec
Copy link
Member

@mosabua this is potentially valuable but no one is working on this AFAIK. If this means we close, lets close.

@mosabua
Copy link
Member

mosabua commented Jan 16, 2024

@sopel39 and @raunaqmorarka can maybe decide with @skrzypo987 .. I cant assess how valuable.

Copy link

github-actions bot commented Sep 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 4, 2024
@mosabua mosabua closed this Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants