sql: make lookup-join deduplication efficient for range lookups #85950

Open
DrewKimball opened this issue Aug 11, 2022 · 1 comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@DrewKimball
Collaborator

DrewKimball commented Aug 11, 2022

Is your feature request related to a problem? Please describe.
Currently, de-duplication of the spans used for lookup is performed in the lookup joiner logic using a Go map keyed on the span start and end keys. This works well for lookup joins that only use equalities (e.g. point lookups), since in this case any spans that overlap are exactly identical and can therefore be de-duplicated by the map. For range lookups, however, spans may only partially overlap: for example, two spans may share a start key but have different end keys. In that case we currently perform no de-duplication and instead issue two separate lookups, which may involve a lot of redundant work.
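To make the limitation concrete, here is a minimal sketch of map-based de-duplication. The `span` type and `dedupExact` function are hypothetical stand-ins for illustration, not the joinReader's actual types; keys are modeled as plain strings.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a half-open key span [Start, End).
type span struct {
	Start, End string
}

// dedupExact returns the distinct spans in first-seen order, using a map
// keyed on both the start and end keys. Only exact duplicates are dropped.
func dedupExact(spans []span) []span {
	seen := make(map[span]struct{}, len(spans))
	var out []span
	for _, s := range spans {
		if _, ok := seen[s]; ok {
			continue
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	// Equality-only lookups: overlapping spans are identical and collapse.
	points := []span{{"a", "a\x00"}, {"a", "a\x00"}, {"b", "b\x00"}}
	fmt.Println(len(dedupExact(points))) // 2

	// Range lookups: same start key, different end keys. The map treats
	// these as unrelated, so both spans are kept and [a, f) is read twice.
	ranges := []span{{"a", "f"}, {"a", "k"}}
	fmt.Println(len(dedupExact(ranges))) // 2
}
```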

Describe the solution you'd like
First of all, de-duplication should be moved from the joinReader to the Streamer - see #82155.

For point lookups, it makes sense to use a map as before, since this is simple and imposes little overhead. However, for range lookups, a sort-merge strategy probably makes the most sense in order to eliminate redundant work. This work would only be performed once per input batch in order to de-duplicate. For matching and emitting looked-up rows to input rows, we would still use a simple multi-map slice like spanIDHelper.spanIDToInputRowIndices as before.
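One possible reading of the sort-merge strategy is sketched below: sort the spans of an input batch by start key, then coalesce overlapping spans so each key range is scanned at most once. This is an assumption about the approach, not the Streamer's implementation; `span`, `mergedSpan`, and `mergeSpans` are hypothetical names, and keys are modeled as strings. The `Inputs` slice plays a role loosely analogous to the multi-map mentioned above, recording which original spans each merged span covers.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical half-open key span [Start, End).
type span struct {
	Start, End string
}

// mergedSpan is a coalesced span plus the indices of the input spans it covers.
type mergedSpan struct {
	span
	Inputs []int
}

// mergeSpans sorts spans by start key and coalesces overlapping ones.
// Adjacent spans that merely touch (End == next Start) are left separate.
func mergeSpans(spans []span) []mergedSpan {
	order := make([]int, len(spans))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return spans[order[a]].Start < spans[order[b]].Start
	})

	var out []mergedSpan
	for _, idx := range order {
		s := spans[idx]
		if n := len(out); n > 0 && s.Start < out[n-1].End {
			// Overlaps the previous merged span: extend its end if needed.
			if s.End > out[n-1].End {
				out[n-1].End = s.End
			}
			out[n-1].Inputs = append(out[n-1].Inputs, idx)
			continue
		}
		out = append(out, mergedSpan{span: s, Inputs: []int{idx}})
	}
	return out
}

func main() {
	spans := []span{{"a", "f"}, {"m", "p"}, {"a", "k"}, {"c", "d"}}
	for _, m := range mergeSpans(spans) {
		fmt.Printf("[%s, %s) covers inputs %v\n", m.Start, m.End, m.Inputs)
	}
	// Prints:
	// [a, k) covers inputs [0 2 3]
	// [m, p) covers inputs [1]
}
```

With merged spans, each looked-up row would still need to be matched against the original span bounds of every input listed in `Inputs`, since a merged span can be wider than any single input span.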

Additional context
#66002 and #85597 added support for range-based lookup joins, but lookup joins that don't use equality conditions (only inequalities) are still only planned in limited cases in order to avoid performance regressions due to the lack of de-duplication. Fixing this issue will allow us to plan lookup joins in more cases.

Jira issue: CRDB-18498

@DrewKimball DrewKimball added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Aug 11, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Aug 11, 2022
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023

github-actions bot commented Feb 5, 2024

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@DrewKimball DrewKimball moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Feb 5, 2024