sql: make lookup-join deduplication efficient for range lookups #85950

DrewKimball · 2022-08-11T09:48:14Z

Is your feature request related to a problem? Please describe.
Currently, de-duplication of the spans used for lookup is performed in the lookup joiner logic using a Go map keyed on the span start and end keys. This works well for lookup joins that only use equalities (e.g. point lookups), since in this case spans that overlap are exactly equal and are therefore collapsed by the map. For range lookups, however, spans may only partially overlap - for example, two spans may share a start key but have different end keys. In that case no de-duplication happens at all: the two spans land in different map entries, so we perform two separate lookups, which may involve a lot of redundant work.
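To make the limitation concrete, here is a minimal sketch of exact-match de-duplication with a map keyed on start and end keys. The `span` type and `dedupExact` function are hypothetical stand-ins for illustration, not the actual joinReader code, and keys are modeled as plain strings.

```go
package main

// span is a hypothetical stand-in for a lookup span [start, end).
// Real spans use encoded keys; plain strings keep the sketch short.
type span struct {
	start, end string
}

// dedupExact returns the distinct spans in first-seen order, using a map
// keyed on the (start, end) pair. Two spans are treated as duplicates only
// when both keys are identical.
func dedupExact(spans []span) []span {
	seen := make(map[span]struct{}, len(spans))
	out := make([]span, 0, len(spans))
	for _, s := range spans {
		if _, ok := seen[s]; ok {
			continue // exact duplicate, skip it
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}
```

With point-lookup spans such as `{"a", "a\x00"}` repeated twice, the map collapses them into one. With range spans `{"a", "c"}` and `{"a", "f"}`, both survive even though the first is entirely contained in the second.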

Describe the solution you'd like
First of all, de-duplication should be moved from the joinReader to the Streamer - see #82155.

For point lookups, it makes sense to use a map as before, since this is simple and imposes little overhead. However, for range lookups, a sort-merge strategy probably makes the most sense in order to eliminate redundant work. This work would only be performed once per input batch in order to de-duplicate. For matching and emitting looked-up rows to input rows, we would still use a simple multi-map slice like spanIDHelper.spanIDToInputRowIndices as before.
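One possible reading of the sort-merge idea, as a rough sketch only: sort the spans of a batch by start key, then sweep once and coalesce any span that overlaps the one before it, remembering which input rows contributed. The names `lookupSpan`, `mergedSpan` and `mergeSpans` are assumptions made for this example and do not exist in the codebase; keys are again modeled as strings.

```go
package main

import "sort"

// lookupSpan is a hypothetical half-open span [start, end).
type lookupSpan struct {
	start, end string
}

// mergedSpan is a coalesced span plus the indices of the input rows whose
// original spans were folded into it.
type mergedSpan struct {
	lookupSpan
	inputRows []int
}

// mergeSpans sorts span indices by start key and merges overlapping spans
// in a single pass. spans[i] is assumed to belong to input row i.
func mergeSpans(spans []lookupSpan) []mergedSpan {
	order := make([]int, len(spans))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return spans[order[a]].start < spans[order[b]].start
	})

	var out []mergedSpan
	for _, idx := range order {
		s := spans[idx]
		if n := len(out); n > 0 && s.start < out[n-1].end {
			// Overlaps the previous merged span: extend it if needed.
			last := &out[n-1]
			if s.end > last.end {
				last.end = s.end
			}
			last.inputRows = append(last.inputRows, idx)
			continue
		}
		out = append(out, mergedSpan{lookupSpan: s, inputRows: []int{idx}})
	}
	return out
}
```

In this sketch each merged span is scanned once, and a looked-up row would still need to be checked against the original span of every input row listed in `inputRows` before being emitted, since a merged span can cover keys that some of its contributing rows did not ask for.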

Additional context
#66002 and #85597 added support for range-based lookup joins, but lookup joins that don't use equality conditions (only inequalities) are still only planned in limited cases in order to avoid performance regressions due to the lack of de-duplication. Fixing this issue will allow us to plan lookup joins in more cases.

Jira issue: CRDB-18498


github-actions · 2024-02-05T11:04:04Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

DrewKimball added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Aug 11, 2022

blathers-crl bot added the T-sql-queries SQL Queries Team label Aug 11, 2022

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023

github-actions bot added the no-issue-activity label Feb 5, 2024

DrewKimball removed the no-issue-activity label Feb 5, 2024

DrewKimball moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Feb 5, 2024
