Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT]: Place the existing key to the right-hand side during equality checks #474

Closed
PointKernel opened this issue May 3, 2024 · 0 comments · Fixed by #479
Closed
Labels
good first issue Good for newcomers P1: Should have Necessary but not critical type: improvement Improvement / enhancement to an existing function

Comments

@PointKernel
Copy link
Member

PointKernel commented May 3, 2024

Is your feature request related to a problem? Please describe.

cuco hash tables always place the slot key on the left-hand side for key equality checks:

* @note cuCollections data structures always place the slot keys on the left-hand side when
* invoking the key comparison predicate, i.e., `pred(slot_key, query_key)`. Order-sensitive
* `KeyEqual` should be used with caution.

This was a completely random choice when I started the open-addressing refactoring and I thought it didn't matter and was wrong.

Generally speaking, when we want to check if two variables are identical, we put the query value on the left-hand side and the "reference" or the existing value on the right-hand side. e.g. we do

if (idx == 0) { ... }

instead of

if (0 == idx) { ... }

The new cuco data structures are unfortunately following the latter pattern.

This works fine until we meet the hash join use case where the build table is the right table and the probe table is the left table. As a result, the left table is always on the right when doing comparisons in cuco while the right table is always on the left. In many places across libcudf, build_table/right_table and probe_table/left_table are interchangeable terms thus for a function join_func expecting the first argument to be the build table and the second argument to be the probe table, we may have to invoke it awkwardly:

void join_func(right_table, left_table, ...)

This must be stopped.

Describe the solution you'd like

Always place the existing value (either the sentinel value or the slot key) on the right side for equality checks.

e.g.

this->predicate_.operator()<is_insert::YES>(this->extract_key(slot_content), key);

We should do the following instead

this->predicate_.operator()<is_insert::YES>(key, this->extract_key(slot_content));
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers P1: Should have Necessary but not critical type: improvement Improvement / enhancement to an existing function
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant