[ENHANCEMENT]: Place the existing key on the right-hand side during equality checks #474
Labels: good first issue (Good for newcomers); P1: Should have (Necessary but not critical); type: improvement (Improvement / enhancement to an existing function)
Is your feature request related to a problem? Please describe.
cuco hash tables always place the slot key on the left-hand side for key equality checks:
cuCollections/include/cuco/static_map.cuh, lines 64 to 66 in 6cb6dbf
This was a completely arbitrary choice I made when I started the open-addressing refactoring. I thought it didn't matter, but I was wrong.
Generally speaking, when we check whether two values are equal, we put the query value on the left-hand side and the "reference" (existing) value on the right-hand side, e.g. we write
instead of
Unfortunately, the new cuco data structures follow the latter pattern.
This works fine until we hit the hash join use case, where the build table is the right table and the probe table is the left table. As a result, when cuco performs comparisons, the left table always ends up on the right-hand side while the right table always ends up on the left. In many places across libcudf, `build_table`/`right_table` and `probe_table`/`left_table` are interchangeable terms, so for a function `join_func` expecting the first argument to be the build table and the second argument to be the probe table, we may have to invoke it awkwardly:
This must be stopped.
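To make the hash join problem concrete, here is a small host-side C++ sketch. `two_table_row_equal`, `match_existing_on_left`, and `match_existing_on_right` are hypothetical names, not libcudf or cuco APIs. The functor stands in for a row comparator constructed from two tables that compares a row of its first table with a row of its second. When the slot (build/right-table) row is passed first, the functor must be constructed with the tables reversed; when the existing row is passed second, it can be constructed in the natural (left, right) order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical libcudf-style row comparator (not a real API): built from two tables,
// it compares row `lhs` of the first table with row `rhs` of the second.
struct two_table_row_equal {
  std::vector<int> const* lhs_table;  // table indexed by the first call argument
  std::vector<int> const* rhs_table;  // table indexed by the second call argument

  bool operator()(std::size_t lhs, std::size_t rhs) const
  {
    // Catches a row index being looked up in the wrong table (debug builds only).
    assert(lhs < lhs_table->size() && rhs < rhs_table->size());
    return (*lhs_table)[lhs] == (*rhs_table)[rhs];
  }
};

// Current order (slot key on the left): the stored key is a build/right-table row, so the
// comparator must be constructed as {right, left}, i.e. with the tables reversed.
template <typename Equal>
bool match_existing_on_left(Equal const& equal, std::size_t probe_row, std::size_t slot_row)
{
  return equal(slot_row, probe_row);
}

// Proposed order (existing key on the right): the query is a probe/left-table row, so the
// comparator can be constructed in the natural {left, right} order.
template <typename Equal>
bool match_existing_on_right(Equal const& equal, std::size_t probe_row, std::size_t slot_row)
{
  return equal(probe_row, slot_row);
}
```

For example, with `left{10, 20, 30}` as the probe table and `right{30, 10}` as the build table, `two_table_row_equal{&left, &right}` works directly with `match_existing_on_right`, whereas `match_existing_on_left` only gives correct answers with the reversed `two_table_row_equal{&right, &left}`. Passing the naturally ordered comparator to the current order would look up probe rows in the build table and vice versa, which is the awkward swap described above.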
Describe the solution you'd like
Always place the existing value (either the sentinel value or the slot key) on the right-hand side for equality checks.
For example, in cuCollections/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh, line 370 in 6cb6dbf:
We should do the following instead:
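A minimal sketch of the proposed rule, assuming a plain linear-probing table stored in a host `std::vector` with no erase support. `slot_state`, `compare_slot`, and `find_slot` are hypothetical helpers for illustration only, not cuco's API. The point is that both the emptiness check and the key check keep the stored value (the sentinel or the slot key) as the right-hand argument of `equal`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical result of inspecting one probed slot (illustrative, not cuco's API).
enum class slot_state { unequal, empty, equal };

// Every comparison keeps the value already in the table on the right-hand side:
//   emptiness: equal(slot_key, empty_sentinel)  -> the sentinel is the existing value
//   match:     equal(query, slot_key)           -> the slot key is the existing value
template <typename Key, typename Equal>
slot_state compare_slot(Equal const& equal, Key const& query, Key const& slot_key,
                        Key const& empty_sentinel)
{
  if (equal(slot_key, empty_sentinel)) { return slot_state::empty; }
  return equal(query, slot_key) ? slot_state::equal : slot_state::unequal;
}

// Minimal linear-probing lookup over a fixed-size slot array (no tombstones).
// Returns the index of the matching slot, or std::nullopt if the key is absent.
template <typename Key, typename Equal = std::equal_to<Key>>
std::optional<std::size_t> find_slot(std::vector<Key> const& slots, Key const& query,
                                     Key const& empty_sentinel, Equal equal = Equal{})
{
  assert(!slots.empty());
  std::size_t const n     = slots.size();
  std::size_t const start = std::hash<Key>{}(query) % n;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t const idx = (start + i) % n;
    switch (compare_slot(equal, query, slots[idx], empty_sentinel)) {
      case slot_state::equal: return idx;
      case slot_state::empty: return std::nullopt;  // an empty slot ends the probe sequence
      case slot_state::unequal: break;              // keep probing
    }
  }
  return std::nullopt;  // every slot visited and the key was not found
}
```

With the default `std::equal_to` the argument order is not observable, because that predicate is symmetric. The order starts to matter once a user-provided or heterogeneous `equal` is plugged in, which is exactly when a fixed, documented convention helps.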