Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT]: Consider removing the use of argument_type and result_type from hasher in cuco default_filter_policy #653

Open
mhaseeb123 opened this issue Dec 10, 2024 · 5 comments
Labels
P1: Should have Necessary but not critical topic: bloom_filter Issues related to bloom_filter type: improvement Improvement / enhancement to an existing function

Comments

@mhaseeb123
Copy link
Contributor

Is your feature request related to a problem? Please describe.

The member type std::hash::argument_type is deprecated in C++17 and removed in C++20, so we should consider syncing with STL and remove them from our hashers as well as their use in bloom filter policies.

Ref: https://en.cppreference.com/w/cpp/utility/hash

Originally posted by @bdice in rapidsai/cudf#17289 (comment)

Describe the solution you'd like

Consider refactoring to not use hasher::argument_type and hasher::result_type across cuCollections.

Describe alternatives you've considered

Keep using the deprecated hasher type aliases until C++20

Additional context

No response

@mhaseeb123 mhaseeb123 changed the title [ENHANCEMENT]: Consider removing the use of hasher::argument_type and hasher::result_type in cuCo default and arrow filter policies [ENHANCEMENT]: Consider removing the use of argument_type and result_type from hasher in cuo default and arrow filter policies Dec 10, 2024
@PointKernel
Copy link
Member

I thought about mentioning this during the code review but didn’t want to come across as nitpicking. Instead of

template <class Key, class XXHash64>
class arrow_filter_policy {

we could make the hasher a template type:

template<class Key, template <typename> class Hash>
class arrow_filter_policy {
  ...
};

To use it, users can simply pass the hasher type without repeating the Key type again, e.g.

auto p = arrow_filter_policy<Key, cuco::xxhash_32>{...};

@PointKernel PointKernel added topic: bloom_filter Issues related to bloom_filter type: improvement Improvement / enhancement to an existing function P1: Should have Necessary but not critical labels Dec 10, 2024
@mhaseeb123
Copy link
Contributor Author

mhaseeb123 commented Dec 10, 2024

I thought about mentioning this during the code review but didn’t want to come across as nitpicking.

No worries about that. I am always for pursuing excellence and attention to the detail is how we all learn. 😄

I think this is a great idea. Let's do this. But we still would be using the member types (argument_type and result_type) from the hasher which Bradley mentioned

@PointKernel
Copy link
Member

PointKernel commented Dec 10, 2024

we still would be using the member types (argument_type and result_type) from the hasher

Half-half. The argument_type will simply be key_type (using key_type = Key). However, retaining result_type (should be hash_value_type of the policy) is still desirable. The STL simplifies hashers by always using std::size_t as the return type, but this approach is suboptimal, especially in terms of performance and for GPUs. Such oversimplification does not fit our goals and IMO should be avoided.

@mhaseeb123
Copy link
Contributor Author

Such oversimplification does not fit our goals and IMO should be avoided.

Agreed. Let's make the suggested change to cuco::arrow_filter_policy and close this issue otherwise.

@mhaseeb123 mhaseeb123 changed the title [ENHANCEMENT]: Consider removing the use of argument_type and result_type from hasher in cuo default and arrow filter policies [ENHANCEMENT]: Consider removing the use of argument_type and result_type from hasher in cuco default_filter_policy Dec 10, 2024
@mhaseeb123
Copy link
Contributor Author

#654 removes these member types for the arrow_filter_policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1: Should have Necessary but not critical topic: bloom_filter Issues related to bloom_filter type: improvement Improvement / enhancement to an existing function
Projects
None yet
Development

No branches or pull requests

2 participants