Add Arrow bloom filter policy #625

Merged on Oct 30, 2024 (36 commits)

mhaseeb123
Contributor

@mhaseeb123 mhaseeb123 commented Oct 25, 2024

This PR adds a new bloom filter policy that implements the Arrow bloom filter algorithm. It is part of rapidsai/cudf#17164. A follow-up PR will add tests that validate, bit for bit, a bloom filter built with the Arrow policy.
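
For readers unfamiliar with the algorithm, here is a minimal host-side C++ sketch of the split-block bloom filter scheme described in the Parquet format specification, which Arrow implements. It is not the cuco policy added in this PR: the class name `SplitBlockBloomFilter` and its methods are hypothetical, and the sketch assumes the caller passes in a precomputed 64-bit hash (the Parquet format specifies xxHash64 with seed 0) instead of hashing keys itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Salt constants from the Parquet split-block bloom filter specification.
constexpr std::uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
                                    0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// Each block is eight 32-bit words (256 bits).
constexpr std::size_t kWordsPerBlock = 8;

// Hypothetical host-side filter for illustration only (not part of cuco).
class SplitBlockBloomFilter {
 public:
  explicit SplitBlockBloomFilter(std::uint64_t num_blocks)
    : words_(num_blocks * kWordsPerBlock, 0U), num_blocks_(num_blocks)
  {
    assert(num_blocks > 0);
  }

  // Insert a key given its precomputed 64-bit hash.
  void add(std::uint64_t hash)
  {
    std::uint32_t* block = &words_[block_index(hash) * kWordsPerBlock];
    auto const key       = static_cast<std::uint32_t>(hash);  // lower 32 bits
    for (std::size_t i = 0; i < kWordsPerBlock; ++i) { block[i] |= bit_mask(key, i); }
  }

  // Returns false if the key was definitely never added, true if it may have been.
  bool contains(std::uint64_t hash) const
  {
    std::uint32_t const* block = &words_[block_index(hash) * kWordsPerBlock];
    auto const key             = static_cast<std::uint32_t>(hash);
    for (std::size_t i = 0; i < kWordsPerBlock; ++i) {
      if ((block[i] & bit_mask(key, i)) == 0U) { return false; }
    }
    return true;
  }

  // Raw filter words, e.g. for comparing against another implementation.
  std::vector<std::uint32_t> const& words() const { return words_; }

 private:
  // The upper 32 bits of the hash select the block via multiply-shift.
  std::size_t block_index(std::uint64_t hash) const
  {
    return static_cast<std::size_t>(((hash >> 32) * num_blocks_) >> 32);
  }

  // One bit per word: the top 5 bits of (key * salt[i]) pick the bit position.
  static std::uint32_t bit_mask(std::uint32_t key, std::size_t i)
  {
    return std::uint32_t{1} << ((key * kSalt[i]) >> 27);
  }

  std::vector<std::uint32_t> words_;
  std::uint64_t num_blocks_;
};
```

Each key touches exactly one 256-bit block and sets one bit in each of its eight words, so two conforming implementations fed the same hashes should end up with identical filter words. That is the property a bitwise validation test can check.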


copy-pr-bot bot commented Oct 25, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mhaseeb123 mhaseeb123 changed the title from "[WIP] Add Arrow bloom filter policy" to "[WIP] 🚧 Add Arrow bloom filter policy" on Oct 25, 2024
@PointKernel PointKernel added the labels "In Progress" (Currently a work in progress), "type: feature request" (New feature request), and "helps: rapids" (Helps or needed by RAPIDS) on Oct 25, 2024
@mhaseeb123
Contributor Author

Can you please fix the doxygen check failures as well?

All done!

examples/bloom_filter/host_bulk_arrow_policy_example.cu (outdated, resolved)
include/cuco/detail/bloom_filter/arrow_filter_policy.cuh (outdated, resolved)
examples/CMakeLists.txt (outdated, resolved)
include/cuco/detail/bloom_filter/arrow_filter_policy.cuh (outdated, resolved)
README.md (outdated, resolved)
@PointKernel
Member

/ok to test

@PointKernel
Member

/ok to test

Member

@PointKernel PointKernel left a comment

Great work!

@mhaseeb123
Contributor Author

mhaseeb123 commented Oct 29, 2024

Removed the new example and added a @code segment to the arrow_filter_policy.cuh docstring to demonstrate usage.

Collaborator

@sleeepyjack sleeepyjack left a comment

LGTM

Awesome work!

@PointKernel
Member

/ok to test

@sleeepyjack sleeepyjack merged commit 317c273 into NVIDIA:dev Oct 30, 2024
18 checks passed
@mhaseeb123 mhaseeb123 deleted the fea/impl-arrow-bf-policy branch October 30, 2024 17:25
sleeepyjack pushed a commit that referenced this pull request Nov 1, 2024
This PR adds tests that validate the bitset produced by inserting specific keys
into a `cuco::bloom_filter` with `cuco::arrow_filter_policy` against the
one generated by inserting the same keys into the implementation in Arrow.

Related to #625. Part of rapidsai/cudf#17164.
Reference bitset generated with Arrow here: https://godbolt.org/z/ebdddezbP

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Labels: "helps: rapids" (Helps or needed by RAPIDS), "Needs Review" (Awaiting reviews before merging), "type: feature request" (New feature request)