Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add bitset for ANN pre-filtering and deletion #1803

Merged
merged 21 commits into from
Sep 26, 2023

Conversation

lowener
Copy link
Contributor

@lowener lowener commented Sep 4, 2023

Related to #1600

@lowener lowener added 3 - Ready for Review non-breaking Non-breaking change feature request New feature or request labels Sep 5, 2023
@lowener lowener marked this pull request as ready for review September 5, 2023 00:49
@lowener lowener requested review from a team as code owners September 5, 2023 00:49
Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love to see this functionality go in! This is something that we currently use in FIL, and I'd love to replace it with a RAFT version.

My general pieces of feedback would be:

  • I'd like to make sure that we have a really good understanding of performance and how close to SoL this is. At minimum, adding benchmarks seems like it would be very worthwhile to allow for rapid perf iteration even after the initial PR.
  • To the extent possible (and acknowledging the slight differences in purpose), matching the std::bitset API would make for a really nice developer experience. Little things like adding a .flip method would save downstream users a lot of time.
  • Generalizing this as much as possible to different memory locations would also help with a lot of use cases that I can envision.

If we are to use this in FIL, we would also need to be able to compile with or without nvcc, but it is fine if we want to leave that off the table for now.

cpp/include/raft/util/bitset.cuh Outdated Show resolved Hide resolved
*
* @param bitset_span Device vector view of the bitset
*/
_RAFT_HOST_DEVICE bitset_view(raft::device_vector_view<BitsetT, IdxT> bitset_span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like most of the logic is host-device compatible. Any reason why we restrict ourselves to an on-device bitset? I ask in particular because I can imagine several common workflows where we construct the bitset on host and then apply it on device by copying the underlying memory. As a matter of fact, this is exactly what we do in FIL.

Separately, it seems like a span, rather than an mdspan, is sufficient for what we're using it for. If we're going to use an mdspan anyway, is there a specific reason to restrict ourselves to 1D? If not, is there some other reason not to just accept a span?

cpp/include/raft/util/bitset.cuh Outdated Show resolved Hide resolved
cpp/include/raft/util/bitset.cuh Outdated Show resolved Hide resolved
cpp/include/raft/util/bitset.cuh Outdated Show resolved Hide resolved
cpp/include/raft/util/bitset.cuh Outdated Show resolved Hide resolved
cpp/include/raft/util/bitset.cuh Outdated Show resolved Hide resolved
res, raft::ceildiv(bitset_len, bitset_element_size))}
{
RAFT_EXPECTS(mask_index.extent(0) <= bitset_len, "Mask index cannot be larger than bitset len");
cudaMemsetAsync(bitset_.data_handle(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more efficient to construct the data on host and then just transfer it all at once, rather than setting and unsetting the data like this? It would come at the cost of host memory footprint, but I suspect that it would be worth it for large bitsets.

cpp/include/raft/util/bitset.cuh Outdated Show resolved Hide resolved
cpp/test/util/bitset.cu Outdated Show resolved Hide resolved
@wphicks
Copy link
Contributor

wphicks commented Sep 19, 2023

Just to follow up on my previous review, I think all of the code details are in a good place, and we can follow up on any perf questions efficiently using the new benchmarks. My only remaining concern before merging an initial version of this is that it would be nice to make sure that the API does not change significantly in later follow-ups. To that end, it would be really great if we could match the std::bitset API as much as possible, with any deviations justified by the slight difference in purpose or the need for device-side support. Having e.g. flip, set, reset, etc. methods available on the object itself would make for a very easy transition for folks who are already familiar with std::bitset.

@lowener
Copy link
Contributor Author

lowener commented Sep 19, 2023

The reasoning behind this was to have a stateless API as indicated in the dev guide.
Should I do an exception to match std::bitset API?

@cjnolet
Copy link
Member

cjnolet commented Sep 20, 2023

The reasoning behind this was to have a stateless API as indicated in the dev guide.
Should I do an exception to match std::bitset API?

@lowener sorry to do this last minute, but I think stateful is okay for these types of reusable objects. We want statelessness across the APIs for compute functions but raft::bitset is becoming a pretty core API (like mdspan or mdarray, for example) and I think maintaining state in the bitset is okay. I also think we should move the bitset into raft/core. That was my fault- I thought initially it would be a couple utility functions but now it's evolved into a first-class construct, which is great.

I agree with @wphicks that we should match existing standards as much as possible. The stl are generally very well thought out by large teams of c++ veterans and by following their interfaces, at the very least, we give our users compatbility and familiarity.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good for a first pass and given the circumstances, I'm going to approve it for now. We can continue to revisit and iterate on it as it gets integrated more into the ANN filtering features.

core_math.rst
core_bitset.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we continue to build out these filtering primitives, I think we need to consider consolidating them under a single page in the documentation, kind of like what we have done for the math primitives.

I don't think we need to do that right now, though. But something to think about- maybe something like "fast lookup primitives" or something.

Copy link
Member

@divyegala divyegala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are device functions for test/set/flip going to be follow-up work?

*/
template <typename bitset_t = uint32_t, typename index_t = uint32_t>
struct bitset_view {
index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const is redundant

Suggested change
index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
index_t static constexpr bitset_element_size = sizeof(bitset_t) * 8;

Comment on lines +76 to +77
inline _RAFT_HOST_DEVICE auto data_handle() -> bitset_t* { return bitset_ptr_; }
inline _RAFT_HOST_DEVICE auto data_handle() const -> const bitset_t* { return bitset_ptr_; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline _RAFT_HOST_DEVICE auto data_handle() -> bitset_t* { return bitset_ptr_; }
inline _RAFT_HOST_DEVICE auto data_handle() const -> const bitset_t* { return bitset_ptr_; }
inline _RAFT_HOST_DEVICE auto data() -> bitset_t* { return bitset_ptr_; }
inline _RAFT_HOST_DEVICE auto data() const -> const bitset_t* { return bitset_ptr_; }

Comment on lines +91 to +98
inline auto to_mdspan() -> raft::device_vector_view<bitset_t, index_t>
{
return raft::make_device_vector_view<bitset_t, index_t>(bitset_ptr_, n_elements());
}
inline auto to_mdspan() const -> raft::device_vector_view<const bitset_t, index_t>
{
return raft::make_device_vector_view<const bitset_t, index_t>(bitset_ptr_, n_elements());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add two operators? inline auto operator raft::device_vector_view..., this will make the conversion automatic

Comment on lines +69 to +72
RAFT_BENCH_REGISTER(Uint8_32, "", bitset_input_vecs);
RAFT_BENCH_REGISTER(Uint16_64, "", bitset_input_vecs);
RAFT_BENCH_REGISTER(Uint32_32, "", bitset_input_vecs);
RAFT_BENCH_REGISTER(Uint32_64, "", bitset_input_vecs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please post the benchmark results as a comment in your PR?

*/
template <typename bitset_t = uint32_t, typename index_t = uint32_t>
struct bitset {
index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
index_t static constexpr bitset_element_size = sizeof(bitset_t) * 8;

bitset_len_{bitset_len},
default_value_{default_value}
{
cudaMemsetAsync(bitset_.data(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use cuda rt API directly

bitset(const raft::resources& res,
raft::device_vector_view<const index_t, index_t> mask_index,
index_t bitset_len,
bool default_value = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use an enum here instead of boolean? Something like default_value::ALL_ONES and default_value::ALL_ZEROS

*/
inline auto n_elements() const -> index_t
{
return raft::ceildiv(bitset_len_, bitset_element_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return raft::ceildiv(bitset_len_, bitset_element_size);
return bitset_.size();

* @param bitset_len Length of the bitset
* @param default_value Default value to set the bits to. Default is true.
*/
bitset(const raft::resources& res,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not store this as a member so that none of the other functions need to accept raft::resources on their API?

bitset_len_ = new_bitset_len;
if (old_size < new_size) {
// If the new size is larger, set the new bits to the default value
cudaMemsetAsync(bitset_.data() + old_size,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RAFT API

@divyegala
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit b9b5f44 into rapidsai:branch-23.10 Sep 26, 2023
54 checks passed
raydouglass pushed a commit that referenced this pull request Oct 3, 2023
Closes #1600.
To be merged after #1803 and #1811
This PR adds `bitset_filter` to filter an index with a bitset.

Authors:
   - Micka (https://github.com/lowener)
   - tsuki (https://github.com/enp1s0)
   - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
   - Corey J. Nolet (https://github.com/cjnolet)
rapids-bot bot pushed a commit that referenced this pull request Nov 6, 2023
PR based on the new Bitset feature (#1803) to support vector deletion in ANN.
Closes #1177. 
Closes #1620.
This PR adds `ivf_to_sample_filter` that acts as an intermediate filter to use an IVF index with a bitset filter.

Authors:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: #1831
benfred pushed a commit to benfred/raft that referenced this pull request Nov 8, 2023
PR based on the new Bitset feature (rapidsai#1803) to support vector deletion in ANN.
Closes rapidsai#1177. 
Closes rapidsai#1620.
This PR adds `ivf_to_sample_filter` that acts as an intermediate filter to use an IVF index with a bitset filter.

Authors:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#1831
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

5 participants