[FEA] Add bitset for ANN pre-filtering and deletion #1803
Conversation
Love to see this functionality go in! This is something that we currently use in FIL, and I'd love to replace it with a RAFT version.
My general pieces of feedback would be:
- I'd like to make sure that we have a really good understanding of performance and how close to SoL this is. At minimum, adding benchmarks seems like it would be very worthwhile to allow for rapid perf iteration even after the initial PR.
- To the extent possible (and acknowledging the slight differences in purpose), matching the `std::bitset` API would make for a really nice developer experience. Little things like adding a `.flip` method would save downstream users a lot of time (see the sketch after this list).
- Generalizing this as much as possible to different memory locations would also help with a lot of use cases that I can envision.

If we are to use this in FIL, we would also need to be able to compile with or without `nvcc`, but it is fine if we want to leave that off the table for now.
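As a rough illustration of the `std::bitset`-style surface mentioned above, here is a hedged sketch of a word-wise `flip`. It assumes the PR's `bitset_view` with the `data_handle()` and `n_elements()` members quoted later in this review; the kernel name and launch configuration are illustrative only, not the PR's actual implementation.

```cpp
#include <raft/core/resource/cuda_stream.hpp>
#include <raft/core/resources.hpp>

// Hypothetical sketch: invert every stored word, mirroring std::bitset::flip().
template <typename bitset_t, typename index_t>
__global__ void bitset_flip_kernel(bitset_t* words, index_t n_words)
{
  const auto i = static_cast<index_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n_words) { words[i] = static_cast<bitset_t>(~words[i]); }
}

// Host-side wrapper launching on the stream held by raft::resources.
template <typename bitset_t, typename index_t>
void flip(const raft::resources& res, bitset_view<bitset_t, index_t> view)
{
  const index_t n_words = view.n_elements();
  const int threads     = 256;
  const int blocks      = static_cast<int>((n_words + threads - 1) / threads);
  bitset_flip_kernel<<<blocks, threads, 0, raft::resource::get_cuda_stream(res)>>>(
    view.data_handle(), n_words);
}
```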
cpp/include/raft/util/bitset.cuh (Outdated)

 *
 * @param bitset_span Device vector view of the bitset
 */
_RAFT_HOST_DEVICE bitset_view(raft::device_vector_view<BitsetT, IdxT> bitset_span)
It seems like most of the logic is host-device compatible. Any reason why we restrict ourselves to an on-device bitset? I ask in particular because I can imagine several common workflows where we construct the bitset on host and then apply it on device by copying the underlying memory. As a matter of fact, this is exactly what we do in FIL.
Separately, it seems like a span, rather than an mdspan, is sufficient for what we're using it for. If we're going to use an mdspan anyway, is there a specific reason to restrict ourselves to 1D? If not, is there some other reason not to just accept a span?
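On the host/device and span point, a hedged sketch of what a location-agnostic view could look like: a raw pointer plus a length, so the same type can wrap host- or device-resident words and the caller decides where the memory lives. The type name `bitset_view_sketch`, the bit layout, and the `test()` semantics here are assumptions for illustration, not the PR's API.

```cpp
#include <cstdint>

// Hypothetical location-agnostic view: valid wherever the pointer is dereferenceable
// (host or device), since it only indexes into the word array.
template <typename bitset_t = uint32_t, typename index_t = uint32_t>
struct bitset_view_sketch {
  _RAFT_HOST_DEVICE bitset_view_sketch(bitset_t* bitset_ptr, index_t bitset_len)
    : bitset_ptr_{bitset_ptr}, bitset_len_{bitset_len}
  {
  }

  // Returns the value of bit `sample_index` (word = i / bits_per_word, bit = i % bits_per_word).
  inline _RAFT_HOST_DEVICE auto test(index_t sample_index) const -> bool
  {
    constexpr index_t bits_per_word = sizeof(bitset_t) * 8;
    const bitset_t word             = bitset_ptr_[sample_index / bits_per_word];
    return ((word >> (sample_index % bits_per_word)) & bitset_t{1}) != 0;
  }

 private:
  bitset_t* bitset_ptr_;
  index_t bitset_len_;
};
```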
cpp/include/raft/util/bitset.cuh
Outdated
res, raft::ceildiv(bitset_len, bitset_element_size))} | ||
{ | ||
RAFT_EXPECTS(mask_index.extent(0) <= bitset_len, "Mask index cannot be larger than bitset len"); | ||
cudaMemsetAsync(bitset_.data_handle(), |
Would it be more efficient to construct the data on host and then just transfer it all at once, rather than setting and unsetting the data like this? It would come at the cost of host memory footprint, but I suspect that it would be worth it for large bitsets.
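A hedged sketch of that build-on-host, single-bulk-copy approach, assuming `raft::update_device` for the transfer; the function name, the meaning of `mask_index` (bits to clear), and the output layout are illustrative only, not a proposal for the PR's exact API.

```cpp
#include <raft/core/resource/cuda_stream.hpp>
#include <raft/core/resources.hpp>
#include <raft/util/cudart_utils.hpp>

#include <cstdint>
#include <vector>

// Hypothetical sketch: assemble the words on host, then do one bulk H2D transfer
// instead of a device-wide memset followed by per-index scatter updates.
template <typename bitset_t, typename index_t>
void build_bitset_on_host(const raft::resources& res,
                          const std::vector<index_t>& mask_index,  // indices whose bit is cleared
                          index_t bitset_len,
                          bitset_t* device_words)  // holds ceil(bitset_len / word_bits) words
{
  constexpr index_t word_bits = sizeof(bitset_t) * 8;
  const index_t n_words       = (bitset_len + word_bits - 1) / word_bits;

  // Default-set every bit, then clear the masked ones on the host.
  std::vector<bitset_t> host_words(n_words, static_cast<bitset_t>(~bitset_t{0}));
  for (index_t i : mask_index) {
    host_words[i / word_bits] &= static_cast<bitset_t>(~(bitset_t{1} << (i % word_bits)));
  }

  // Single transfer to the device on the resources' stream.
  raft::update_device(device_words, host_words.data(), host_words.size(),
                      raft::resource::get_cuda_stream(res));
}
```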
Just to follow up on my previous review, I think all of the code details are in a good place, and we can follow up on any perf questions efficiently using the new benchmarks. My only remaining concern before merging an initial version of this is that it would be nice to make sure that the API does not change significantly in later follow-ups. To that end, it would be really great if we could match the `std::bitset` API.

The reasoning behind this was to have a stateless API, as indicated in the dev guide.

@lowener sorry to do this last minute, but I think stateful is okay for these types of reusable objects. We want statelessness across the APIs for compute functions, but I agree with @wphicks that we should match existing standards as much as possible. The STL interfaces are generally very well thought out by large teams of C++ veterans, and by following them, at the very least, we give our users compatibility and familiarity.
I think this looks good for a first pass and given the circumstances, I'm going to approve it for now. We can continue to revisit and iterate on it as it gets integrated more into the ANN filtering features.
core_math.rst
core_bitset.rst
As we continue to build out these filtering primitives, I think we need to consider consolidating them under a single page in the documentation, kind of like what we have done for the math primitives.
I don't think we need to do that right now, though. But it's something to think about: maybe something like "fast lookup primitives".
Are device functions for `test`/`set`/`flip` going to be follow-up work?
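For reference, a hedged sketch of what such per-bit device functions could look like if they become follow-up work; it assumes `bitset_t` is `uint32_t` so the `atomicOr`/`atomicAnd`/`atomicXor` overloads for `unsigned int` apply. The free-function names are illustrative; they would presumably live on `bitset_view` instead.

```cpp
// Hypothetical device-side helpers (assuming 32-bit words); concurrent writers to the
// same word are handled by the word-level atomics.
__device__ inline void bitset_set(uint32_t* words, uint32_t i)
{
  atomicOr(words + (i / 32u), 1u << (i % 32u));
}

__device__ inline void bitset_reset(uint32_t* words, uint32_t i)
{
  atomicAnd(words + (i / 32u), ~(1u << (i % 32u)));
}

__device__ inline void bitset_flip(uint32_t* words, uint32_t i)
{
  atomicXor(words + (i / 32u), 1u << (i % 32u));
}

__device__ inline bool bitset_test(const uint32_t* words, uint32_t i)
{
  return (words[i / 32u] >> (i % 32u)) & 1u;
}
```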
 */
template <typename bitset_t = uint32_t, typename index_t = uint32_t>
struct bitset_view {
  index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
`const` is redundant. Suggested change:
- index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
+ index_t static constexpr bitset_element_size = sizeof(bitset_t) * 8;
inline _RAFT_HOST_DEVICE auto data_handle() -> bitset_t* { return bitset_ptr_; }
inline _RAFT_HOST_DEVICE auto data_handle() const -> const bitset_t* { return bitset_ptr_; }
Suggested change:
- inline _RAFT_HOST_DEVICE auto data_handle() -> bitset_t* { return bitset_ptr_; }
- inline _RAFT_HOST_DEVICE auto data_handle() const -> const bitset_t* { return bitset_ptr_; }
+ inline _RAFT_HOST_DEVICE auto data() -> bitset_t* { return bitset_ptr_; }
+ inline _RAFT_HOST_DEVICE auto data() const -> const bitset_t* { return bitset_ptr_; }
inline auto to_mdspan() -> raft::device_vector_view<bitset_t, index_t>
{
  return raft::make_device_vector_view<bitset_t, index_t>(bitset_ptr_, n_elements());
}
inline auto to_mdspan() const -> raft::device_vector_view<const bitset_t, index_t>
{
  return raft::make_device_vector_view<const bitset_t, index_t>(bitset_ptr_, n_elements());
}
Can we also add two operators, `inline auto operator raft::device_vector_view...`? This will make the conversion automatic.
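A hedged sketch of what those two conversion operators could look like as members of `bitset_view`, mirroring the `to_mdspan()` overloads quoted above (so `bitset_ptr_` and `n_elements()` are assumed to exist):

```cpp
// Hypothetical members of bitset_view: implicit conversions to device_vector_view so the
// view can be passed directly to APIs that expect the mdspan type.
inline operator raft::device_vector_view<bitset_t, index_t>()
{
  return raft::make_device_vector_view<bitset_t, index_t>(bitset_ptr_, n_elements());
}
inline operator raft::device_vector_view<const bitset_t, index_t>() const
{
  return raft::make_device_vector_view<const bitset_t, index_t>(bitset_ptr_, n_elements());
}
```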
RAFT_BENCH_REGISTER(Uint8_32, "", bitset_input_vecs);
RAFT_BENCH_REGISTER(Uint16_64, "", bitset_input_vecs);
RAFT_BENCH_REGISTER(Uint32_32, "", bitset_input_vecs);
RAFT_BENCH_REGISTER(Uint32_64, "", bitset_input_vecs);
Can you please post the benchmark results as a comment in your PR?
 */
template <typename bitset_t = uint32_t, typename index_t = uint32_t>
struct bitset {
  index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
Suggested change:
- index_t static constexpr const bitset_element_size = sizeof(bitset_t) * 8;
+ index_t static constexpr bitset_element_size = sizeof(bitset_t) * 8;
bitset_len_{bitset_len},
default_value_{default_value}
{
  cudaMemsetAsync(bitset_.data(),
Please don't use the CUDA runtime API directly.
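One hedged way to avoid the raw runtime call, sketched with Thrust on the resources' stream; whether RAFT prefers a different wrapper here is left open, and the function name is illustrative only.

```cpp
#include <raft/core/resource/cuda_stream.hpp>
#include <raft/core/resources.hpp>

#include <thrust/fill.h>
#include <thrust/system/cuda/execution_policy.h>

// Hypothetical sketch: fill the underlying words through thrust::fill_n instead of
// cudaMemsetAsync, keeping the work on the stream owned by raft::resources.
template <typename bitset_t, typename index_t>
void fill_words(const raft::resources& res, bitset_t* words, index_t n_words, bool default_value)
{
  const bitset_t fill_value = default_value ? static_cast<bitset_t>(~bitset_t{0}) : bitset_t{0};
  cudaStream_t stream       = raft::resource::get_cuda_stream(res);
  thrust::fill_n(thrust::cuda::par.on(stream), words, n_words, fill_value);
}
```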
bitset(const raft::resources& res,
       raft::device_vector_view<const index_t, index_t> mask_index,
       index_t bitset_len,
       bool default_value = true)
Can we use an enum here instead of a boolean? Something like `default_value::ALL_ONES` and `default_value::ALL_ZEROS`.
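A hedged sketch of what that enum could look like; the enum name matches the comment, but the underlying type and the revised constructor signature are illustrative only, not the PR's final API.

```cpp
#include <cstdint>

// Hypothetical replacement for the bool flag, making call sites self-documenting.
enum class default_value : uint8_t { ALL_ZEROS = 0, ALL_ONES = 1 };

// The constructor quoted above would then read, for example:
//   bitset(const raft::resources& res,
//          raft::device_vector_view<const index_t, index_t> mask_index,
//          index_t bitset_len,
//          default_value initial_state = default_value::ALL_ONES);
```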
 */
inline auto n_elements() const -> index_t
{
  return raft::ceildiv(bitset_len_, bitset_element_size);
Suggested change:
- return raft::ceildiv(bitset_len_, bitset_element_size);
+ return bitset_.size();
 * @param bitset_len Length of the bitset
 * @param default_value Default value to set the bits to. Default is true.
 */
bitset(const raft::resources& res,
Why not store this as a member so that none of the other functions need to accept `raft::resources` in their API?
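A hedged sketch of the member-storage idea; whether it fits the statelessness preference from the dev guide is exactly the trade-off discussed earlier in the thread, and all names here are illustrative.

```cpp
#include <raft/core/resource/cuda_stream.hpp>
#include <raft/core/resources.hpp>

#include <cstdint>

// Hypothetical: keep a pointer to the resources passed at construction so later member
// functions (resize, flip, ...) need no raft::resources parameter. The caller must keep
// the resources alive for the lifetime of the bitset.
template <typename bitset_t = uint32_t, typename index_t = uint32_t>
struct bitset_with_res {
  explicit bitset_with_res(const raft::resources& res) : res_{&res} {}

  void resize(index_t new_bitset_len)
  {
    // The stored resources are available here, so the public API stays parameter-free.
    cudaStream_t stream = raft::resource::get_cuda_stream(*res_);
    (void)stream;
    (void)new_bitset_len;
    // ... reallocate the underlying words and fill the new tail on `stream` ...
  }

 private:
  const raft::resources* res_;
};
```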
bitset_len_ = new_bitset_len;
if (old_size < new_size) {
  // If the new size is larger, set the new bits to the default value
  cudaMemsetAsync(bitset_.data() + old_size,
Use RAFT API
/merge
Closes #1600. To be merged after #1803 and #1811. This PR adds `bitset_filter` to filter an index with a bitset. Authors: - Micka (https://github.com/lowener) - tsuki (https://github.com/enp1s0) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet)
PR based on the new Bitset feature (#1803) to support vector deletion in ANN. Closes #1177. Closes #1620. This PR adds `ivf_to_sample_filter` that acts as an intermediate filter to use an IVF index with a bitset filter. Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - William Hicks (https://github.com/wphicks) URL: #1831
Related to #1600