-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CAGRA support with latest RAFT #175
Conversation
Update to RAFT 23.12 Update CAGRA integration to improve performance Avoid post-filtering using RAFT's new filtering feature Use RAFT's new device_resources_manager to simplify and optimize resource initialization Update build infratructure to build for all supported CUDA architectures Refactor RAFT integration code to more cleanly separate RAFT code from Knowhere code Avoid exposing RAFT symbols in any Knowhere header Signed-off-by: William Hicks <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: wphicks The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Welcome @wphicks! It looks like this is your first PR to zilliztech/knowhere 🎉 |
@wphicks 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”. Thanks for your efforts and contribution to the community!. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some explanatory comments inline where I expect there to be questions about specific changes.
@@ -12,19 +12,28 @@ | |||
# License for the specific language governing permissions and limitations under | |||
# the License | |||
|
|||
cmake_minimum_required(VERSION 3.23.0 FATAL_ERROR) | |||
project(knowhere CXX C) | |||
cmake_minimum_required(VERSION 3.26.4 FATAL_ERROR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required for RAPIDS CMake used in RAFT 23.12.
|
||
set(CMAKE_EXPORT_COMPILE_COMMANDS ON) | ||
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules/") | ||
include(GNUInstallDirs) | ||
include(ExternalProject) | ||
include(cmake/utils/utils.cmake) | ||
|
||
knowhere_option(WITH_RAFT "Build with RAFT indexes" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this up because CMAKE_CUDA_ARCHITECTURES needs to be filled in before initializing the project.
test_cagra(const knowhere::Json& cfg) { | ||
auto conf = cfg; | ||
|
||
auto find_smallest_max_iters = [&](float expected_recall) -> int32_t { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding the best max_iterations
has higher impact than searching over itopk
@@ -0,0 +1,125 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is the header actually included elsewhere in Knowhere. It exposes no RAFT symbols and does not require CUDA compilation.
@@ -0,0 +1,292 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file provides a generic template for Knowhere indexes based on RAFT.
class RaftIvfFlatConfig : public IvfFlatConfig { | ||
public: | ||
struct GpuRaftIvfPqConfig : public IvfPqConfig { | ||
CFG_FLOAT refine_ratio; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This newly-introduced parameter allows additional refinement after an initial selection of candidates from an index search.
return json; | ||
}; | ||
|
||
auto refined_gen = [](auto&& upstream_gen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Helper for generating identical configurations with refinement.
Signed-off-by: William Hicks <[email protected]>
Signed-off-by: William Hicks <[email protected]>
Signed-off-by: William Hicks <[email protected]>
e988f47
to
075c4e2
Compare
Signed-off-by: William Hicks <[email protected]>
This PR brings in the latest features from RAFT and significantly refactors the RAFT integration code. The primary goal of this refactor is to more clearly separate Knowhere code from RAFT integration code from RAFT itself. This leads to three layers in the updated integration:
CAGRA benchmarks have been substantially simplified in this PR and should run significantly faster. Throughput for batch size 1 is still not as high as CAGRA potentially allows, but it is significantly higher than previous benchmarks. Performance is currently bottlenecked on many small host-to-device transfers, but this can be improved in a follow-up PR. Throughput for larger batch sizes is substantially improved, with a median 17% overhead relative to raw RAFT calls during testing.
Given the significant scope of this PR, I will add some comments in-line, but here is the overall summary of changes:
NOTE: This PR currently points to a fork of RAFT while waiting for rapidsai/raft#1831 to merge. This was impacted by today's GIthub outage. Before merging, we should shift back to the main RAFT repo.
Close #176