-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Fix a bug in NN descent #1869
[BUG] Fix a bug in NN descent #1869
Conversation
Thank you, @divyegala and @RayWang96, for porting NN-descent to raft. Is this modification correct? |
I believe you are right. And I'm curious why it can pass the unit test with this bug. |
/ok to test |
@enp1s0 thanks for this PR. Could you post your NVCC and GCC versions in the description of the PR for hindsight? It seems like RAFT's CI NVCC+GCC are able to compile fine, so we should make note of what causes this compilation failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raft::device_*
variants of mdarray
do not have an initializer-list based constructor, so I prefer to keep brace initialization where possible
h_graph_new{nrow * num_samples}, | ||
h_list_sizes_new{nrow}, | ||
h_graph_old{nrow * num_samples}, | ||
h_dists(raft::make_host_matrix<DistData_t, size_t, raft::row_major>(nrow, node_degree)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can keep brace initializer
d_data_(raft::make_device_matrix<__half, Index_t, raft::row_major>( | ||
res, nrow_, build_config.dataset_dim)), | ||
l2_norms_(raft::make_device_vector<DistData_t, Index_t>(res, nrow_)), | ||
graph_buffer_( | ||
raft::make_device_matrix<ID_t, Index_t, raft::row_major>(res, nrow_, DEGREE_ON_DEVICE)), | ||
dists_buffer_( | ||
raft::make_device_matrix<DistData_t, Index_t, raft::row_major>(res, nrow_, DEGREE_ON_DEVICE)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
raft::make_device_matrix<DistData_t, Index_t, raft::row_major>(res, nrow_, DEGREE_ON_DEVICE)), | ||
graph_host_buffer_(nrow_ * DEGREE_ON_DEVICE), | ||
dists_host_buffer_(nrow_ * DEGREE_ON_DEVICE), | ||
d_locks_(raft::make_device_vector<int, Index_t>(res, nrow_)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
d_list_sizes_new_(raft::make_device_vector<int2, Index_t>(res, nrow_)), | ||
d_list_sizes_old_(raft::make_device_vector<int2, Index_t>(res, nrow_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
Thank you, @divyegala, @RayWang96, and @cjnolet, for reviewing the PR. I have modified the code based on the comments from @divyegala. The compilation of the unit test passes because it uses |
Thanks @enp1s0. Release 23.10 code freeze has passed, though, and the branch is being prepped for release. Can you base your branch on the 23.12? I'm in the process of fixing some merge conflicts to our forward merger, but we can get your changes merged shortly after. |
@enp1s0 I think it's quite likely that this issue is a result of your nvcc version. NN-Descent hard-codes internal index types to int32 so whether you use int64 or int32 index types for CAGRA it should not matter. Are you saying that the original code compiles with int64 index type for CAGRA with your nvcc version? |
259e5a4
to
5ee1976
Compare
Thank you, @divyegala, for your comment. I have found that the difference in the thrust versions is causing the compilation error. Raft uses thrust 1.17.2 while my program uses thrust 2.2, which is included in the CUDA 12.2 toolkit. And the list initializer was introduced in thrust 2.1 [release note]. So, this is not a bug. Is it better to close this PR or merge it for the future thrust version update used in raft? (cc: @cjnolet) |
@enp1s0 thanks for the investigations! We should merge this PR. Can you also add the thrust version to the description of this PR? |
/ok to test |
/ok to test |
/merge |
I got errors when compiling a program using raft NN-descent. This PR fixes the bug. ## Error e.g. ``` /home/.../include/raft/neighbors/detail/nn_descent.cuh(1158): error: invalid narrowing conversion from "unsigned long" to "int" h_rev_graph_old_{static_cast<size_t>(nrow_ * NUM_SAMPLES)}, ``` - nvcc ``` Built on Tue_Aug_15_22:02:13_PDT_2023 Cuda compilation tools, release 12.2, V12.2.140 Build cuda_12.2.r12.2/compiler.33191640_0 ``` - gcc ``` gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 ``` - thrust : 2.2 (This is the cause of this error [[detail](rapidsai#1869 (comment))]) # Change Use `Class(...)` instead of `Class{...}`. # Cause The NN-descent code calls constructors of `thrust::host_vector` as shown below: ```cpp graph_host_buffer_{static_cast<size_t>(nrow_ * DEGREE_ON_DEVICE)}, ``` However, this constructor is regarded as a list initialization. This is the same as the following code outputting 1 instead of 2. ```cpp #include <iostream> #include <vector> int main() { std::vector<float> list{2}; std::cout << list.size() << std::endl; } ``` [detail](https://en.cppreference.com/w/cpp/language/list_initialization) Authors: - tsuki (https://github.com/enp1s0) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) URL: rapidsai#1869
I got errors when compiling a program using raft NN-descent. This PR fixes the bug.
Error
e.g.
Change
Use
Class(...)
instead ofClass{...}
.Cause
The NN-descent code calls constructors of
thrust::host_vector
as shown below:However, this constructor is regarded as a list initialization.
This is the same as the following code outputting 1 instead of 2.
detail