-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IVF-Flat reconstruction #1270
base: branch-23.08
Are you sure you want to change the base?
IVF-Flat reconstruction #1270
Changes from 41 commits
328a179
a83aca4
843904a
f09b3a0
fdc9395
93d5b35
5fcf564
c49bf67
6ba87d4
7e2d80b
74e0a8c
7b36742
135a9b6
81b2cbf
fd33dbc
539fbc5
31815d7
33bfb82
adb96e4
32936c9
23e0f84
acf1888
0283d25
f790abf
6a05196
eda7923
f0a7031
a6c54a4
85b4aa1
692af0d
d892ebb
a8b96a7
1870541
beb9264
2e8bc92
ebda975
81d51be
d250ae9
e9df346
f798657
080c784
0f6cdbe
cad88b3
6cd1abc
de6c13d
7064703
5e02d7f
87edda0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -464,6 +464,52 @@ void search(raft::resources const& handle, | |||||||||||||||||||||
nullptr); | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
/** | ||||||||||||||||||||||
* @brief Reconstruct vector database | ||||||||||||||||||||||
* | ||||||||||||||||||||||
* | ||||||||||||||||||||||
* @tparam T data element type | ||||||||||||||||||||||
* @tparam IdxT type of the indices | ||||||||||||||||||||||
* | ||||||||||||||||||||||
* @param[in] handle | ||||||||||||||||||||||
* @param[in] index ivf-flat constructed index | ||||||||||||||||||||||
* @param[in] vector_ids vector with the ids of vectors to look for | ||||||||||||||||||||||
* @param[out] vector_out matrix with the database of vectors | ||||||||||||||||||||||
*/ | ||||||||||||||||||||||
template <typename T, typename IdxT> | ||||||||||||||||||||||
void reconstruct_batch(raft::resources const& handle, | ||||||||||||||||||||||
const index<T, IdxT>& index, | ||||||||||||||||||||||
raft::device_vector_view<const IdxT, IdxT> vector_ids, | ||||||||||||||||||||||
raft::device_matrix_view<T, IdxT, row_major> vector_out) | ||||||||||||||||||||||
{ | ||||||||||||||||||||||
return raft::neighbors::ivf_flat::detail::reconstruct_batch( | ||||||||||||||||||||||
handle, index, vector_ids, vector_out); | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
/** | ||||||||||||||||||||||
* @brief Reconstruct vectors of a given cluster | ||||||||||||||||||||||
* | ||||||||||||||||||||||
* | ||||||||||||||||||||||
* @tparam T data element type | ||||||||||||||||||||||
* @tparam IdxT type of the indices | ||||||||||||||||||||||
* | ||||||||||||||||||||||
* @param[in] handle | ||||||||||||||||||||||
* @param[in] index ivf-flat constructed index | ||||||||||||||||||||||
* @param[out] vector_out matrix with the vectors contained in the cluster | ||||||||||||||||||||||
* @param[in] label cluster index | ||||||||||||||||||||||
* @param[in] offset offset | ||||||||||||||||||||||
*/ | ||||||||||||||||||||||
template <typename T, typename IdxT> | ||||||||||||||||||||||
void reconstruct_list_data(raft::resources const& handle, | ||||||||||||||||||||||
const index<T, IdxT>& index, | ||||||||||||||||||||||
device_matrix_view<T, IdxT, row_major> out_vectors, | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In Here in the ivf_flat API you use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, interesting! Out of curiosity, could you add a test case for ivf-flat with 64-bit indexing type and try to call something with an offset slightly outside the 32-bit range? :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A quick browsing confirms that 64-bit indexing/extents are actually not supported raft/cpp/include/raft/neighbors/ivf_flat-inl.cuh Lines 455 to 464 in cad88b3
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The portion of code you are mentioning is indeed probably a mistake. At the moment the IVF-Flat API functions are templated with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If ivf-flat can actually work with extents and input sizes |
||||||||||||||||||||||
uint32_t label, | ||||||||||||||||||||||
uint32_t offset) | ||||||||||||||||||||||
{ | ||||||||||||||||||||||
return raft::neighbors::ivf_flat::detail::reconstruct_list_data( | ||||||||||||||||||||||
handle, index, out_vectors, label, offset); | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
/** @} */ | ||||||||||||||||||||||
|
||||||||||||||||||||||
} // namespace raft::neighbors::ivf_flat |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
/* | ||
* Copyright (c) 2023, NVIDIA CORPORATION. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
/* | ||
* NOTE: this file is generated by ivf_flat_00_generate.py | ||
* | ||
* Make changes there and run in this directory: | ||
* | ||
* > python ivf_flat_00_generate.py | ||
* | ||
*/ | ||
|
||
#include <raft/neighbors/ivf_flat-inl.cuh> | ||
|
||
#define instantiate_raft_neighbors_ivf_flat_reconstruct(T, IdxT) \ | ||
template void raft::neighbors::ivf_flat::reconstruct_batch<T, IdxT>( \ | ||
raft::resources const& handle, \ | ||
const raft::neighbors::ivf_flat::index<T, IdxT>& idx, \ | ||
raft::device_vector_view<const IdxT, IdxT> vector_ids, \ | ||
raft::device_matrix_view<T, IdxT, row_major> vector_out); \ | ||
\ | ||
template void raft::neighbors::ivf_flat::reconstruct_list_data<T, IdxT>( \ | ||
raft::resources const& handle, \ | ||
const raft::neighbors::ivf_flat::index<T, IdxT>& idx, \ | ||
raft::device_matrix_view<T, IdxT, row_major> out_vectors, \ | ||
uint32_t label, \ | ||
uint32_t offset); | ||
|
||
instantiate_raft_neighbors_ivf_flat_reconstruct(float, int64_t); | ||
|
||
#undef instantiate_raft_neighbors_ivf_flat_reconstruct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the consensus was to opt for a "inverted-index" module (hashmap/array/whatever) shared among ivf methods to convert user indices to (label, in-cluster-offset) pairs?
This code could break if the user adds non-contiguous range of large indices to the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Yes absolutely, this is a temporary solution that would be largely improved by the use of a hashmap in a follow-up PR. But, I thought that letting this version as it is would set the API and allow people to run a reconstruction if their use case allows it (smaller index). But, can still remove it if that's the better path forward. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking of decoupling the ivf-list hashmap struct api from the ivf-pq and ivf-flat methods. Hence a user would need to construct the hashmap explicitly once (costly operation) and then either:
a) search-by-user-ids in two calls (e.g. hashmap::get_lists_offsets + index::get_vectors)
b) pass the hashmap as an argument to the ivf index methods (index::get_vectors)
Not sure what we will decide in the end, but point is the api may change as we progress with the hashmap. Then, maybe we just can keep this function in the
detail
namespace for now? Hence the work won't be lost and if someone needs the functionality urgently, they can use the detail function, and we don't need to break the public api in the follow-up PR.