-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom patch to support multi-vector #1
Conversation
can we add unit test for this change? |
8f89fbf
to
e678b78
Compare
faiss/impl/HNSW.cpp
Outdated
// Completed collection of result. Run post processor. | ||
collector->finalize(nres, I); | ||
// Collector completed its task. Release all resource of the collector. | ||
collectorFactory->deleteCollector(collector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? as these are classes their object should get destructed once these objects go out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends how resource is created. At least, the factory create collector with new
which need to be released manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
objects created with new
for sure gets deallocated. Its the pointers that create the issue. For pointers it is recommended to use unique_pointer.
e678b78
to
ffab86a
Compare
It will be covered by existing unit test. For some additional scenario where we are utilizing id mapper or extending result collector, I would like to write a unit test in OpenSearch side to reduce the patch size. |
make sense. So, can you paste the response of unit test runs, as the tests are not running as part of github actions. |
cb36f5a
to
c59bc53
Compare
Test result are in facebookresearch#3178 |
faiss/IndexIDMap.h
Outdated
@@ -10,6 +10,7 @@ | |||
#include <faiss/Index.h> | |||
#include <faiss/IndexBinary.h> | |||
#include <faiss/impl/IDSelector.h> | |||
#include <faiss/impl/ResultCollectorFactory.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this included here if its not used in the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me move it to cpp file.
faiss/impl/ResultCollector.h
Outdated
#include <faiss/MetricType.h> | ||
#include <faiss/utils/Heap.h> | ||
|
||
/** ResultCollector is intended to define how to collect search result */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more detail here? Maybe add an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
faiss/impl/ResultCollector.h
Outdated
|
||
/** Encapsulates a set of ids to handle. */ | ||
struct ResultCollector { | ||
// For each result, collect method is called to store result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add comment on what each of the args are for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
namespace faiss { | ||
|
||
/** ResultCollectorFactory to create a ResultCollector object */ | ||
struct ResultCollectorFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this be overridden? How will this be customized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be overridden.
@@ -131,6 +150,10 @@ void IndexIDMapTemplate<IndexT>::search( | |||
sel_change.set(params_non_const, &this_idtrans); | |||
} | |||
} | |||
|
|||
if (params && params->col && !params->col->id_map) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats the purpose of !params->col->id_map
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if id_map is already set, we don't want to replace it with a new one which is same implementation as id selector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if ID is already set, then are we not setting col_change
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col_change is just a method to update id_map inside parameter. If id_map is already there, no need to call the method.
// RAII object to reset the id_map parameter in ResultCollectorFactory object | ||
// This object make sure to reset the id_map parameter in ResultCollectorFactory | ||
// once the program exist current method scope. | ||
struct ScopedColChange { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have any threading implications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no.
faiss/impl/HNSW.cpp
Outdated
@@ -111,8 +112,8 @@ void HNSW::print_neighbor_stats(int level) const { | |||
level, | |||
nb_neighbors(level)); | |||
size_t tot_neigh = 0, tot_common = 0, tot_reciprocal = 0, n_node = 0; | |||
#pragma omp parallel for reduction(+: tot_neigh) reduction(+: tot_common) \ | |||
reduction(+: tot_reciprocal) reduction(+: n_node) | |||
#pragma omp parallel for reduction(+ : tot_neigh) reduction(+ : tot_common) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we revert this if we are not touching it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't changed it. Let me rebase the code.
c59bc53
to
231b522
Compare
faiss/impl/ResultCollectorFactory.h
Outdated
const std::vector<int64_t>* id_map; | ||
|
||
// Create a new ResultCollector object | ||
virtual ResultCollector* newCollector() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should methods be snake_case
instead of camelCase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C++, the convention for method names is typically camel case.
* @param bh_val search result, distances from query | ||
* @param bh_ids search result, ids of vectors | ||
* @param val distance from query for current vector | ||
* @param ids id of current vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name as id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the parameter name consistent with faiss original code.
231b522
to
70c9b5d
Compare
…Lists (facebookresearch#3327) Summary: Pull Request resolved: facebookresearch#3327 **Context** 1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard. 2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards. **In this diff**, 1. To address #1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class. why so? To continue to allow merge invl from one index to ondiskinvl from other index. 2. To address facebookresearch#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards. Reviewed By: mdouze Differential Revision: D55482518 fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083
…ookresearch#3527) Summary: Pull Request resolved: facebookresearch#3527 **Context** Design Doc: [Faiss Benchmarking](https://docs.google.com/document/d/1c7zziITa4RD6jZsbG9_yOgyRjWdyueldSPH6QdZzL98/edit) **In this diff** 1. Be able to reference codec and index from blobstore (bucket & path) outside the experiment 2. To support #1, naming is moved to descriptors. 3. Build index can be written as well. 4. You can run benchmark with train and then refer it in index built and then refer index built in knn search. Index serialization is optional. Although not yet exposed through index descriptor. 5. Benchmark can support index with different datasets sizes 6. Working with varying dataset now support multiple ground truth. There may be small fixes before we could use this. 7. Added targets for bench_fw_range, ivf, codecs and optimize. **Analysis of ivf result**: D58823037 Reviewed By: algoriddle Differential Revision: D57236543 fbshipit-source-id: ad03b28bae937a35f8c20f12e0a5b0a27c34ff3b
Temporary PR to make it easy to review opensearch-project/k-NN#1358