Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IVF-PQ: manipulating individual lists #1298
IVF-PQ: manipulating individual lists #1298
Changes from 8 commits
d421375
c7b5574
087919f
b5e9844
ea81c1e
6567f5c
3ccd48f
fdb8a6a
56ffc84
ebd1b1c
2773a86
82be40c
5fc1c53
ddc4466
56eb716
3a2c622
e7c35e7
839cd48
46d5a84
9fc9818
2f1f6f7
ac8aa18
d5b2d1b
f62273d
38895bb
a674768
1be2f6b
d6cad17
2625666
172673b
1504027
1326ac7
08cfacb
a399023
d935568
d1dd238
5514d7d
a317622
9367936
09f1a0d
b7e811a
926f510
8e311d9
158189c
280c040
a7e5419
09abadb
923c3b5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjnolet , could you please check if this interface is suitable for integration with FAISS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple important points that the FAISS developers have pointed out:
reconstruct_batch
API which accepts a collection of ids (which are not necessarily contiguous) and reconstructs those vectors.From what I can tell, the API you are proposing here doesn't implement the above, so I would say it's not suitable for the FAISS integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick clarification: the ids we should pass are (A) the same indices we pass during construction or (B) internal ids of the ivf-pq index?
If (B) then should these be a pair (label, collection of in-cluster-ids), or a collection of (label, in-cluster-id)? The former, perhaps can be slightly faster.
If (A) then I hardly can imagine an efficient function that can do this: we don't sort or organize the input indices in any way, so a full search in the database needs to be done for every index. Perhaps we can split the task in two: search indices -> then call interface (B)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also don't you need something like this interface to be able to convert our GPU index to their CPU index type (more-or-less efficiently, converting one cluster at a time)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a couple different asks here. We have an ask to be able to convert from one interleaved (and potentially non-interleaved) encoded format to another, which is required for being able to convert between cpu and gpu indices. That ask is for us to have functions available that can be used by their CodePacker API.
In addition to this, there's the ask to be able to reconstruct a batch of vectors given a set of indices, which might not be in order. This is the corresponding issue for that ask (which is directly from Jeff Johnson). The indices passed into the
reconstruct_batch
would also coincide w/ the indices which would be specified when invokingextend
here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I see what you mean. I agree, we should export
detail::pack_list_data
anddetail::unpack_list_data
into a public namespace, to make sure we don't change their signature and break faiss integration by accident. I'm also not opposed to the idea of moving the construct/reconstruct and other similar functions to theivf_pq::helpers
public namespace.The only thing that bothers me, is that the overloads we need for the codepacker go really against the current design of the public api. Maybe, as an exception, we export these two into something like
ivf_pq::codepacker
to discourage other users from using them? Or even implement the whole codepacker api in raft in that namespace?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't think this is a good way to approach any public API. Either the API is public and users can use it, or it's not and they should touch that code at their own risk. Trying to force this middle ground for some users and not others is not sustainable and I'm not sure why there's a problem just exposing it outright.
I also don't agree that these functions should not just be part of the public API- if users need the ability to call a function, it needs to wrapped into the public API so it can be clearly documented, not just exported.
We had tried the exporting initially when we first started exposing formal public APIs and that was a mistake. Anything the user should be interacting directly with should be available and clearly documented (which for most almost all of the functions in RAFT just means wrapped).
I'm not opposed to using a nested "CodePacker" namespace but I'd also prefer that it is also nested inside helpers, so we be consistent there wrt the standard API namespaces and the "extra" functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a public API for pack_list_data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, for the codepacker, we'd probably need to expose this overload
raft/cpp/include/raft/neighbors/detail/ivf_pq_build.cuh
Lines 575 to 580 in 926f510
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have missed this message before. I have exposed the pack/unpack versions that take list_data directly under
ivf_pq::helpers::codepacker
namespace.In our implementation we still need to provide the
pq_bits
parameter. Otherwise it is very similar to the CodePacker API.One question: what does
block
refer to in CodePacker? In our API it refers to a set of interleaved vectors, which can be the data for a whole cluster, or less.