[FEA] Improve CAGRA serialization #1729
Labels: feature request (New feature or request)
Comments
benfred added a commit to benfred/raft that referenced this issue on Aug 18, 2023
This changes the serialization format of saved CAGRA instances:
* The dtype is now written in the first 4 bytes of the index, to match the IVF methods and to make it easier to deduce the dtype from Python (rapidsai#1729).
* Writing out the dataset with the index is now optional. Many use cases already have the dataset written out separately, so this makes it possible to save disk space by not writing an extra copy of the input dataset. If the include_dataset=false option is given, you have to call `index.update_dataset` to set the dataset yourself after loading.
rapids-bot bot pushed a commit that referenced this issue on Aug 21, 2023
This changes the serialization format of saved CAGRA indices:
* The dtype is now written in the first 4 bytes of the serialized file, to match the IVF methods and to make it easier to deduce the dtype from Python (#1729).
* Writing out the dataset with the index is now optional. Many use cases already have the dataset written out separately, so this makes it possible to save disk space by not writing an extra copy of the input dataset. If the include_dataset=false option is given, you have to call `index.update_dataset` to set the dataset yourself after loading.

Authors:
- Ben Frederickson (https://github.com/benfred)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1755
CAGRA should expose its data type at the beginning of the serialization file for easier deserialization on the Python side, just like IVF Flat.
#1717 introduces a temporary fix on the Python side, needed because the dtype is not at a fixed offset in the serialization file.
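To illustrate why a fixed offset helps, here is a minimal Python sketch of reading a dtype tag from the first 4 bytes of a file. It is not the RAFT or pylibraft implementation. The helper name `read_dtype_tag`, the NumPy-style tag strings (such as `<f4`), the NUL padding, and the `KNOWN_DTYPES` table are all assumptions made for the example; the actual byte layout should be checked against the serializer.

```python
import os
import tempfile

# Hypothetical lookup table: NumPy-style type strings mapped to readable names.
KNOWN_DTYPES = {"<f4": "float32", "|i1": "int8", "|u1": "uint8"}


def read_dtype_tag(path):
    """Return the dtype tag stored in the first 4 bytes of the file at `path`.

    Assumes the tag is an ASCII string padded to 4 bytes with NUL characters.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError("file is too short to contain a 4-byte dtype tag")
    # Strip the assumed NUL padding and decode the remaining bytes.
    return header.rstrip(b"\x00").decode("ascii")


# Usage: write a stand-in file (tag followed by arbitrary payload bytes) and read it back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"<f4\x00" + b"\x00" * 8)
    demo_path = tmp.name

tag = read_dtype_tag(demo_path)
print(tag, KNOWN_DTYPES.get(tag, "unknown"))  # prints: <f4 float32
os.remove(demo_path)
```

Because the tag sits at offset 0, the reader can choose the right typed loader before parsing anything else in the file, which is what a variable offset prevents.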