[FEA] Improve CAGRA serialization #1729

lowener · 2023-08-09T12:04:16Z

CAGRA should expose its data type at the beginning of the serialization file for easier deserialization on the Python side, just like IVF Flat.
#1717 introduces a temporary fix on the Python side, because the dtype is currently not at a fixed offset in the serialization file.


This changes the serialization format of saved CAGRA indices by:

* The dtype will now be written in the first 4 bytes of the serialized file, to match the IVF methods and to make it easier to deduce the dtype from Python (#1729)
* Writing out the dataset with the index is now optional. Since many use cases will already have the dataset written out separately, this gives us the option to save disk space by not writing out an extra copy of the input dataset. If the `include_dataset=false` option is given, you will have to call `index.update_dataset` to set the dataset yourself after loading

Authors:
- Ben Frederickson (https://github.com/benfred)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1755
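To illustrate why a fixed-offset dtype helps the Python side, here is a minimal sketch of reading the first 4 bytes of a serialized index and mapping them to a type name. The function name `read_index_dtype` and the `DTYPE_NAMES` table are hypothetical, and the assumption that the field holds a NumPy-style type string (such as `<f4`) padded with NUL bytes is not stated in this issue; treat it as a guess about the encoding rather than the documented format.

```python
import io

# From the commit message: the dtype occupies the first 4 bytes of the file.
DTYPE_FIELD_SIZE = 4

# Assumption (not confirmed by the issue): the field is a NumPy-style type
# string padded with NUL bytes. This lookup table is purely illustrative.
DTYPE_NAMES = {
    "<f4": "float32",
    "|i1": "int8",
    "|u1": "uint8",
}


def read_index_dtype(stream):
    """Return the dtype name stored in the fixed-size header of `stream`."""
    raw = stream.read(DTYPE_FIELD_SIZE)
    if len(raw) != DTYPE_FIELD_SIZE:
        raise ValueError("stream is too short to contain a dtype header")
    # Strip the assumed NUL padding before decoding the type string.
    type_str = raw.rstrip(b"\x00").decode("ascii")
    try:
        return DTYPE_NAMES[type_str]
    except KeyError:
        raise ValueError(f"unrecognized dtype header: {type_str!r}") from None


# Hypothetical stand-in for a serialized index: header followed by payload.
fake_index = io.BytesIO(b"<f4\x00" + b"remaining index bytes")
print(read_index_dtype(fake_index))
```

With the dtype at a known offset, a loader can pick the matching typed deserializer after a single small read, instead of parsing the rest of the file to find out which element type it holds.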

lowener added the feature request New feature or request label Aug 9, 2023

lowener added this to VS/ML/DM Primitives Release Board Aug 9, 2023

lowener moved this to Todo in VS/ML/DM Primitives Release Board Aug 9, 2023

benfred mentioned this issue Aug 18, 2023

Update CAGRA serialization #1755

Merged

benfred linked a pull request Aug 18, 2023 that will close this issue

Update CAGRA serialization #1755

Merged

rapids-bot bot closed this as completed in #1755 Aug 21, 2023

github-project-automation bot moved this from Todo to Done in VS/ML/DM Primitives Release Board Aug 21, 2023
