[BUG] Pickle Approximate NearestNeighbors models #4743
Comments
Thank you for spotting that. Unfortunately, there is no simple solution for this right now: the `knnIndex` struct contains GPU resources handled by FAISS. However, if we develop our own ANN algorithms, it might become easier to serialize the necessary data.
Yes, I agree. We can return to this question after rapidsai/raft#652.
This PR implements serialization to file for `ivf_pq::index` and `ivf_flat::index` structures. Index building takes time, therefore downstream projects (like cuML) want to save the index (rapidsai/cuml#4743). But downstream projects should not depend on the implementation details of the index, therefore RAFT provides methods to serialize and deserialize the index.

This is still experimental:
- ideally we want to use a general serialization method for mdspan #770,
- instead of directly saving to file, raft should provide a byte string and let the downstream project decide how to save it (e.g. pickle for cuML).

Python wrappers are provided for IVF-PQ to save/load the index.

Authors:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #919
Describe the bug
Approximate nearest neighbor models (`'ivfflat'`, `'ivfpq'`) store their state in a `knnIndex` object. Currently there is no support for pickling models that were fitted using these algorithms. The error only shows up when predicting with the loaded model.

Steps/Code to reproduce bug
Now start a new process (e.g. new Jupyter kernel)
This will result in the process dying. This is probably due to accessing the model state through the `knnIndex` pointer, which was saved and restored as a plain int value and no longer points to a valid object once the process is restarted. (One can see this by inspecting the `'knn_index'` value in the dict returned by `model.__getstate__()`.)

Expected behavior
Pickling and loading the model should work. To achieve this, ANN models need to serialize and deserialize their `knnIndex` object when the model is pickled.

Environment details (please complete the following information):