Add python serialization API's for ivf-pq and ivf_flat #186
I just have a few comments from a first pass. The PR looks great.
@@ -28,7 +28,7 @@ def get_last_error_text():
     if c_err is NULL:
         return
     cdef bytes err = c_err
-    return err.decode("utf8")
+    return err.decode("utf8", "ignore")
Curious: why was this change needed?
That's a great question!
I originally had a bug in the ivf_pq deserialization code that passed bad data to the mdspan serializer, which then threw an error whose message wasn't valid utf8 text.
Specifically, this line https://github.com/rapidsai/raft/blob/b66b269ab6dcda48aef3a6ed9e7f604e99471d72/cpp/include/raft/core/detail/mdspan_numpy_serializer.hpp#L293 was writing out the error message "unrecognized byteorder %c", where the %c was pointing to some random data (\x93 in my case) that couldn't be converted to utf8. This led to a unicode decode error being thrown rather than the actual error.
While I think we should consider always producing error messages that are valid utf8 text (for example, by changing that %c format to something like %x), I don't think we can guarantee that the exception text from our C++ layer is valid utf8. Ignoring conversion errors gives a better error message than failing on the utf8 decode.
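To illustrate the behavior described above, here is a minimal pure-Python sketch. The helper name decode_error_text is hypothetical and is not part of cuvs; it only mirrors the decode call in the diff, using the \x93 byte mentioned in the comment.

```python
def decode_error_text(raw: bytes, errors: str = "ignore") -> str:
    """Hypothetical helper: decode native error bytes without raising on bad input."""
    return raw.decode("utf8", errors)


# Error text ending in a stray byte, as in the case described above.
raw = b"unrecognized byteorder \x93"

# Strict decoding (the default) raises, which would hide the real error.
try:
    raw.decode("utf8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "ignore" drops the undecodable byte and keeps the readable message.
lenient = decode_error_text(raw)

# "replace" is an alternative that marks the bad byte with U+FFFD instead.
replaced = decode_error_text(raw, "replace")
```

With "ignore" the readable part of the message survives; "replace" would additionally show that a byte was lost, at the cost of a replacement character in the output.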
That makes a lot of sense, and it's great to know! Thanks for the detailed explanation :)
python/cuvs/cuvs/test/test_ivf_pq.py
Outdated
@@ -166,3 +166,45 @@ def test_ivf_pq_search_params(params):
         lut_dtype=params["lut"],
         internal_distance_dtype=params["idd"],
     )
+
+
+def test_save_load():
A significant portion of this test will be the same for every index type. Should we refactor the common code?
I've refactored to reduce the duplicate code in the last commit. We now have a test_serialization.py script that tests each of cagra/ivf_flat/ivf_pq using the same common function.
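As a rough sketch of what such a shared round-trip check could look like (this is not the actual contents of test_serialization.py): the function below assumes each index module exposes build, save, load and search callables. Those call signatures, the name run_save_load_test, and the toy brute-force module used to exercise it are all assumptions made for illustration, so the example runs without a GPU.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def run_save_load_test(module, dataset, queries, k):
    """Build an index, round-trip it through a file, and compare search results."""
    index = module.build(dataset)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "index.bin")
        module.save(path, index)
        loaded = module.load(path)
    expected = module.search(index, queries, k)
    actual = module.search(loaded, queries, k)
    # A reloaded index should return the same neighbors as the original.
    assert actual == expected
    return actual


# Toy stand-in for an index module (hypothetical, brute-force search).
def _build(dataset):
    return [list(row) for row in dataset]


def _save(path, index):
    with open(path, "wb") as f:
        pickle.dump(index, f)


def _load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def _search(index, queries, k):
    results = []
    for q in queries:
        # Rank rows by squared Euclidean distance to the query.
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, q)), i)
            for i, row in enumerate(index)
        )
        results.append([i for _, i in ranked[:k]])
    return results


toy_module = SimpleNamespace(build=_build, save=_save, load=_load, search=_search)

dataset = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [0.1, 1.8]]
neighbors = run_save_load_test(toy_module, dataset, queries, k=2)
```

In a real test suite, the same function could presumably be parametrized over the cagra, ivf_flat and ivf_pq modules, with each module's own parameter objects passed through as needed.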
PR looks great!
/merge