-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pickle protocol 5 with NumPy object arrays #3871
Use pickle protocol 5 with NumPy object arrays #3871
Conversation
@@ -23,7 +23,10 @@ def itemsize(dt): | |||
def serialize_numpy_ndarray(x): | |||
if x.dtype.hasobject: | |||
header = {"pickle": True} | |||
frames = [pickle.dumps(x)] | |||
frames = [None] | |||
buffer_callback = lambda f: frames.append(memoryview(f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we converting into a memoryview
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buffer_callback
is called on values of type PickleBuffer
. Here's an example:
In [1]: import pickle
In [2]: import numpy as np
In [3]: a = np.array([np.arange(3), np.arange(4, 6)], dtype=object)
In [4]: l = []; d = pickle.dumps(a, protocol=5, buffer_callback=l.append)
In [5]: l
Out[5]: [<pickle.PickleBuffer at 0x115a64740>, <pickle.PickleBuffer at 0x115a647c0>]
Though PickleBuffer
is a wrapper that expects to be passed some bytes
-like object, which it wraps zero-copy. In the example above these are the nested NumPy arrays.
However as the rest of the serialization code expects frames to be bytes
or memoryview
s. We would either need to retool it to support PickleBuffer
(with special casing when it is not available) or we can simply convert it to a memoryview
, which is also zero-copy.
We've gone for the latter path of converting to a memoryview
similar to what was done in PR ( #3784 ). Note this is done without loss of generality.
1c8e168
to
406dd3b
Compare
a65c5ef
to
e07d26b
Compare
This should be ready on my end. Please let me know if anything else is needed 🙂 |
One more dumb question (sorry for drawing this out): how does this work if the user doesn't have a more modern version of Python? If pickle5 isn't installed and we're using an older version of Python then what happens here? |
Not at all. The discussion is welcome. 🙂 Was more noting this is not in draft status any more. In the case where pickle protocol 5 is not supported, we would skip collecting buffers when calling TL;DR it behaves the same as the code does before this change. |
Ah, ok I was thrown by seeing |
Yeah can see how that would be confusing. Yep, we are importing it here. Of course! Thanks for the review 😄 |
Depends on PR ( #3868 )
Builds off the work in PR ( #3784 ) to support pickle protocol 5. Supports out-of-band pickling when serializing NumPy object arrays. This can be useful when serializing ragged arrays for example or really any NumPy object arrays containing data that supports out-of-band pickling.