pickle5 support #2495
No work is currently underway. I have no particular objection to pickle5
support. In principle it sounds like a nice idea.
On Wed, Jan 30, 2019 at 8:55 PM tjb900 wrote:
Hi! Not really an issue, just a query with the aim of avoiding duplicated work. Our use case for distributed involves keys whose values are often lists or dicts of (lists or dicts of, etc.) numpy arrays. Unfortunately, the current serialization scheme, while fantastic for numpy arrays not embedded in other objects, does not handle these nested structures particularly well.
Now that numpy 1.16 is out, pickle protocol 5 and its backport (https://github.com/pitrou/pickle5-backport) would seem to provide a very elegant solution for efficiently communicating these kinds of structures: the large data arrays are passed as out-of-band data that doesn't need to be embedded in the pickle bytestream.
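To illustrate the out-of-band idea, here is a minimal sketch using only the standard library `pickle` module on Python 3.8+, where protocol 5 (PEP 574) is built in; the pickle5 backport exposes the same `buffer_callback`/`buffers` API on older versions. numpy 1.16+ arrays hook into this mechanism themselves, but to keep the sketch dependency-free, a `bytearray` subclass modeled on the `ZeroCopyByteArray` example from the Python `pickle` documentation stands in for an array. The `serialize`/`deserialize` helpers are hypothetical illustrations, not distributed's API.

```python
import pickle
from pickle import PickleBuffer


class ZeroCopyByteArray(bytearray):
    """Stand-in for a large array (modeled on the Python pickle docs example)."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose the raw memory to pickle so it can travel out-of-band.
            return type(self)._reconstruct, (PickleBuffer(self),), None
        # PickleBuffer is forbidden with protocols <= 4, so copy in-band.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            owner = m.obj  # the object that actually owns the memory
        if type(owner) is cls:
            return owner  # same process, same buffer: reuse it without copying
        return cls(owner)  # otherwise build a new instance from the received bytes


def serialize(obj):
    """Hypothetical helper: small in-band pickle plus a list of out-of-band frames."""
    buffers = []
    # buffers.append returns None (falsy), so every PickleBuffer goes out-of-band.
    header = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    frames = []
    for pb in buffers:
        with pb.raw() as m:
            # A real transport could send m directly; copy here to mimic a receiver.
            frames.append(m.tobytes())
    return header, frames


def deserialize(header, frames):
    """Hypothetical helper: rebuild the object from the header and its frames."""
    return pickle.loads(header, buffers=frames)


# Nested containers of "arrays", like the use case described above.
payload = {
    "a": [ZeroCopyByteArray(b"x" * 1_000_000), ZeroCopyByteArray(b"y" * 500_000)],
    "b": {"c": ZeroCopyByteArray(b"z" * 250_000), "meta": "small"},
}
header, frames = serialize(payload)
print(len(header), [len(f) for f in frames])
restored = deserialize(header, frames)
print(restored == payload)  # True
```

The nested dict/list structure ends up in the small `header`, while each large buffer becomes its own frame (here of lengths 1,000,000, 500,000 and 250,000 bytes) that a transport could send without first copying it into the pickle stream.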
Is work already underway on some remote branch to integrate this into distributed? If not, would such a PR be welcome? Or should we wait for more experienced hands to tackle it?
Thanks in advance!
Fine to keep it open
On Thu, Jan 31, 2019 at 5:18 PM tjb900 wrote:
Closed #2495.
Reopening, as I'd like us to track this as part of extending the work already done in PR #3784 to other Python versions.
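One piece of supporting other Python versions is selecting a pickle module that implements protocol 5: the standard library from Python 3.8 onward, or the pickle5 backport before that. Below is a minimal sketch of that selection, assuming the backport is installed under its package name `pickle5`; `protocol5_pickle` is a hypothetical helper and not the code from #3784.

```python
import sys


def protocol5_pickle():
    """Return a pickle module supporting protocol 5 (out-of-band buffers), or None."""
    if sys.version_info >= (3, 8):
        import pickle  # protocol 5 (PEP 574) ships with the stdlib from 3.8

        return pickle
    try:
        # Backport from https://github.com/pitrou/pickle5-backport,
        # assumed to be installed as the "pickle5" package.
        import pickle5
    except ImportError:
        return None  # no protocol 5 available; the caller must use protocol <= 4
    return pickle5


pickle_oob = protocol5_pickle()
print(pickle_oob is not None and pickle_oob.HIGHEST_PROTOCOL >= 5)
```

Returning `None` rather than raising leaves the decision to the caller, which can keep using its existing serialization path when neither the stdlib nor the backport provides protocol 5.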
Before adding this to Dask, we would need pickle5 support in cloudpickle. A fair bit of work has been done in PR cloudpipe/cloudpickle#370 to handle that. Many thanks to @pierreglaser for making that possible!
@jakirkham, is this now resolved by #3784?
No, this is something different. Resolving it would require PR #3849 and PR cloudpipe/cloudpickle#370 (plus a cloudpickle release).
Alright, I think PR #3849 should be ready for review. Please let me know what you think 🙂