
pickle5 support #2495

Closed
tjb900 opened this issue Jan 31, 2019 · 7 comments · Fixed by #3849

Comments

@tjb900
Contributor

tjb900 commented Jan 31, 2019

Hi! Not really an issue, just a query with the aim of avoiding duplicated work. Our use case for distributed involves keys whose values are often lists or dicts of (lists or dicts of, etc.) numpy arrays. Unfortunately the current serialization scheme, while fantastic for numpy arrays not embedded in other objects, does not handle these nested structures particularly well.

Now that numpy 1.16 is out, pickle protocol 5 (PEP 574) and its backport (https://github.com/pitrou/pickle5-backport) would seem to provide a very elegant solution for efficiently communicating these kinds of structures: the large data arrays are passed as out-of-band buffers that don't need to be embedded in the pickle bytestream.
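
A minimal sketch of the out-of-band idea using the standard-library `pickle` module (Python 3.8+, where protocol 5 is built in; the `pickle5` backport is meant to expose the same `dumps`/`loads`/`PickleBuffer` API on older versions). The nested payload and the helper names `dumps_out_of_band`/`loads_out_of_band` are hypothetical, and `pickle.PickleBuffer` over a `bytearray` stands in for a contiguous numpy array, which emits the same kind of buffer itself under protocol 5.

```python
import pickle


def dumps_out_of_band(obj):
    """Hypothetical helper: pickle obj with protocol 5, large buffers out-of-band.

    Returns (header, frames): header is the small pickle bytestream holding
    only the structure; frames are the raw buffers, in the order the
    unpickler will consume them.
    """
    buffers = []
    # list.append returns None (falsy), which tells the pickler to keep each
    # PickleBuffer out of the bytestream and write a placeholder instead.
    header = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    # Copy each buffer to bytes to mimic sending it as a separate frame.
    frames = [bytes(buf.raw()) for buf in buffers]
    return header, frames


def loads_out_of_band(header, frames):
    """Hypothetical helper: rebuild the object, splicing frames back in order."""
    return pickle.loads(header, buffers=frames)


# Hypothetical payload: nested dicts/lists whose leaves are large buffers.
payload = {
    "a": [pickle.PickleBuffer(bytearray(b"x" * 1_000_000))],
    "b": {"c": pickle.PickleBuffer(bytearray(b"y" * 2_000_000))},
}

header, frames = dumps_out_of_band(payload)
restored = loads_out_of_band(header, frames)

print(len(header))               # small: structure and placeholders only
print([len(f) for f in frames])  # [1000000, 2000000]
```

On the receiving side each leaf comes back as whatever buffer object is supplied in `buffers` (here `bytes`); numpy would instead rebuild arrays on top of those buffers without an extra copy.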

Is work already underway somewhere, on some remote branch, to integrate this into distributed? If not, would such a PR be welcome? Or should we wait for more experienced hands to tackle it?

Thanks in advance!

@mrocklin
Member

mrocklin commented Jan 31, 2019 via email

tjb900 closed this as completed Feb 1, 2019
@mrocklin
Member

mrocklin commented Feb 1, 2019 via email

@jakirkham
Member

Reopening as I'd like us to track this as part of extending the work already done in PR ( #3784 ) to other Python versions.

jakirkham reopened this May 29, 2020
@jakirkham
Member

Before adding this to Dask we would need pickle5 support in cloudpickle. A fair bit of work has been done in PR ( cloudpipe/cloudpickle#370 ) to handle that. Much thanks to @pierreglaser for making that possible!
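
Roughly why library support is needed: cloudpickle's `CloudPickler` subclasses `pickle.Pickler`, and a subclass only produces out-of-band buffers if it accepts `buffer_callback` and forwards it to the base class. The sketch below uses a hypothetical `ForwardingPickler` and `forwarding_dumps` to show that contract; it is not cloudpickle's actual code.

```python
import io
import pickle


class ForwardingPickler(pickle.Pickler):
    """Hypothetical Pickler subclass standing in for a library pickler.

    The point of interest: buffer_callback must reach the base class,
    otherwise every PickleBuffer is written in-band.
    """

    def __init__(self, file, protocol=None, buffer_callback=None):
        super().__init__(file, protocol=protocol, buffer_callback=buffer_callback)


def forwarding_dumps(obj, protocol=5, buffer_callback=None):
    """dumps()-style helper built on ForwardingPickler (hypothetical name)."""
    with io.BytesIO() as f:
        ForwardingPickler(f, protocol=protocol, buffer_callback=buffer_callback).dump(obj)
        return f.getvalue()


big = pickle.PickleBuffer(bytearray(100_000))

# Without a callback, the 100 kB buffer is embedded in the bytestream...
in_band = forwarding_dumps({"x": big})
# ...with one, only a placeholder is written and the buffer is collected.
collected = []
out_of_band = forwarding_dumps({"x": big}, buffer_callback=collected.append)

print(len(in_band) > 100_000, len(out_of_band) < 1_000, len(collected))  # True True 1
```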

@jacobtomlinson
Member

@jakirkham is this now resolved with #3784?

@jakirkham
Member

No this is something different. It would require PR ( #3849 ) and PR ( cloudpipe/cloudpickle#370 ) (plus a release of cloudpickle) to resolve.

@jakirkham
Member

Alright I think PR ( #3849 ) should be ready for review. Please let me know what you think 🙂
