NumPy arrays serialize more slowly with cloudpickle than pickle #58
This is in Python 3. Anecdotally I find the following behavior: I would expect pickle and cloudpickle to behave pretty much identically here, yet cloudpickle serializes much more slowly.
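A minimal sketch of the kind of comparison being described (the array size and use of `timeit` here are my own illustrative choices, not the original benchmark):

```python
import pickle
from timeit import timeit

import cloudpickle
import numpy as np

# A large, cheap-to-create payload: 10 MB of raw bytes.
data = np.random.randint(0, 255, dtype='u1', size=10_000_000)

# Time both serializers on the identical payload and protocol.
t_pickle = timeit(lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL), number=10) / 10
t_cloud = timeit(lambda: cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL), number=10) / 10
print(f"pickle:      {t_pickle:.4f} s per dump")
print(f"cloudpickle: {t_cloud:.4f} s per dump")
```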
I'm actually a bit confused about why this isn't significantly faster. Presumably this is pickling the metadata (which is quite fast) and then copying over the data bytestring, which should happen at near memcpy speeds I think? |
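As a rough illustration of that intuition (my own sketch, not from the thread): the pickle stream for a contiguous array is essentially the raw buffer plus a small metadata header, so one memcpy-speed copy should dominate.

```python
import pickle

import numpy as np

data = np.zeros(1_000_000, dtype='u1')
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# The stream exceeds the raw buffer by only a small constant-size
# metadata header, so serialization cost is dominated by one copy.
print(len(payload) - data.nbytes)  # small: tens of bytes of overhead
```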
Yes, Python 3. Unfortunately it's not possible to customize the compiled versions of the picklers. Note that for the latest versions of the pickle protocol, it's possible to pass very large raw byte streams very efficiently, but it should probably be fixed in the C code of numpy (I have not checked in detail). |
Cross reference to relevant conversation on python-dev:
|
Also cross reference #44. I keep hitting speed issues with dask due to cloudpickling large-ish objects. These objects are fairly trivial for the C pickler but take minutes for cloudpickle, and it's making an otherwise interactive analysis a bit of a pain. Two key things I want to pickle:
I did my own digging. It seems that the fundamental problem is that the C pickler's dispatch table is only consulted after the fundamental types (including function, list, tuple, etc.) have already been written. This means you can't override the behaviour of the C pickler when it's writing a list, for example.

The "obvious" fix would be to move this lookup to before the fundamental types, assuming that it is not too expensive to do this lookup all the time. This would then allow inserting things such as function into the C pickler's dispatch table. @pitrou, have you considered such a thing? I assume that lookup is likely relatively cheap. Clearly such a move would be a trade-off, with a (possibly small?) negative impact on other types of pickling operations, but maybe it's worth it to enable much more efficient pickling of other types of things.

I also threw together a hack which enables cloudpickle to efficiently pickle large sets; see the sketch below. For my 1M element set, it gives a 30x speedup over the pure Python pickler, and I assume it works with numpy arrays too. The basic reason hacks of this style can't be used in general is that once you go into the C side, you can't come back to the Python side. That's fine to do if you know your containers don't have any objects in them which need cloudpickle. |
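A rough sketch of that flavor of hack, written against modern cloudpickle on Python 3.8+, where `reducer_override` makes this kind of delegation straightforward; the class and helper names are mine, and as noted above this is only safe when the container holds nothing that itself needs cloudpickle:

```python
import io
import pickle

import cloudpickle

def _rebuild_set(payload):
    # Rebuild the set from bytes produced by the fast C pickler.
    return pickle.loads(payload)

class SetViaCPickler(cloudpickle.CloudPickler):
    def reducer_override(self, obj):
        # Route plain sets through the C pickler as one opaque byte blob,
        # skipping cloudpickle's per-element Python-level machinery.
        if type(obj) is set:
            return _rebuild_set, (pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL),)
        return super().reducer_override(obj)

def dumps(obj):
    buf = io.BytesIO()
    SetViaCPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()

big = set(range(1_000_000))
assert pickle.loads(dumps(big)) == big
```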
I think |
Maybe we should split the issue into 2 issues: one about the speed of compound objects with many subobjects (typically a list, set or dict of strings), and keep this issue for the originally reported single large numpy array pickling perf, which I suspect is caused by some unnecessary memory copy when using the pure-python pickler (either at pickling time or at unpickling time). |
Apologies, I had intended to make the above comments on #68. I do think there is some overlap though; a solution to the problem of using the C pickler where possible would potentially solve both issues. |
Pickling of "fundamental" built-in types such as sets, tuples, etc. (but probably not functions) is performance-critical. Does cloudpickle only need to override the pickling of functions, or are other built-in types also special-cased? |
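For context (my own minimal sketch, not from the thread): functions are the canonical case cloudpickle exists for, since the stdlib pickler serializes them by reference and fails on anything not importable by name.

```python
import pickle

import cloudpickle

f = lambda x: x + 1  # not importable by name, so stdlib pickle rejects it

try:
    pickle.dumps(f)
except Exception as exc:  # typically pickle.PicklingError
    print("pickle refused:", exc)

# cloudpickle serializes the function by value instead.
g = pickle.loads(cloudpickle.dumps(f))
print(g(41))  # 42
```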
Commenting on my own quoted message in #58 (comment), I made a confusion between |
Ok, so here is what I think the C Pickler needs for cloudpickle to make use of it:
|
Note to self:
|
For reference, improvements to the C pickler to make it possible to support efficient local class and function definitions (and possibly other cloudpickle/dill features) directly in the standard library are being discussed in this python-dev mailing list thread: https://mail.python.org/pipermail/python-dev/2018-March/152509.html |
Should we close this now that #253 was merged and that the |
Went ahead and re-ran the benchmark on Python 3 (didn't try Python 2). Looks like there is a significant improvement.

```python
In [1]: import numpy as np

In [2]: import cloudpickle, pickle

In [3]: data = np.random.randint(0, 255, dtype='u1', size=100000000)

In [4]: %timeit len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
80.1 ms ± 722 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
46.5 ms ± 483 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
|
Which Python 3, if I may? |
Sorry, Python 3.7.3. |
cloudpickle subclasses the C implementation of the Pickler class. |
But actually what is important here is not the C vs Python implementation of the pickler, but the fact that cloudpickle and pickle use pickle protocol 5 efficiently, which is already there in Python 3.7. |
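To illustrate what protocol 5 brings (a sketch under my own assumptions: Python 3.8+ for in-stdlib protocol 5, and a NumPy recent enough to implement out-of-band pickling per PEP 574):

```python
import pickle

import numpy as np

data = np.random.randint(0, 255, dtype='u1', size=10_000_000)

# With protocol 5 (PEP 574), large buffers can be handed to a callback
# instead of being copied into the pickle stream.
buffers = []
payload = pickle.dumps(data, protocol=5, buffer_callback=buffers.append)

print(len(payload))   # tiny: just metadata; the 10 MB travels out-of-band
restored = pickle.loads(payload, buffers=buffers)
assert (restored == data).all()
```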
Indeed the improvement with pickle protocol 5 is noticeable. Very nice work here already! It would be good to support this on all Python versions we support. I think PR #370 should get us there. It's maybe 90% of the way to being completely green. Any help on getting that last 10% would be appreciated (kind of stuck atm). |