-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to register modules to be deeply serialized #417
Conversation
Codecov Report
@@ Coverage Diff @@
## master #417 +/- ##
==========================================
+ Coverage 91.99% 92.36% +0.36%
==========================================
Files 4 4
Lines 687 720 +33
Branches 141 150 +9
==========================================
+ Hits 632 665 +33
Misses 34 34
Partials 21 21
Continue to review full report at Codecov.
|
Thanks for picking this up, @Samreay! |
@ogrisel, any thoughts on the PR? Is there someone I should direct it to? |
This is also of great interest to me! Very nice. Maybe @ogrisel would be the person to review? |
Hello, following up here, is there any plan on merging or moving forward? I see all but 1 test passing |
This sounds like a legit need and the implementation looks fine from a quick look at it. However I am not a big fan of the "deep_serialization" naming in the API and comments. What about calling this "pickle_by_value" or "pickling_by_value" instead? |
@ogrisel Happy to update this to |
Functions renamed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please document the change in the changelog.
@pierreglaser ok with you?
Changes now documented :) |
The failure of |
Yeah, will be interesting to see if they roll back the string typehints given how many libraries its going to impact. Ive seen no memory issues when testing this locally on my macbook, can you point me to which test / build step is concern and I can try to dig a little deeper? |
The macos failure looks random and is probably unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a first batch of comments - mostly nitpicks, but I raised a few more structural suggestions/questions.
As an important side point, I think that as of now, calling cloudpickle.dumps(module)
on a module registered for pickling by value will still be pickled by reference: It is a behavior that we may want to change, cc @ogrisel
Feel free to request my review again when all requested changes are made. I think also that a test making sure modules registered for pickling by value are actually pickled by value when doing a |
@pierreglaser Ive added the module serialisation test in, however it is currently broken, with the main issue being the TypeVar being passed into |
Thanks, I will try to review soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice contrib and tests!
Just a few suggestions to further improve the doc + code nitpicks.
tests/cloudpickle_test.py
Outdated
assert w.run( | ||
lambda: LocalClass().method() == LocalClass().method() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert w.run( | |
lambda: LocalClass().method() == LocalClass().method() | |
) | |
assert ( | |
w.run(lambda: LocalClass().method()) | |
== LocalClass().method() | |
) |
README.md
Outdated
Since `cloudpickle 1.7.0`, it is possible to extend the use of serialization by | ||
value to functions or classes coming from **any pure Python module**. This feature | ||
is useful when the said module is unavailable in the unpickling environment | ||
(making traditional serialization by reference ineffective). To this end, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the motivation (interactive dev on a distributed cluster with long running worker processes) could be made more explicit by reusing the text I suggest in the docstring of register_pickle_by_value
above in this review.
@pierreglaser once my review comments are addressed, feel free to merge and release cloudpickle. But before releasing, please run the downstream ci tests in the release PR to try to avoid introducing any regression in this quite big release. |
@ogrisel thanks for the review! Should we bump |
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
+1 for 2.0.0 :) |
Merged! Thank you very much @kinghuang @Samreay and @pierreglaser! |
Woot 🎉!! Thanks a lot @Samreay for the contribution! |
Thank you everybody, IMHO this is a great improvement towards general purpose clusters. |
Releasing will be done once #432 is addressed - hopefully within the next couple of weeks this should be done. |
Thanks @pierreglaser and @ogrisel for all the commits getting this ready. Fingers crossed #432 is a simple fix! :) |
2.0.0 ===== - Python 3.5 is no longer supported. - Support for registering modules to be serialised by value. This allows code defined in local modules to be serialised and executed remotely without those local modules installed on the remote machine. ([PR #417](cloudpipe/cloudpickle#417)) - Fix a side effect altering dynamic modules at pickling time. ([PR #426](cloudpipe/cloudpickle#426)) - Support for pickling type annotations on Python 3.10 as per [PEP 563]( https://www.python.org/dev/peps/pep-0563/) ([PR #400](cloudpipe/cloudpickle#400)) - Stricter parametrized type detection heuristics in _is_parametrized_type_hint to limit false positives. ([PR #409](cloudpipe/cloudpickle#409)) - Support pickling / depickling of OrderedDict KeysView, ValuesView, and ItemsView, following similar strategy for vanilla Python dictionaries. ([PR #423](cloudpipe/cloudpickle#423)) - Suppressed a source of non-determinism when pickling dynamically defined functions and handles the deprecation of co_lnotab in Python 3.10+. ([PR #428](cloudpipe/cloudpickle#428)) 1.6.0 ===== - `cloudpickle`'s pickle.Pickler subclass (currently defined as `cloudpickle.cloudpickle_fast.CloudPickler`) can and should now be accessed as `cloudpickle.Pickler`. This is the only officially supported way of accessing it. ([issue #366](cloudpipe/cloudpickle#366)) - `cloudpickle` now supports pickling `dict_keys`, `dict_items` and `dict_values`. ([PR #384](cloudpipe/cloudpickle#384)) 1.5.0 ===== - Fix a bug causing cloudpickle to crash when pickling dynamically created, importable modules. ([issue #360](cloudpipe/cloudpickle#354)) - Add optional dependency on `pickle5` to get improved performance on Python 3.6 and 3.7. ([PR #370](cloudpipe/cloudpickle#370)) - Internal refactoring to ease the use of `pickle5` in cloudpickle for Python 3.6 and 3.7. ([PR #368](cloudpipe/cloudpickle#368)) 1.4.1 ===== - Fix incompatibilities between cloudpickle 1.4.0 and Python 3.5.0/1/2 introduced by the new support of cloudpickle for pickling typing constructs. ([issue #360](cloudpipe/cloudpickle#360)) - Restore compat with loading dynamic classes pickled with cloudpickle version 1.2.1 that would reference the `types.ClassType` attribute. ([PR #359](cloudpipe/cloudpickle#359)) 1.4.0 ===== **This version requires Python 3.5 or later** - cloudpickle can now all pickle all constructs from the ``typing`` module and the ``typing_extensions`` library in Python 3.5+ ([PR #318](cloudpipe/cloudpickle#318)) - Stop pickling the annotations of a dynamic class for Python < 3.6 (follow up on #276) ([issue #347](cloudpipe/cloudpickle#347)) - Fix a bug affecting the pickling of dynamic `TypeVar` instances on Python 3.7+, and expand the support for pickling `TypeVar` instances (dynamic or non-dynamic) to Python 3.5-3.6 ([PR #350](cloudpipe/cloudpickle#350)) - Add support for pickling dynamic classes subclassing `typing.Generic` instances on Python 3.7+ ([PR #351](cloudpipe/cloudpickle#351)) 1.3.0 ===== - Fix a bug affecting dynamic modules occuring with modified builtins ([issue #316](cloudpipe/cloudpickle#316)) - Fix a bug affecting cloudpickle when non-modules objects are added into sys.modules ([PR #326](cloudpipe/cloudpickle#326)). - Fix a regression in cloudpickle and python3.8 causing an error when trying to pickle property objects. ([PR #329](cloudpipe/cloudpickle#329)). - Fix a bug when a thread imports a module while cloudpickle iterates over the module list ([PR #322](cloudpipe/cloudpickle#322)). - Add support for out-of-band pickling (Python 3.8 and later). https://docs.python.org/3/library/pickle.html#example ([issue #308](cloudpipe/cloudpickle#308)) - Fix a side effect that would redefine `types.ClassTypes` as `type` when importing cloudpickle. ([issue #337](cloudpipe/cloudpickle#337)) - Fix a bug affecting subclasses of slotted classes. ([issue #311](cloudpipe/cloudpickle#311)) - Dont pickle the abc cache of dynamically defined classes for Python 3.6- (This was already the case for python3.7+) ([issue #302](cloudpipe/cloudpickle#302)) 1.2.2 ===== - Revert the change introduced in ([issue #276](cloudpipe/cloudpickle#276)) attempting to pickle functions annotations for Python 3.4 to 3.6. It is not possible to pickle complex typing constructs for those versions (see [issue #193]( cloudpipe/cloudpickle#193)) - Fix a bug affecting bound classmethod saving on Python 2. ([issue #288](cloudpipe/cloudpickle#288)) - Add support for pickling "getset" descriptors ([issue #290](cloudpipe/cloudpickle#290)) 1.2.1 ===== - Restore (partial) support for Python 3.4 for downstream projects that have LTS versions that would benefit from cloudpickle bug fixes. 1.2.0 ===== - Leverage the C-accelerated Pickler new subclassing API (available in Python 3.8) in cloudpickle. This allows cloudpickle to pickle Python objects up to 30 times faster. ([issue #253](cloudpipe/cloudpickle#253)) - Support pickling of classmethod and staticmethod objects in python2. arguments. ([issue #262](cloudpipe/cloudpickle#262)) - Add support to pickle type annotations for Python 3.5 and 3.6 (pickling type annotations was already supported for Python 3.7, Python 3.4 might also work but is no longer officially supported by cloudpickle) ([issue #276](cloudpipe/cloudpickle#276)) - Internal refactoring to proactively detect dynamic functions and classes when pickling them. This refactoring also yields small performance improvements when pickling dynamic classes (~10%) ([issue #273](cloudpipe/cloudpickle#273)) 1.1.1 ===== - Minor release to fix a packaging issue (Markdown formatting of the long description rendered on pypi.org). The code itself is the same as 1.1.0. 1.1.0 ===== - Support the pickling of interactively-defined functions with positional-only arguments. ([issue #266](cloudpipe/cloudpickle#266)) - Track the provenance of dynamic classes and enums so as to preseve the usual `isinstance` relationship between pickled objects and their original class defintions. ([issue #246](cloudpipe/cloudpickle#246)) 1.0.0 ===== - Fix a bug making functions with keyword-only arguments forget the default values of these arguments after being pickled. ([issue #264](cloudpipe/cloudpickle#264)) 0.8.1 ===== - Fix a bug (already present before 0.5.3 and re-introduced in 0.8.0) affecting relative import instructions inside depickled functions ([issue #254](cloudpipe/cloudpickle#254)) 0.8.0 ===== - Add support for pickling interactively defined dataclasses. ([issue #245](cloudpipe/cloudpickle#245)) - Global variables referenced by functions pickled by cloudpickle are now unpickled in a new and isolated namespace scoped by the CloudPickler instance. This restores the (previously untested) behavior of cloudpickle prior to changes done in 0.5.4 for functions defined in the `__main__` module, and 0.6.0/1 for other dynamic functions. 0.7.0 ===== - Correctly serialize dynamically defined classes that have a `__slots__` attribute. ([issue #225](cloudpipe/cloudpickle#225)) 0.6.1 ===== - Fix regression in 0.6.0 which breaks the pickling of local function defined in a module, making it impossible to access builtins. ([issue #211](cloudpipe/cloudpickle#211)) 0.6.0 ===== - Ensure that unpickling a function defined in a dynamic module several times sequentially does not reset the values of global variables. ([issue #187](cloudpipe/cloudpickle#205)) - Restrict the ability to pickle annotations to python3.7+ ([issue #193]( cloudpipe/cloudpickle#193) and [issue #196]( cloudpipe/cloudpickle#196)) - Stop using the deprecated `imp` module under Python 3. ([issue #207](cloudpipe/cloudpickle#207)) - Fixed pickling issue with singleton types `NoneType`, `type(...)` and `type(NotImplemented)` ([issue #209](cloudpipe/cloudpickle#209)) 0.5.6 ===== - Ensure that unpickling a locally defined function that accesses the global variables of a module does not reset the values of the global variables if they are already initialized. ([issue #187](cloudpipe/cloudpickle#187)) 0.5.5 ===== - Fixed inconsistent version in `cloudpickle.__version__`. 0.5.4 ===== - Fixed a pickling issue for ABC in python3.7+ ([issue #180]( cloudpipe/cloudpickle#180)). - Fixed a bug when pickling functions in `__main__` that access global variables ([issue #187]( cloudpipe/cloudpickle#187)). 0.5.3 ===== - Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types ([issue #144](cloudpipe/cloudpickle#144)). - itertools objects can also pickled ([PR #156](cloudpipe/cloudpickle#156)). - `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)). 0.5.2 ===== - Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131]( cloudpipe/cloudpickle#131)). - Make it possible to pickle classes and functions defined in faulty modules that raise an exception when trying to look-up their attributes by name. 0.5.1 ===== - Fixed `cloudpickle.__version__`. 0.5.0 ===== - Use `pickle.HIGHEST_PROTOCOL` by default. 0.4.4 ===== - `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)). 0.4.3 ===== - Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131]( cloudpipe/cloudpickle#131)). - Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types. ([issue #144](cloudpipe/cloudpickle#144)) 0.4.2 ===== - Restored compatibility with pickles from 0.4.0. - Handle the `func.__qualname__` attribute. 0.4.1 ===== - Fixed a crash when pickling dynamic classes whose `__dict__` attribute was defined as a [`property`](https://docs.python.org/3/library/functions.html#property). Most notably, this affected dynamic [namedtuples](https://docs.python.org/2/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields) in Python 2. (cloudpipe/cloudpickle#113) - Cloudpickle now preserves the `__module__` attribute of functions (cloudpipe/cloudpickle#118). - Fixed a crash when pickling modules that don't have a `__package__` attribute (cloudpipe/cloudpickle#116). 0.4.0 ===== * Fix functions with empty cells * Allow pickling Logger objects * Fix crash when pickling dynamic class cycles * Ignore "None" mdoules added to sys.modules * Support WeakSets and ABCMeta instances * Remove non-standard `__transient__` support * Catch exception from `pickle.whichmodule()` 0.3.1 ===== * Fix version information and ship a changelog 0.3.0 ===== * Import submodules accessed by pickled functions * Support recursive functions inside closures * Fix `ResourceWarnings` and `DeprecationWarnings` * Assume modules with `__file__` attribute are not dynamic 0.2.2 ===== * Support Python 3.6 * Support Tornado Coroutines * Support builtin methods
This PR is based on the work done by @kinghuang in PR391, but takes on the feedback provided by @ogrisel and adds testing.
Fixes #206
Issue Summary
To summarise the issue, in many cases
cloudpickle
is used to send code for remote execution. This is used indask
,prefect
,mlflow
and many libraries. For local functions, this works perfectly fine. But for any non-local function or class,cloudpickle
assumes that external modules and packages are available at the location of deserialization. This may either not be the case, or the version of the package available at the end point may be different.This PR adds the option to register modules for deep serialization by providing a
register_deep_serialization
function which takes either a name or a module. This is the originalregister_dynamic_module
by @kinghuang.Original
dumps
:dumps
after registeringtests.external
for deep serialization:Modules can be unregistered via
unregister_deep_serialization
Tests
One of the example tests with an explicit use case is shown above. On top of this, tests have been added to
_lookup_module_and_qualname
using the_cloudpickle_testpkg
package, and also to the new_is_explicitly_serialized_module
function.