Add ability to register modules to be deeply serialized #417

Merged 74 commits on Aug 1, 2021
8a1578b
TST trigger the bug of #425 in a test
pierreglaser Jun 15, 2021
1de1c4f
FIX remove the __builtins__ from a copy of the module
pierreglaser Jun 15, 2021
05ebfd3
MNT document change inside CHANGES.md
pierreglaser Jun 15, 2021
38364d6
Fixes #206.
Apr 20, 2021
3046111
Adding newline to end of file
Apr 20, 2021
05b0f83
Further flake8 changes
Apr 20, 2021
de1b375
More formatting fixes
Apr 20, 2021
261f127
Renaming function
Jun 14, 2021
9dcb540
Document changes in changelog
Jun 15, 2021
a2ec3a2
Unifying the naming convention
Jun 15, 2021
6e882e8
Updating naming conventions
Jun 15, 2021
cd1fde3
Adding broken module test
Jun 15, 2021
50d56a9
Resolving protocol
Jun 15, 2021
898cf05
Adding instnace check for TypeVar
Jun 15, 2021
65df95b
Merge branch 'master' into 206-deep-serialization
pierreglaser Jun 19, 2021
fd9d250
CLN fix linting errors
pierreglaser Jun 19, 2021
5f84156
fixup! CLN fix linting errors
pierreglaser Jun 19, 2021
14067eb
Making separate typevar chec, and using functions for the `_should_pi…
Jun 21, 2021
cce4708
cosmit
ogrisel Jun 22, 2021
14a5cfe
Update cloudpickle/cloudpickle.py
Samreay Jun 22, 2021
ab57df8
Update tests/cloudpickle_test.py
Samreay Jun 22, 2021
cfcf843
Updating variable names
Jun 22, 2021
cc04c2b
Adding another unit test to check local module patching
Jun 30, 2021
c689adc
Updating names and comments
Jun 30, 2021
bad9f75
I am a savage that didnt run flake8 and I apologise
Jun 30, 2021
23cbfb3
Merge branch 'master' into 206-deep-serialization
ogrisel Jun 30, 2021
261b5a8
Merge branch 'master' into 206-deep-serialization
ogrisel Jun 30, 2021
b29547e
Updating function and variable names
Jul 1, 2021
bd4967f
highlight possible bugs silenced by pytest
pierreglaser Jul 4, 2021
d83e25a
DOC Present the feature in the README
pierreglaser Jul 11, 2021
a7725bd
Removing submodules from arg as it is not directly used
Samreay Jul 11, 2021
c799711
TST rework the test structure
pierreglaser Jul 12, 2021
11eb075
Merge branch 'master' into 206-deep-serialization
pierreglaser Jul 12, 2021
b111a56
TST test by simulating an interactive session
pierreglaser Jul 13, 2021
4a3b3dd
CI debug sys.path issues in CI
pierreglaser Jul 13, 2021
ad85452
CI again
pierreglaser Jul 13, 2021
4f99ad1
CI again
pierreglaser Jul 13, 2021
59a0903
CI again
pierreglaser Jul 13, 2021
0bf7c58
CI again
pierreglaser Jul 13, 2021
43210dc
CI again
pierreglaser Jul 13, 2021
2fdd912
CI again
pierreglaser Jul 13, 2021
03c82cb
CI again
pierreglaser Jul 13, 2021
730c11b
CI again
pierreglaser Jul 13, 2021
1d97bff
CI again
pierreglaser Jul 13, 2021
aa12ac2
CI look only at macos ci builds
pierreglaser Jul 13, 2021
93544c5
CI again
pierreglaser Jul 13, 2021
2d3d1ed
TST: fix isolation procedure on linux
pierreglaser Jul 14, 2021
fd740f2
CI restore testing for all OSes
pierreglaser Jul 14, 2021
1794c2e
TST take into account possisble PYTHONPATH values
pierreglaser Jul 14, 2021
a6637d5
API replace is_registered_... by list_registry
pierreglaser Jul 15, 2021
866f096
CLN remove unused import
pierreglaser Jul 15, 2021
1206827
TST add tests invoving namespace modules and subfolders
pierreglaser Jul 21, 2021
4ee37d9
TST test pickling by value installed packages
pierreglaser Jul 21, 2021
4faafbc
TST add module inside locally importable subfolder
pierreglaser Jul 21, 2021
dae05bb
TST add funcs with globals in _cloudpickle_testpkg
pierreglaser Jul 21, 2021
3d4c896
TST remote co-existence of multiple versions of a func
pierreglaser Jul 21, 2021
4f5942e
TST, FIX some crucial line dissapearing
pierreglaser Jul 21, 2021
c50aa4c
API enforce module-type input for registration api
pierreglaser Jul 22, 2021
5843205
TST (try to) escape backslashes on windows
pierreglaser Jul 22, 2021
5a2b25b
TST (try to) escape backslashes on windows (2)
pierreglaser Jul 22, 2021
2b82f4d
CLN update README after API change
pierreglaser Jul 22, 2021
058c537
TST fix typo in test
pierreglaser Jul 22, 2021
850be6c
CLN remove stale file
pierreglaser Jul 22, 2021
749a6b7
CLN clean up some un-necessary branches
pierreglaser Jul 22, 2021
c5bc41e
postpone relative imports handling to the future
pierreglaser Jul 22, 2021
6ef5ec8
_is_registered_pickle_by_value should take a module
pierreglaser Jul 31, 2021
8004bae
CLN cleaner registration API error messages
pierreglaser Jul 31, 2021
418b848
CLN unused import
pierreglaser Jul 31, 2021
79a38fe
Apply docs suggestions
pierreglaser Jul 31, 2021
4f9af38
Update tests/cloudpickle_test.py
pierreglaser Jul 31, 2021
5f079b0
fix linting errors
pierreglaser Jul 31, 2021
0589dcb
more linting...
pierreglaser Jul 31, 2021
e4ca37d
TST fix a few test mistakes
pierreglaser Jul 31, 2021
1e9a48d
DOC more in-depth context description in README
pierreglaser Jul 31, 2021
6 changes: 5 additions & 1 deletion CHANGES.md
@@ -6,9 +6,13 @@ dev

- Python 3.5 is no longer supported.

- Support for registering modules to be serialised by value. This will
allow for code defined in local modules to be serialised and executed
remotely without those local modules installed on the remote machine.
([PR #417](https://github.com/cloudpipe/cloudpickle/pull/417))

- Fix a side effect altering dynamic modules at pickling time.
([PR #426](https://github.com/cloudpipe/cloudpickle/pull/426))

- Support for pickling type annotations on Python 3.10 as per [PEP 563](
https://www.python.org/dev/peps/pep-0563/)
([PR #400](https://github.com/cloudpipe/cloudpickle/pull/400))
53 changes: 53 additions & 0 deletions README.md
@@ -66,6 +66,59 @@ Pickling a function interactively defined in a Python shell session
85
```


Overriding pickle's serialization mechanism for importable constructs:
----------------------------------------------------------------------

An important difference between `cloudpickle` and `pickle` is that
`cloudpickle` can serialize a function or class **by value**, whereas `pickle`
can only serialize it **by reference**. Serialization by reference treats
functions and classes as attributes of modules, and pickles them through
instructions that trigger the import of their module at load time.
Serialization by reference is thus limited in that it assumes that the module
containing the function or class is available/importable in the unpickling
environment. This assumption breaks when pickling constructs defined in an
interactive session, a case that `cloudpickle` detects automatically and
handles by pickling such constructs **by value**.
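
As a minimal illustration of the by-reference behaviour described above, the sketch below uses only the standard library `pickle` module (not `cloudpickle`). It assumes `json.dumps` as an arbitrary importable function; the variable names are illustrative.

```python
import json
import pickle

# Pickling by reference: the payload only records where to find the function
# (module name and attribute name), not its code.
payload = pickle.dumps(json.dumps)
by_reference = b"json" in payload and b"dumps" in payload

# Loading re-imports the module and returns the very same function object.
same_object = pickle.loads(payload) is json.dumps

# A lambda has no importable name, so the standard pickler refuses it.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```

Loading such a payload in a process where the module cannot be imported fails, which is the limitation that pickling by value works around.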

Another case where the importability assumption is expected to break is when
developing a module in a distributed execution environment: the worker
processes may not have access to the said module, for example if they live on a
different machine than the process in which the module is being developed.
By itself, `cloudpickle` cannot detect such "locally importable" modules and
switch to serialization by value; instead, it relies on its default mode,
which is serialization by reference. However, since `cloudpickle 1.7.0`, one
can explicitly specify modules for which serialization by value should be used,
using the `register_pickle_by_value(module)`/`unregister_pickle_by_value(module)` API:

```python
>>> import cloudpickle
>>> import my_module
>>> cloudpickle.register_pickle_by_value(my_module)
>>> cloudpickle.dumps(my_module.my_function) # my_function is pickled by value
>>> cloudpickle.unregister_pickle_by_value(my_module)
>>> cloudpickle.dumps(my_module.my_function) # my_function is pickled by reference
```

Using this API, there is no need to re-install the new version of the module on
all the worker nodes nor to restart the workers: restarting the client Python
process with the new source code is enough.

Note that this feature is still **experimental**, and may fail in the following
situations:

- If the body of a function/class pickled by value contains an `import` statement:
```python
>>> def f():
...     from another_module import g
...     # calling f in the unpickling environment may fail if another_module
...     # is unavailable
...     return g() + 1
```

- If a function pickled by reference uses a function pickled by value during its execution.
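
The first limitation can be reproduced without `cloudpickle`: an `import` statement inside a function body is only executed when the function is called, so a function pickled by value can load successfully and still fail at call time. The sketch below assumes a deliberately nonexistent module name, `another_module_missing_on_worker`, to stand in for a module that is unavailable on the worker.

```python
def f():
    # Hypothetical module name, chosen so that the import fails here just as
    # it would on a worker that lacks the module.
    from another_module_missing_on_worker import g
    return g() + 1


# Defining (or unpickling) the function does not run the import...
defined_ok = callable(f)

# ...but calling it does, and that is where the failure surfaces.
try:
    f()
    call_failed = False
except ModuleNotFoundError:
    call_failed = True
```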


Running the tests
-----------------

116 changes: 106 additions & 10 deletions cloudpickle/cloudpickle.py
@@ -88,6 +88,9 @@ def g():
# communication speed over compatibility:
DEFAULT_PROTOCOL = pickle.HIGHEST_PROTOCOL

# Names of modules whose resources should be treated as dynamic.
_PICKLE_BY_VALUE_MODULES = set()

# Track the provenance of reconstructed dynamic classes to make it possible to
# reconstruct instances from the matching singleton class definition when
# appropriate and preserve the usual "isinstance" semantics of Python objects.
@@ -124,6 +127,77 @@ def _lookup_class_or_track(class_tracker_id, class_def):
return class_def


def register_pickle_by_value(module):
    """Register a module so its functions and classes are pickled by value.

    By default, functions and classes that are attributes of an importable
    module are to be pickled by reference, that is relying on re-importing
    the attribute from the module at load time.

    If `register_pickle_by_value(module)` is called, all its functions and
    classes are subsequently to be pickled by value, meaning that they can
    be loaded in Python processes where the module is not importable.

    This is especially useful when developing a module in a distributed
    execution environment: restarting the client Python process with the new
    source code is enough; there is no need to re-install the new version
    of the module on all the worker nodes nor to restart the workers.

    Note: this feature is considered experimental. See the cloudpickle
    README.md file for more details and limitations.
    """
    if not isinstance(module, types.ModuleType):
        raise ValueError(
            f"Input should be a module object, got {str(module)} instead"
        )
    # In the future, cloudpickle may need a way to access any module registered
    # for pickling by value in order to introspect relative imports inside
    # functions pickled by value. (see
    # https://github.com/cloudpipe/cloudpickle/pull/417#issuecomment-873684633).
    # This access can be ensured by checking that module is present in
    # sys.modules at registering time and assuming that it will still be in
    # there when accessed during pickling. Another alternative would be to
    # store a weakref to the module. Even though cloudpickle does not implement
    # this introspection yet, in order to avoid a possible breaking change
    # later, we still enforce the presence of module inside sys.modules.
    if module.__name__ not in sys.modules:
        raise ValueError(
            f"{module} was not imported correctly, have you used an "
            f"`import` statement to access it?"
        )
    _PICKLE_BY_VALUE_MODULES.add(module.__name__)


def unregister_pickle_by_value(module):
    """Unregister that the input module should be pickled by value."""
    if not isinstance(module, types.ModuleType):
        raise ValueError(
            f"Input should be a module object, got {str(module)} instead"
        )
    if module.__name__ not in _PICKLE_BY_VALUE_MODULES:
        raise ValueError(f"{module} is not registered for pickle by value")
    else:
        _PICKLE_BY_VALUE_MODULES.remove(module.__name__)


def list_registry_pickle_by_value():
    return _PICKLE_BY_VALUE_MODULES.copy()


def _is_registered_pickle_by_value(module):
    module_name = module.__name__
    if module_name in _PICKLE_BY_VALUE_MODULES:
        return True
    while True:
        parent_name = module_name.rsplit(".", 1)[0]
        if parent_name == module_name:
            break
        if parent_name in _PICKLE_BY_VALUE_MODULES:
            return True
        module_name = parent_name
    return False
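
The parent-package lookup in `_is_registered_pickle_by_value` means that registering a package also covers its submodules. The standalone sketch below mirrors that logic on plain strings so it can be run without `cloudpickle`; the function name `is_registered_by_value` and the example module names are hypothetical and not part of the PR.

```python
def is_registered_by_value(module_name, registry):
    """Return True if module_name or one of its parent packages is registered.

    Standalone illustration: `registry` is a plain set of dotted module names.
    """
    if module_name in registry:
        return True
    # Walk up the dotted path: "pkg.sub.mod" -> "pkg.sub" -> "pkg".
    while "." in module_name:
        module_name = module_name.rsplit(".", 1)[0]
        if module_name in registry:
            return True
    return False


registry = {"pkg"}
submodule_covered = is_registered_by_value("pkg.sub.mod", registry)
sibling_name_covered = is_registered_by_value("pkgx", registry)
nested_elsewhere_covered = is_registered_by_value("other.pkg", registry)
```

Only true dotted-path ancestors match: a module whose name merely starts with the same characters, or that contains the registered name at a deeper level, is not treated as registered.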


def _whichmodule(obj, name):
"""Find the module an object belongs to.

@@ -170,18 +244,35 @@ def _whichmodule(obj, name):
     return None


-def _is_importable(obj, name=None):
-    """Dispatcher utility to test the importability of various constructs."""
-    if isinstance(obj, types.FunctionType):
-        return _lookup_module_and_qualname(obj, name=name) is not None
-    elif issubclass(type(obj), type):
-        return _lookup_module_and_qualname(obj, name=name) is not None
+def _should_pickle_by_reference(obj, name=None):
+    """Test whether a function or a class should be pickled by reference.
+
+    Pickling by reference means that the object (typically a function or a
+    class) is an attribute of a module that is assumed to be importable in the
+    target Python environment. Loading will therefore rely on importing the
+    module and then calling `getattr` on it to access the function or class.
+
+    Pickling by reference is the only option to pickle functions and classes
+    in the standard library. In cloudpickle the alternative option is to
+    pickle by value (for instance for interactively or locally defined
+    functions and classes or for attributes of modules that have been
+    explicitly registered to be pickled by value).
+    """
+    if isinstance(obj, types.FunctionType) or issubclass(type(obj), type):
+        module_and_name = _lookup_module_and_qualname(obj, name=name)
+        if module_and_name is None:
+            return False
+        module, name = module_and_name
+        return not _is_registered_pickle_by_value(module)
+
     elif isinstance(obj, types.ModuleType):
         # We assume that sys.modules is primarily used as a cache mechanism for
         # the Python import machinery. Checking if a module has been added in
-        # is sys.modules therefore a cheap and simple heuristic to tell us whether
-        # we can assume that a given module could be imported by name in
-        # another Python process.
+        # sys.modules is therefore a cheap and simple heuristic to tell us
+        # whether we can assume that a given module could be imported by name
+        # in another Python process.
+        if _is_registered_pickle_by_value(obj):
+            return False
         return obj.__name__ in sys.modules
     else:
         raise TypeError(
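
The `sys.modules` heuristic used for module objects in the hunk above can be observed with the standard library alone. The sketch below contrasts an imported module with one created dynamically; the name `dynamic_module_example` is an arbitrary placeholder.

```python
import json
import sys
import types

# A module obtained through a regular import is cached in sys.modules, so it
# is assumed to be importable by name in another Python process.
imported_is_cached = json.__name__ in sys.modules

# A module object built at runtime is not registered in sys.modules, so it
# cannot be re-imported by name and has to be pickled by value instead.
dynamic = types.ModuleType("dynamic_module_example")
dynamic_is_cached = dynamic.__name__ in sys.modules
```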
@@ -839,10 +930,15 @@ def _decompose_typevar(obj):


 def _typevar_reduce(obj):
-    # TypeVar instances have no __qualname__ hence we pass the name explicitly.
+    # TypeVar instances require the module information, which is why we do
+    # not call _should_pickle_by_reference directly.
     module_and_name = _lookup_module_and_qualname(obj, name=obj.__name__)
+
     if module_and_name is None:
         return (_make_typevar, _decompose_typevar(obj))
+    elif _is_registered_pickle_by_value(module_and_name[0]):
+        return (_make_typevar, _decompose_typevar(obj))
+
     return (getattr, module_and_name)


12 changes: 6 additions & 6 deletions cloudpickle/cloudpickle_fast.py
@@ -28,7 +28,7 @@
 from .compat import pickle, Pickler
 from .cloudpickle import (
     _extract_code_globals, _BUILTIN_TYPE_NAMES, DEFAULT_PROTOCOL,
-    _find_imported_submodules, _get_cell_contents, _is_importable,
+    _find_imported_submodules, _get_cell_contents, _should_pickle_by_reference,
     _builtin_type, _get_or_create_tracker_id, _make_skeleton_class,
     _make_skeleton_enum, _extract_class_dict, dynamic_subimport, subimport,
     _typevar_reduce, _get_bases, _make_cell, _make_empty_cell, CellType,
@@ -352,7 +352,7 @@ def _memoryview_reduce(obj):


 def _module_reduce(obj):
-    if _is_importable(obj):
+    if _should_pickle_by_reference(obj):
         return subimport, (obj.__name__,)
     else:
         # Some external libraries can populate the "__builtins__" entry of a
@@ -414,7 +414,7 @@ def _class_reduce(obj):
         return type, (NotImplemented,)
     elif obj in _BUILTIN_TYPE_NAMES:
         return _builtin_type, (_BUILTIN_TYPE_NAMES[obj],)
-    elif not _is_importable(obj):
+    elif not _should_pickle_by_reference(obj):
         return _dynamic_class_reduce(obj)
     return NotImplemented

@@ -559,7 +559,7 @@ def _function_reduce(self, obj):
         As opposed to cloudpickle.py, there is no special handling for builtin
         pypy functions because cloudpickle_fast is CPython-specific.
         """
-        if _is_importable(obj):
+        if _should_pickle_by_reference(obj):
             return NotImplemented
         else:
             return self._dynamic_function_reduce(obj)
@@ -763,7 +763,7 @@ def save_global(self, obj, name=None, pack=struct.pack):
                 )
             elif name is not None:
                 Pickler.save_global(self, obj, name=name)
-            elif not _is_importable(obj, name=name):
+            elif not _should_pickle_by_reference(obj, name=name):
                 self._save_reduce_pickle5(*_dynamic_class_reduce(obj), obj=obj)
             else:
                 Pickler.save_global(self, obj, name=name)
@@ -775,7 +775,7 @@ def save_function(self, obj, name=None):
             Determines what kind of function obj is (e.g. lambda, defined at
             interactive prompt, etc) and handles the pickling appropriately.
             """
-            if _is_importable(obj, name=name):
+            if _should_pickle_by_reference(obj, name=name):
                 return Pickler.save_global(self, obj, name=name)
             elif PYPY and isinstance(obj.__code__, builtin_code_type):
                 return self.save_pypy_builtin_func(obj)