ENH: derive from C-pickler for fast serialization #253

Merged
merged 70 commits into from
Jun 7, 2019
8e75654
MNT use the is_dynamic for both classes and functions
pierreglaser May 20, 2019
51df1aa
FIX fix is_dynamic for some builtin packages in pypy
pierreglaser May 22, 2019
b89b259
FIX python2-3 compat
pierreglaser May 22, 2019
161924b
ENH extend _pickle.Pickler
pierreglaser Feb 22, 2019
7da9aaf
MNT load cloudpickle_fast for recent (>3.8) python
pierreglaser Mar 6, 2019
d44fd1e
MNT remove test_namedtuple skip after cpython changes
pierreglaser Mar 6, 2019
daee859
MNT add cloudpickle's code_globals_cache
pierreglaser Mar 6, 2019
c44f75b
CLN comment cosmetics
pierreglaser Mar 6, 2019
7aae29b
MNT remove python2-compat lines from file_reduce
pierreglaser Mar 6, 2019
ec461ab
CLN various style/cosmetics
pierreglaser Mar 6, 2019
e75f466
MNT re-use builtin-type constructor cache
pierreglaser Mar 6, 2019
7e311ae
DOC explain why warnings are filtered in tests
pierreglaser Mar 6, 2019
009e4a2
TST silent deprecation warning in some tests
pierreglaser Mar 6, 2019
c28d156
CI test cloudpickle against python3.8 with hooks
pierreglaser Mar 7, 2019
dfea4f5
CLN de-duplicate utility functions
pierreglaser Mar 27, 2019
3df299f
CLN de-duplicate complex utilities functions
pierreglaser Mar 27, 2019
78dd8c7
TST fix test for cloudpickle <= 3.7
pierreglaser Mar 27, 2019
c364505
DOC more explicit save_global fallback comment
pierreglaser Mar 27, 2019
de67530
CLN make reducers private
pierreglaser Mar 27, 2019
eb77646
MNT backport 0.8.1 patch into cludpickle_fast
pierreglaser Mar 28, 2019
1dea4ab
CLN unused imports
pierreglaser Mar 28, 2019
10afbe8
CLN naming (subimports -> submodules)
pierreglaser Apr 11, 2019
df3b5a2
CLN handle the file in a context manager
pierreglaser Apr 11, 2019
e3344b0
CLN hide slotstate (possible implementation detail)
pierreglaser Apr 11, 2019
749e88b
CLN make extract_code_globals private
pierreglaser Apr 11, 2019
09c38cd
CLN cleanup stale comments
pierreglaser Apr 16, 2019
3151fc4
CLN is_metaclass -> is_anyclass
pierreglaser Apr 16, 2019
cac49e4
CLN docstrings conventions
pierreglaser Apr 16, 2019
c9402e9
CLN explain cloudpickle global_hook use_case
pierreglaser Apr 16, 2019
841f33f
MNT better compat with early python3.8 versions
pierreglaser Apr 16, 2019
ea21d06
CLN stale comments
pierreglaser Apr 17, 2019
0145a1c
MNT update to comply cpython PR changes
pierreglaser Apr 19, 2019
0aba017
MNT use the new pickler subclassing API
pierreglaser Apr 26, 2019
707ec29
MNT update to recent changes in master
pierreglaser May 22, 2019
b92ac58
CLN cleanups
pierreglaser May 22, 2019
920949e
[ci python-nightly] fix coverage failure
pierreglaser May 23, 2019
54e7341
[ci python-nightly] fix coverage failure (2)
pierreglaser May 23, 2019
2357fa4
[ci python-nightly] fix coverage failure (3)
pierreglaser May 23, 2019
d753cdf
[ci python-nightly] fix coverage failure (4)
pierreglaser May 23, 2019
d79c3a9
CI re-enable windows builds
pierreglaser May 23, 2019
8db031a
CLN duplicated code
pierreglaser May 23, 2019
6e70c2d
MNT rebasing mistakes
pierreglaser Jun 5, 2019
e439f7a
CLN make some reducers CloudPickler methods
pierreglaser Jun 5, 2019
0bdb4bc
CI test python3.8-dev version
pierreglaser Jun 5, 2019
ac1d05c
CI test against python nighlty on every commit
pierreglaser Jun 5, 2019
968f769
MNT DOC explain WeakKeyDictionary guard in PyPy
pierreglaser Jun 6, 2019
db8b2c5
FIX pre-populate dispatch with copyreg_dispatch_table
pierreglaser Jun 6, 2019
c276d83
FIX more robust alternative + comments
pierreglaser Jun 6, 2019
c50339b
MAINT add numpy master to python nightly ci
ogrisel Jun 6, 2019
4c3c05f
Fix equality test?
ogrisel Jun 6, 2019
a712557
Remove comment that breaks yaml parsing
ogrisel Jun 6, 2019
5f6defe
CLN rebase with #278
pierreglaser Jun 6, 2019
a3bb79a
Update cloudpickle/cloudpickle_fast.py
pierreglaser Jun 7, 2019
e90d45d
DOC better reducer_override comment
pierreglaser Jun 7, 2019
3b87904
DOC, CLN clearer comments and names
pierreglaser Jun 7, 2019
042e8e4
DOC better find_imported_submodules docstring
pierreglaser Jun 7, 2019
71f3c09
TST test reference cycle error
pierreglaser Jun 7, 2019
8219e37
CLN cleaner __init__
pierreglaser Jun 7, 2019
b4a8bdf
MNT drop support for 3.8 alpha releases
pierreglaser Jun 7, 2019
3db1d3d
TST cross-version recursion test
pierreglaser Jun 7, 2019
8be8e72
FIX fix spurious debugging attempts
pierreglaser Jun 7, 2019
3290de2
CLN drop old save_subimport method in cloudpickle
pierreglaser Jun 7, 2019
1419073
MNT changelog
pierreglaser Jun 7, 2019
e9425ce
CLN remove PyPy-specific code in cloudpickle_fast
pierreglaser Jun 7, 2019
80246a9
CLN fix flake8 complains
pierreglaser Jun 7, 2019
4275d10
CLN clearer doc
pierreglaser Jun 7, 2019
853bd31
CLN cosmetics
pierreglaser Jun 7, 2019
021a6b1
CLN cosmetics (2)
pierreglaser Jun 7, 2019
f27427a
CLN stale comment
pierreglaser Jun 7, 2019
21726eb
typo
ogrisel Jun 7, 2019
11 changes: 8 additions & 3 deletions .travis.yml
@@ -13,8 +13,8 @@ matrix:
dist: trusty
python: "pypy3"
- os: linux
if: commit_message =~ /(\[ci python-nightly\])/
env: PYTHON_NIGHTLY=1
python: 3.7
- os: linux
python: 3.7
- os: linux
@@ -91,8 +91,12 @@ install:
- $PYTHON_EXE -m pip install .
- $PYTHON_EXE -m pip install --upgrade -r dev-requirements.txt
- $PYTHON_EXE -m pip install tornado
- if [[ $TRAVIS_PYTHON_VERSION != 'pypy'* && "$PYTHON_NIGHTLY" != 1 ]]; then
$PYTHON_EXE -m pip install numpy scipy;
- if [[ $TRAVIS_PYTHON_VERSION != 'pypy'* ]]; then
if [[ "$PYTHON_NIGHTLY" == "1" ]]; then
$PYTHON_EXE -m pip install git+https://github.com/cython/cython git+https://github.com/numpy/numpy;
else
$PYTHON_EXE -m pip install numpy scipy;
fi
fi
- if [[ $PROJECT != "" ]]; then
$PYTHON_EXE -m pip install $TEST_REQUIREMENTS;
@@ -126,5 +130,6 @@ script:
fi
fi
after_success:
- pip install coverage codecov
- coverage combine --append
- codecov
5 changes: 5 additions & 0 deletions CHANGES.md
@@ -1,6 +1,11 @@
1.2.0
=====

- Leverage the new subclassing API of the C-accelerated Pickler (available in
Python 3.8) in cloudpickle. This allows cloudpickle to pickle Python objects
up to 30 times faster.
([issue #253](https://github.com/cloudpipe/cloudpickle/pull/253))

- Support pickling of classmethod and staticmethod objects in python2.
([issue #262](https://github.com/cloudpipe/cloudpickle/pull/262))
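
For context, the subclassing API mentioned in the changelog entry is the `reducer_override` hook that `pickle.Pickler` gained in Python 3.8. The snippet below is only a minimal sketch of that hook, not the code added by this PR: `SketchPickler` and `sketch_dumps` are hypothetical names, and rebuilding code objects through `marshal` is an assumption chosen to keep the example short.

```python
import io
import marshal
import pickle
import types


class SketchPickler(pickle.Pickler):
    """Hypothetical Pickler subclass (requires Python 3.8+)."""

    def reducer_override(self, obj):
        # Code objects are not picklable by default. Return a reduce tuple
        # that rebuilds them with marshal.loads at unpickling time.
        if isinstance(obj, types.CodeType):
            return marshal.loads, (marshal.dumps(obj),)
        # NotImplemented tells the Pickler to fall back to its usual
        # behavior for every other object.
        return NotImplemented


def sketch_dumps(obj):
    # Hypothetical helper mirroring the shape of a dumps() function.
    buf = io.BytesIO()
    SketchPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()


code = compile("x * 2", "<sketch>", "eval")
restored = pickle.loads(sketch_dumps(code))
result = eval(restored, {"x": 21})  # 42
```

Because the hook lives on a subclass of the C-implemented `Pickler`, the per-object dispatch stays in C and only the overridden cases run Python code.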

6 changes: 6 additions & 0 deletions cloudpickle/__init__.py
@@ -1,5 +1,11 @@
from __future__ import absolute_import

import sys
import pickle


from cloudpickle.cloudpickle import *
if sys.version_info[:2] >= (3, 8):
    from cloudpickle.cloudpickle_fast import CloudPickler, dumps, dump

__version__ = '1.2.0.dev0'
161 changes: 89 additions & 72 deletions cloudpickle/cloudpickle.py
@@ -102,6 +102,8 @@
PY2 = False
from importlib._bootstrap import _find_spec

_extract_code_globals_cache = weakref.WeakKeyDictionary()


def _ensure_tracking(class_def):
with _DYNAMIC_CLASS_TRACKER_LOCK:
@@ -195,6 +197,78 @@ def _is_global(obj, name=None):
return obj2 is obj


def _extract_code_globals(co):
    """
    Find all global names read or written to by code block co
    """
    out_names = _extract_code_globals_cache.get(co)
    if out_names is None:
        names = co.co_names
        out_names = {names[oparg] for _, oparg in _walk_global_ops(co)}

        # Declaring a function inside another one using the "def ..."
        # syntax generates a constant code object corresponding to the
        # nested function. As the nested function may itself need global
        # variables, we need to introspect its code and extract its globals
        # (by looking for code objects in its co_consts attribute), then
        # add the result to out_names.
        if co.co_consts:
            for const in co.co_consts:
                if isinstance(const, types.CodeType):
                    out_names |= _extract_code_globals(const)

        _extract_code_globals_cache[co] = out_names

    return out_names
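
The recursion into `co_consts` can be illustrated without cloudpickle's internal `_walk_global_ops` helper. The following is a rough sketch that uses `dis.get_instructions` from the standard library instead; `global_names`, `outer`, `inner` and `some_global` are hypothetical names, and the cache is left out for brevity.

```python
import dis
import types

# Opcodes that read, write or delete a module-level name.
GLOBAL_OPS = {"LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}


def global_names(code):
    """Collect global names used by a code object and its nested code."""
    names = {
        ins.argval
        for ins in dis.get_instructions(code)
        if ins.opname in GLOBAL_OPS
    }
    # Nested functions are stored as code constants of the enclosing code.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= global_names(const)
    return names


def outer():
    def inner():
        return some_global + 1  # never called; only the bytecode matters
    return len([inner])


found = global_names(outer.__code__)
```

Here `found` contains both `len` (used directly by `outer`) and `some_global` (used only by the nested `inner`). Builtins such as `len` show up as well, so a caller has to filter the names against the function's actual globals.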


def _find_imported_submodules(code, top_level_dependencies):
    """
    Find currently imported submodules used by a function.

    Submodules used by a function need to be detected and referenced for the
    function to work correctly at depickling time. Because submodules can be
    referenced as attributes of their parent package (``package.submodule``),
    we need a special introspection technique that does not rely on
    GLOBAL-related opcodes to find references to them in a code object.

    Example:
    ```
    import concurrent.futures
    import cloudpickle
    def func():
        x = concurrent.futures.ThreadPoolExecutor
    if __name__ == '__main__':
        cloudpickle.dumps(func)
    ```
    The globals extracted by cloudpickle in the function's state include the
    concurrent package, but not its submodule (here, concurrent.futures), which
    is the module used by func. _find_imported_submodules will detect the usage
    of concurrent.futures. Saving this module alongside func will ensure that
    calling func once depickled does not fail due to concurrent.futures not
    being imported.
    """

    subimports = []
    # check if any known dependency is an imported package
    for x in top_level_dependencies:
        if (isinstance(x, types.ModuleType) and
                hasattr(x, '__package__') and x.__package__):
            # check if the package has any currently loaded sub-imports
            prefix = x.__name__ + '.'
            # A concurrent thread could mutate sys.modules,
            # make sure we iterate over a copy to avoid exceptions
            for name in list(sys.modules):
                # Older versions of pytest will add a "None" module to
                # sys.modules.
                if name is not None and name.startswith(prefix):
                    # check whether the function can address the sub-module
                    tokens = set(name[len(prefix):].split('.'))
                    if not tokens - set(code.co_names):
                        subimports.append(sys.modules[name])
    return subimports
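
The token check at the heart of this function can be tried in isolation. The sketch below is a simplified stand-in under stated assumptions: `loaded_submodules_used` is a hypothetical helper that takes a single package and returns module names rather than module objects.

```python
import sys
import concurrent.futures


def func():
    return concurrent.futures.ThreadPoolExecutor


def loaded_submodules_used(code, package):
    """Names of loaded submodules of `package` that `code` can address."""
    prefix = package.__name__ + "."
    used = set(code.co_names)
    found = []
    # Iterate over a copy: another thread may mutate sys.modules.
    for name in list(sys.modules):
        if name is not None and name.startswith(prefix):
            # Every dotted component after the prefix must appear among the
            # names referenced by the code object.
            tokens = set(name[len(prefix):].split("."))
            if tokens <= used:
                found.append(name)
    return found


detected = loaded_submodules_used(func.__code__, concurrent)
```

`concurrent.futures` is detected because `futures` appears in `func.__code__.co_names`, whereas a private submodule such as `concurrent.futures._base` is skipped since `_base` is never named in the function.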


def _make_cell_set_template_code():
"""Get the Python compiler to emit LOAD_FAST(arg); STORE_DEREF

@@ -493,54 +567,6 @@ def save_pypy_builtin_func(self, obj):
obj.__dict__)
self.save_reduce(*rv, obj=obj)


def _save_subimports(self, code, top_level_dependencies):
"""
Save submodules used by a function but not listed in its globals.

In the example below:

```
import concurrent.futures
import cloudpickle


def func():
x = concurrent.futures.ThreadPoolExecutor


if __name__ == '__main__':
cloudpickle.dumps(func)
```

the globals extracted by cloudpickle in the function's state include
the concurrent module, but not its submodule (here,
concurrent.futures), which is the module used by func.

To ensure that calling the depickled function does not raise an
AttributeError, this function looks for any currently loaded submodule
that the function uses and whose parent is present in the function
globals, and saves it before saving the function.
"""

# check if any known dependency is an imported package
for x in top_level_dependencies:
if isinstance(x, types.ModuleType) and hasattr(x, '__package__') and x.__package__:
# check if the package has any currently loaded sub-imports
prefix = x.__name__ + '.'
# A concurrent thread could mutate sys.modules,
# make sure we iterate over a copy to avoid exceptions
for name in list(sys.modules):
# Older versions of pytest will add a "None" module to sys.modules.
if name is not None and name.startswith(prefix):
# check whether the function can address the sub-module
tokens = set(name[len(prefix):].split('.'))
if not tokens - set(code.co_names):
# ensure unpickler executes this import
self.save(sys.modules[name])
# then discards the reference to it
self.write(pickle.POP)

def _save_dynamic_enum(self, obj, clsdict):
"""Special handling for dynamic Enum subclasses

@@ -676,7 +702,12 @@ def save_function_tuple(self, func):
save(_fill_function) # skeleton function updater
write(pickle.MARK) # beginning of tuple that _fill_function expects

self._save_subimports(
# Extract currently-imported submodules used by func. Storing these
# modules in a placeholder _cloudpickle_submodules entry of the object's
# state triggers, as a side effect, the import of these modules at
# unpickling time (which is necessary for func to work correctly once
# depickled).
submodules = _find_imported_submodules(
code,
itertools.chain(f_globals.values(), closure_values or ()),
)
@@ -700,6 +731,7 @@ def save_function_tuple(self, func):
'module': func.__module__,
'name': func.__name__,
'doc': func.__doc__,
'_cloudpickle_submodules': submodules
}
if hasattr(func, '__annotations__') and sys.version_info >= (3, 4):
state['annotations'] = func.__annotations__
@@ -711,28 +743,6 @@ def save_function_tuple(self, func):
write(pickle.TUPLE)
write(pickle.REDUCE) # applies _fill_function on the tuple

_extract_code_globals_cache = weakref.WeakKeyDictionary()

@classmethod
def extract_code_globals(cls, co):
"""
Find all globals names read or written to by codeblock co
"""
out_names = cls._extract_code_globals_cache.get(co)
if out_names is None:
names = co.co_names
out_names = {names[oparg] for _, oparg in _walk_global_ops(co)}

# see if nested function have any global refs
if co.co_consts:
for const in co.co_consts:
if isinstance(const, types.CodeType):
out_names |= cls.extract_code_globals(const)

cls._extract_code_globals_cache[co] = out_names

return out_names

def extract_func_data(self, func):
"""
Turn the function into a tuple of data necessary to recreate it:
@@ -741,7 +751,7 @@ def extract_func_data(self, func):
code = func.__code__

# extract all global ref's
func_global_refs = self.extract_code_globals(code)
func_global_refs = _extract_code_globals(code)

# process all variables referenced by global environment
f_globals = {}
@@ -1202,6 +1212,13 @@ def _fill_function(*args):
func.__qualname__ = state['qualname']
if 'kwdefaults' in state:
func.__kwdefaults__ = state['kwdefaults']
# _cloudpickle_submodules is a list of submodules that must be loaded for
# the pickled function to work correctly at unpickling time. Now that these
# submodules are depickled (hence imported), they can be removed from the
# object's state (the object state only served as a reference holder to
# these submodules).
if '_cloudpickle_submodules' in state:
state.pop('_cloudpickle_submodules')

cells = func.__closure__
if cells is not None:
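
The "state as a reference holder" idea in this hunk can be reproduced with the standard `pickle` module alone. This is only an illustrative sketch: `ModuleRef` and the `_submodules` key are hypothetical stand-ins for cloudpickle's own module reducer and its `_cloudpickle_submodules` entry, and `colorsys` is an arbitrary standard-library module.

```python
import colorsys
import importlib
import pickle
import sys


class ModuleRef:
    """Hypothetical wrapper: pickles a module as an import-by-name call."""

    def __init__(self, module):
        self.module = module

    def __reduce__(self):
        # At unpickling time this calls importlib.import_module(name).
        return importlib.import_module, (self.module.__name__,)


state = {"name": "func", "_submodules": [ModuleRef(colorsys)]}
payload = pickle.dumps(state)

sys.modules.pop("colorsys")        # simulate an interpreter without it
restored = pickle.loads(payload)   # the import happens as a side effect
imported_again = "colorsys" in sys.modules

# The entry only existed to trigger the import, so it can be dropped.
holder = restored.pop("_submodules")
```

After `pickle.loads`, the module is back in `sys.modules` even though nothing in the restored state needs to keep pointing at it, which is why the entry can be popped.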