loky fails for Python 3.8 when importing ipyparallel 6.2.5 #240

Open
basnijholt opened this issue Apr 9, 2020 · 9 comments

Comments

@basnijholt
Contributor

basnijholt commented Apr 9, 2020

In an effort to support Loky for Adaptive (python-adaptive/adaptive#263), we see that Loky fails in the CI for all Python 3.8 tests. See these build logs.

I see this in PR #232:

We recently temporarily removed the Python 3.8 entries of the CI due to a failing test caused by a reference cycle in early Python 3.8 versions. Now that this bug is fixed upstream, we can skip the failing test on the appropriate Python versions where this bug exists, and restore the rest of the CI suite.

However, no cause is specified. It says "this bug is fixed upstream", but where upstream?

The traceback:

E       RuntimeError: An error occured while evaluating "learner.function(-1.0)". See the traceback for details.:
E       
E       loky.process_executor._RemoteTraceback: 
E       '''
E       Traceback (most recent call last):
E         File "d:\a\1\s\.tox\py38-alldeps\lib\site-packages\loky\process_executor.py", line 391, in _process_worker
E           call_item = call_queue.get(block=True, timeout=timeout)
E         File "c:\hostedtoolcache\windows\python\3.8.2\x64\lib\multiprocessing\queues.py", line 116, in get
E           return _ForkingPickler.loads(res)
E         File "d:\a\1\s\.tox\py38-alldeps\lib\site-packages\ipyparallel\serialize\codeutil.py", line 24, in code_ctor
E           return types.CodeType(*args)
E       TypeError: an integer is required (got type bytes)
E       '''
E       
E       The above exception was the direct cause of the following exception:
E       
E       Traceback (most recent call last):
E         File "D:\a\1\s\adaptive\runner.py", line 193, in _process_futures
E           y = fut.result()
E         File "c:\hostedtoolcache\windows\python\3.8.2\x64\lib\concurrent\futures\_base.py", line 432, in result
E           return self.__get_result()
E         File "c:\hostedtoolcache\windows\python\3.8.2\x64\lib\concurrent\futures\_base.py", line 388, in __get_result
E           raise self._exception
E       loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

These tests do not fail on Python 3.6 and 3.7. Finally, this test also passes locally on Python 3.8!

@pierreglaser
Collaborator

pierreglaser commented Apr 9, 2020

Hi! Thank you for the report.

I believe the quoted message and the loky PR you linked are unrelated to your problem.
In the traceback you posted, loky simply signals that the worker failed to unserialize a task.
In particular, the ipyparallel reducer/reconstructor used to serialize code objects looks out of date (the code object constructor underwent breaking changes in Python 3.8 with PEP 570), and thus fails on Python 3.8.
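
For illustration, here is a minimal sketch of that kind of breakage. It assumes a reconstructor that passes the Python 3.7 positional layout to types.CodeType; the names legacy_args and try_legacy_ctor are hypothetical, and this is not ipyparallel's actual code. Since Python 3.8, posonlyargcount is the second constructor argument, so every later argument shifts by one and the bytecode (bytes) ends up in a slot that expects an integer.

```python
import types


def linear(x):
    return x


code = linear.__code__

# Argument order accepted by types.CodeType on Python 3.7 (no posonlyargcount).
# Hypothetical stand-in for what an out-of-date reducer might have stored.
legacy_args = (
    code.co_argcount,
    code.co_kwonlyargcount,
    code.co_nlocals,
    code.co_stacksize,
    code.co_flags,
    code.co_code,
    code.co_consts,
    code.co_names,
    code.co_varnames,
    code.co_filename,
    code.co_name,
    code.co_firstlineno,
    b"",  # placeholder for the line table; the call fails before it matters on 3.8+
    code.co_freevars,
    code.co_cellvars,
)


def try_legacy_ctor(args):
    """Return the rebuilt code object, or the TypeError raised by the stale layout."""
    try:
        return types.CodeType(*args)
    except TypeError as exc:
        return exc


# On Python 3.8+ this is a TypeError instance instead of a code object.
outcome = try_legacy_ctor(legacy_args)

# One version-robust alternative (Python 3.8+): copy the code object with
# code.replace() instead of spelling out the positional arguments by hand.
renamed = code.replace(co_name="linear_copy")
rebuilt = types.FunctionType(renamed, globals())
```

On Python 3.8 the failing call produces the same "an integer is required (got type bytes)" message as in the traceback above; later versions also reject the call, though the wording of the error may differ.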

The only worrying bit on the loky side is the fact that ipyparallel is used to serialize code objects (loky should use cloudpickle instead, which supports PEP 570). To understand this, I would need an MVCE. In any case, feel free to also look for related (un)serialization bug reports in ipyparallel.

PS: by "upstream", I mean the CPython code base (https://github.com/python/cpython)

@basnijholt
Contributor Author

basnijholt commented Apr 9, 2020

Thanks for your detailed look!

I've been able to put together a minimal example: after importing ipyparallel.Client, the following code raises the exception:

from ipyparallel import Client

def linear(x):
    return x

import loky
loky_executor = loky.get_reusable_executor()
futs = loky_executor.map(linear, range(10))
list(futs)

This is on macOS with Python 3.8.

@pierreglaser
Collaborator

The reproducer looks great, thanks. I'll investigate and see whether the fix should be on the loky end or on the ipyparallel end.

@basnijholt
Contributor Author

I've simplified the above code: just importing ipyparallel is actually enough to make loky fail!

@basnijholt
Contributor Author

@pierreglaser, I am relatively sure that this is an ipyparallel problem.

It's fixed by ipython/ipyparallel#379, which hasn't made it into a release yet, unfortunately.

It does, however, mean that loky is broken whenever ipyparallel is imported.

@pierreglaser
Collaborator

pierreglaser commented Apr 10, 2020

It does, however, mean that loky is broken whenever ipyparallel is imported.

Yes, and it should not.

Are you sure cloudpickle is properly installed on your system?

At first I thought I could reproduce this locally only when cloudpickle is not installed.
My bad: I can actually reproduce it with cloudpickle installed.

@basnijholt
Contributor Author

basnijholt commented Apr 10, 2020

(py38) basnijholt-imac~  python
Python 3.8.2 | packaged by conda-forge | (default, Mar 23 2020, 17:55:48)
[Clang 9.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from loky.backend.reduction import get_loky_pickler_name
>>> print(get_loky_pickler_name())
cloudpickle

I don't think it's because of my environment. I've just tried it on a remote cluster with CentOS and get the same:

QUANTUM-NFS-SERVER-001~  python
Python 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 14:55:04)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ipyparallel import Client

>>>
>>> def linear(x):
...     return x
...
>>> import loky
>>> loky_executor = loky.get_reusable_executor()
>>> futs = loky_executor.map(linear, range(10))
>>> list(futs)
loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/site-packages/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/site-packages/ipyparallel/serialize/codeutil.py", line 24, in code_ctor
    return types.CodeType(*args)
TypeError: an integer is required (got type bytes)
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/site-packages/loky/process_executor.py", line 794, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

@pierreglaser
Collaborator

Thanks.

@pierreglaser
Collaborator

Ok, I just realized that we let copyreg-registered reducers override cloudpickle reducers. So if a module like ipyparallel registers a faulty reducer with copyreg, loky will fail. I'm not sure we want to change this behavior though.
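
To make the mechanism concrete, here is a small standard-library sketch, not loky's or cloudpickle's actual code. It shows that a reducer registered through copyreg is process-global, so any import that registers one changes how every default pickler serializes that type, and that a pickler with its own dispatch_table is one possible way to ignore such registrations. The class Celsius and the functions reduce_via_copyreg, reduce_private, and dumps_isolated are hypothetical names chosen for the example.

```python
import copyreg
import io
import pickle


class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees


def reduce_via_copyreg(obj):
    # Stands in for a reducer that a third-party module registers at import time.
    return (float, (obj.degrees,))


def reduce_private(obj):
    # Stands in for the reducer a library would prefer to use for the same type.
    return (str, (f"{obj.degrees} C",))


# Global registration: from now on every default pickler in this process
# serializes Celsius instances through reduce_via_copyreg.
copyreg.pickle(Celsius, reduce_via_copyreg)


def dumps_isolated(obj):
    """Pickle obj with a private dispatch table that ignores copyreg entries."""
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    # An instance-level dispatch_table replaces copyreg.dispatch_table for this pickler.
    pickler.dispatch_table = {Celsius: reduce_private}
    pickler.dump(obj)
    return buffer.getvalue()


default_roundtrip = pickle.loads(pickle.dumps(Celsius(21.5)))
isolated_roundtrip = pickle.loads(dumps_isolated(Celsius(21.5)))
```

Here default_roundtrip comes back as a float because the copyreg reducer was used, while isolated_roundtrip comes back as a string because only the private table was consulted. Whether loky should isolate itself this way is exactly the open question above; honoring copyreg lets users customize pickling, at the cost of inheriting faulty reducers.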

basnijholt changed the title from "loky fails in CI only for Python 3.8" to "loky fails for Python 3.8 when importing ipyparallel 6.2.5" on Apr 20, 2020
basnijholt added a commit to python-adaptive/adaptive that referenced this issue Apr 20, 2020
This means that we won't run into joblib/loky#240