cloudpickle pickle 'CudnnModule' object #405

Open
briankosw opened this issue Dec 4, 2020 · 6 comments

@briankosw

I was spawning parallel processes using Joblib for training distributed PyTorch code, and Joblib uses cloudpickle to do its pickling. I've narrowed down the problem to cloudpickle, as demonstrated in the minimal repro below:

# main.py
import torch.backends.cudnn
import cloudpickle


def foo():
    print(torch.backends.cudnn)


cloudpickle.dumps(foo)

$ python main.py

Traceback (most recent call last):
  File "main.py", line 9, in <module>
    cloudpickle.dumps(foo)
  File "/Users/BrianKo/anaconda3/envs/hydra38/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/Users/BrianKo/anaconda3/envs/hydra38/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'CudnnModule' object

The above script works when I use Python's pickle library instead. Is there a way to fix this?

@pierreglaser
Member

Thanks for the reproducer. I'll look into it. This script should not error out.

@briankosw
Author

Any updates on this @pierreglaser?

@omry

omry commented Jan 1, 2021

@pierreglaser, it would really help to get at least some clarity about the root cause, so that we can understand whether this should be fixed in cloudpickle or in PyTorch.

@pierreglaser
Member

I have looked at this in more detail. I'm going to lay out the full story and hopefully we'll decide on a fix fast.

Module objects (instances of types.ModuleType, which is an alias for the module class) are not picklable by default:

(cloudpickle_py37) ~/repos/cloudpickle (master)❯_ python
Python 3.7.3 (default, Apr  3 2019, 19:16:38)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> type(os)
<class 'module'>
>>> import pickle
>>> pickle.dumps(os)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't pickle module objects

cloudpickle can pickle module objects using a custom saving method. However, because this custom saving method takes priority over user-defined saving methods (such as a __reduce__ method), it is only applied to "direct" module instances, and not to instances of module subclasses (such as CudnnModule). Hence, we have:

>>> import os
>>> import cloudpickle
>>> type(os)
<class 'module'>
>>> cloudpickle.dumps(os)
b'\x80\x04\x952\x00\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\tsubimport\x94\x93\x94\x8c\x02os\x94\x85\x94R\x94.'
>>> import torch.backends.cudnn as c
>>> type(c)
<class 'torch.backends.cudnn.CudnnModule'>
>>> cloudpickle.dumps(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pierreglaser/repos/cloudpickle/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/home/pierreglaser/repos/cloudpickle/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/usr/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle CudnnModule objects
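
The exact-type lookup described above can be reproduced with the standard library alone. The sketch below is not cloudpickle's code: ModulePickler, reduce_module, dumps_with_module_support, and Facade are hypothetical names, and it assumes the module reducer sits in a dispatch table keyed on types.ModuleType, which a pickler consults with type(obj) and not with isinstance.

```python
import importlib
import io
import os
import pickle
import types


def reduce_module(mod):
    # Rebuild the module at load time by importing it again by name.
    return importlib.import_module, (mod.__name__,)


class ModulePickler(pickle.Pickler):
    # The table is looked up with type(obj), so ModuleType subclasses miss it.
    dispatch_table = {types.ModuleType: reduce_module}


def dumps_with_module_support(obj):
    buffer = io.BytesIO()
    ModulePickler(buffer).dump(obj)
    return buffer.getvalue()


class Facade(types.ModuleType):
    """Hypothetical stand-in for a module subclass such as CudnnModule."""


# A plain module matches the table entry and round-trips to the same object.
plain_ok = pickle.loads(dumps_with_module_support(os)) is os

# A subclass instance misses the table and falls back to the default
# object reduction, which refuses module objects.
try:
    dumps_with_module_support(Facade("facade_exact_type_demo"))
    subclass_error = None
except TypeError as exc:
    subclass_error = exc

print(plain_ok)
print(type(subclass_error).__name__)
```

The reconstructor is importlib.import_module so that the sketch does not depend on a helper defined in the script itself.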

cloudpickle does not usually register saving methods for third-party modules, so at first sight I would lean against registering a saving method for CudnnModule (or any other PyTorch module class) directly in cloudpickle. One possible solution is for PyTorch itself to register a cloudpickle saving method for CudnnModule (using CloudPickler.dispatch_table). This would make CudnnModule objects picklable by cloudpickle (but not by pickle).
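
As a rough illustration of registering a reducer in one pickler's dispatch table, the sketch below uses a plain pickle.Pickler subclass as a stand-in for CloudPickler. FacadeAwarePickler, reduce_facade, and Facade are hypothetical names, and how PyTorch would actually hook into cloudpickle is left open here.

```python
import importlib
import io
import pickle
import sys
import types


class Facade(types.ModuleType):
    """Hypothetical stand-in for a module subclass such as CudnnModule."""


def reduce_facade(mod):
    # Rebuild the object at load time by importing the module by name.
    return importlib.import_module, (mod.__name__,)


class FacadeAwarePickler(pickle.Pickler):
    # Only this pickler class knows how to reduce Facade instances.
    dispatch_table = {Facade: reduce_facade}


facade = Facade("facade_registered_demo")
sys.modules[facade.__name__] = facade  # make it importable by name

buffer = io.BytesIO()
FacadeAwarePickler(buffer).dump(facade)
registered_ok = pickle.loads(buffer.getvalue()) is facade

# Plain pickle never sees the registration, so it still fails.
try:
    pickle.dumps(facade)
    plain_pickle_error = None
except TypeError as exc:
    plain_pickle_error = exc

print(registered_ok)
print(type(plain_pickle_error).__name__)
```

The registration is local to one pickler, which is why this route would help cloudpickle users only.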

Alternatively, if PyTorch developers want their module instances to be picklable by both pickle and cloudpickle, it should be possible to define a __reduce__ method on CudnnModule (and/or on any other module class PyTorch defines). Naively, such a __reduce__ method only needs to be defined so that the reconstructor executes import torch.backends.cudnn at unpickling time (see the pickle docs for more detail).

Here is a POC modification of the CudnnModule class:

import sys


def _subimport(name):
    # Import the module by name, then return whatever sys.modules holds for it.
    __import__(name)
    return sys.modules[name]


class CudnnModule(PropModule):
    def __init__(self, m, name):
        super(CudnnModule, self).__init__(m, name)

    def __reduce__(self):
        return _subimport, (self.__name__,)

    enabled = ContextProp(torch._C._get_cudnn_enabled, torch._C._set_cudnn_enabled)
    deterministic = ContextProp(torch._C._get_cudnn_deterministic, torch._C._set_cudnn_deterministic)
    benchmark = ContextProp(torch._C._get_cudnn_benchmark, torch._C._set_cudnn_benchmark)
    allow_tf32 = ContextProp(torch._C._get_cudnn_allow_tf32, torch._C._set_cudnn_allow_tf32)

Notice the new __reduce__ method. Let's make sure that torch.backends.cudnn is now picklable by both pickle and cloudpickle:

(pytorch_py37) ~/repos/cloudpickle (master)❯_ python
Python 3.7.9 (default, Aug 18 2020, 06:22:45)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch.backends.cudnn as c
>>> import pickle
>>> import cloudpickle
>>> pickle.loads(pickle.dumps(c)) is c
True
>>> cloudpickle.loads(cloudpickle.dumps(c)) is c
True
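
The same idea can be tried without PyTorch installed. This is a minimal sketch under stated assumptions: Facade and the module name my_facade_module are hypothetical, and importlib.import_module is used as the reconstructor in place of the _subimport helper from the POC.

```python
import importlib
import pickle
import sys
import types


class Facade(types.ModuleType):
    """Hypothetical module subclass, standing in for CudnnModule."""

    def __reduce__(self):
        # At load time, re-import the module by name instead of copying it.
        return importlib.import_module, (self.__name__,)


# Install the facade the same way a library would replace its own module.
sys.modules["my_facade_module"] = Facade("my_facade_module")

import my_facade_module

restored = pickle.loads(pickle.dumps(my_facade_module))
print(restored is my_facade_module)
```

Because the reducer lives on the class, any pickler that honours __reduce__ should pick it up without a separate registration.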

cc @ogrisel @omry @briankosw

@omry

omry commented Jan 7, 2021

Thanks for digging and for offering the solutions. I agree that it makes more sense for this to be fixed in PyTorch.
I will reach out to some PyTorch devs to see what they think.

@vepadulano

vepadulano commented Apr 15, 2021

I think this is related to #397, but in that case the reproducer works with pickle and not with cloudpickle. The difference there is that the custom ModuleType subclass is not serialized directly; what gets serialized is a function that references an attribute of that module. Specifically, this works well with pickle:

import pickle
import sys
from types import ModuleType

class Facade(ModuleType):
    def __init__(self, name):
        super().__init__(name)
        self.foo = 42

sys.modules['my_module'] = Facade('my_module')

# Pickle a function which uses the module with the facade
import my_module

def foo():
    return my_module.foo == 42

print(pickle.loads(pickle.dumps(foo)))
print(pickle.loads(pickle.dumps(foo())))

It outputs the following (Python 3.8.7):

$ python pickle_function_with_facade.py 
<function foo at 0x7fd5ee4e4310>
True

Substituting cloudpickle for pickle in the above snippet gives:

$ python cloudpickle_function_with_facade.py 
Traceback (most recent call last):
  File "cloudpickle_function_with_facade.py", line 18, in <module>
    print(pickle.loads(pickle.dumps(foo)))
  File "/home/vpadulan/.local/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/vpadulan/.local/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'Facade' object

I confirm that adding the custom __reduce__ method to the Facade class makes this work with both pickle and cloudpickle. Still, it would be nice to understand why the two behave differently.
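
One likely explanation for the difference, offered as a sketch and not as a maintainer's answer: pickle stores an importable function as a reference (module name plus qualified name) and never looks at the globals the function uses, whereas cloudpickle serializes functions defined in __main__ by value, which includes the globals they reference, here the module object itself. The snippet below demonstrates only the first half, with the standard library and json.dumps as an arbitrary example function.

```python
import json
import pickle
import types

# json.dumps is defined in a namespace that contains module objects,
# which pickle cannot serialize on their own.
module_globals = sorted(
    name
    for name, value in json.dumps.__globals__.items()
    if isinstance(value, types.ModuleType)
)

# Pickling the function still works: only its import path is recorded.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)

print(module_globals)
print(payload)
print(restored is json.dumps)
```

The printed payload contains the names json and dumps but none of the function's globals, which is why plain pickle never reaches the module subclass.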
