
Serialization issues with dataclasses and IntEnum #3917

Closed
JessicaSchrouff opened this issue Jan 31, 2019 · 16 comments
@JessicaSchrouff

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Mint 19
  • Ray installed from (source or binary): binary
  • Ray version: 0.6.2
  • Python version: 3.6.5

Problems

We encountered multiple issues with serialization. The first involved a complex object that refers to dataclasses (with the @dataclass decorator). This object could be pickled and unpickled manually, but a "mappingproxy" error appeared when trying with ray. We had to 'flatten' our object to a dictionary of tuples, which required quite some work and leads to significant divergence between our repo branches.
Even defining a minimal dataclass:

@dataclass
class Foo:
    pass

and calling Foo() in the _setup of the trainable class failed.

The second issue is related to the IntEnum type from the enum package. Any IntEnum instance passed to the trainable class would lead to a failure. We tried to write a __reduce__ method, but it didn't help. We were able to pickle our IntEnum instances, but not to use them with ray. We have temporarily changed them to dictionaries, but again this is not sustainable in our code base.

Source code / logs

Issue 1: Mapping Proxy Error
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle mappingproxy objects

Issue 2: IntEnum (note: ray.get(ray.put()) did not work)
Traceback (most recent call last):
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
result = self.trial_executor.fetch_result(trial)
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
result = ray.get(trial_future[0])
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/worker.py", line 2211, in get
raise value
ray.worker.RayTaskError: ray_MainArgs:train() (pid=21123, host=jessica-Z370P-D3)
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/utils.py", line 437, in _wrapper
return orig_attr(*args, **kwargs)
File "pyarrow/_plasma.pyx", line 531, in pyarrow._plasma.PlasmaClient.get
File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
File "pyarrow/serialization.pxi", line 171, in pyarrow.lib.SerializationContext._deserialize_callback
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/enum.py", line 135, in new
enum_members = {k: classdict[k] for k in classdict._member_names}
AttributeError: 'dict' object has no attribute '_member_names'

Thank you!

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

Hey Jessica,

I can reproduce the second problem like this:

import ray
from enum import IntEnum

class Color(IntEnum):
    RED = 1

ray.init()

ray.get(ray.put(Color))

Do you have a minimal example to reproduce the first problem? Just creating

from dataclasses import dataclass

@dataclass
class Foo:
    pass

f = Foo()

and calling ray.put/ray.get on it works for me. Does it work for you?

-- Philipp.

@JessicaSchrouff
Author

Hi Philipp,

If you put something in the dataclass, then it fails. In our setting, we have nested dataclasses (inheritance, but also items of one class being another dataclass), which might make things trickier.

from dataclasses import dataclass

@dataclass
class Foo:
    pass

@dataclass
class Bar(Foo):
    y: int

@dataclass
class Baz:
    z: Foo

This will work for Foo() with ray.get(ray.put()), but not for Bar or Baz.
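Concretely, the calls described (a sketch):

import ray

ray.init()

ray.get(ray.put(Foo()))          # works
ray.get(ray.put(Bar(y=1)))       # fails with the mappingproxy error
ray.get(ray.put(Baz(z=Foo())))   # fails likewise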

Thank you for the help!
Best,
Jessica

@JessicaSchrouff
Author

Hi Philipp,
We have also noticed that we cannot use ray with torch sparse tensors. These cannot be pickled by default, but we wrote our own __reduce__ method to make them picklable (like we did for our dataclass classes).

Can we configure ray to use custom serialization, so that it uses the __reduce__ method of the object we are trying to pickle?

Thank you!
Best,
Jessica

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

Thanks for reporting, I can reproduce the torch problem like so:

import ray
import torch
ray.init()
i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.FloatTensor([3,      4,      5    ])
tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3]))
x = ray.put(tensor)
ray.get(x)

In principle, you can register custom serializers with https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer, and this works for custom-defined Python classes; you can also specify that you want to use pickle as a fallback. I couldn't get it to work for sparse torch tensors yet; I think it is clashing with our custom serializers for dense torch tensors (similar story for the IntEnum type, which seems to be treated differently from other types, though IntEnum values seem to work). I will look into the problems and keep you updated.

Did the pickle workaround work for you? If you could share your __reduce__ method with us, that would be very helpful, so we can integrate it into ray and register it as a custom serializer. That will be more efficient than pickle going forward.
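For illustration, a minimal sketch of that registration on a plain custom class (assuming the serializer/deserializer keyword form of ray.register_custom_serializer from the linked docs; Foo here is a hypothetical example class):

import ray

ray.init()

class Foo:
    def __init__(self, value):
        self.value = value

# Hypothetical registration: Ray ships the serializer/deserializer pair to
# workers so Foo instances can be reconstructed remotely.
ray.register_custom_serializer(
    Foo,
    serializer=lambda obj: obj.value,
    deserializer=lambda value: Foo(value))

f = ray.get(ray.put(Foo(42)))
assert f.value == 42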

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

(Note that ray already tries to fall back to pickle if it cannot handle the type, but in your case this may or may not work, since torch uses the same type for dense and sparse tensors, and our custom handler only works for dense tensors so far.)

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

@JessicaSchrouff Concerning the sparse pytorch tensor serialization, I have an upstream fix in apache/arrow#3542 (issue in https://issues.apache.org/jira/projects/ARROW/issues/ARROW-4452). Do you want to review that and see if it would fix the problem for you (i.e. compare with your pickling code)?

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

The IntEnum problem is an upstream problem in cloudpickle, I filed an issue here: cloudpipe/cloudpickle#244
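For reference, a sketch of that upstream failure: round-tripping the IntEnum class itself (not one of its values) through cloudpickle reportedly raises the same _member_names AttributeError as above.

import cloudpickle
from enum import IntEnum

class Color(IntEnum):
    RED = 1

# Pickling the class (not a member like Color.RED) is what triggers the bug.
cloudpickle.loads(cloudpickle.dumps(Color))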

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

The dataclass issue is actually a known problem with cloudpickle, see cloudpipe/cloudpickle#223. It might make sense to support these natively with the pyarrow serialization if they contain only data and no complex Python objects. What's your use case for these, i.e. what do your dataclasses look like? If you don't feel comfortable sharing, you can anonymize the field names.

@kwohlfahrt

Hi Philipp, I'm working with Jessica on the same project; here are the relevant bits of the dataclasses we are using. Note that they contain IntEnum and torch.sparse.Tensor properties and implement their own __reduce__:

from dataclasses import dataclass, fields
from enum import IntEnum
from itertools import starmap
from typing import List, Type

import torch

def decompose_coo_tensor(tensor):
    return tensor._indices(), tensor._values(), tensor.shape

@dataclass
class EdgeData:
    edges: List[torch.sparse.ByteTensor]

    @classmethod
    def from_dense(cls, data):
        edges = list(starmap(torch.sparse_coo_tensor, data))
        return cls(edges)

    def __reduce__(self):
        data = list(map(decompose_coo_tensor, self.edges))
        return (self.from_dense, (data,))

@dataclass
class EdgeMeta:
    edge_types: Type[IntEnum]

    def __reduce__(self):
        return self.from_tuple, self.as_tuple()

    # For pickle
    def as_tuple(self):
        out = []
        for field in fields(self):
            value = getattr(self, field.name)
            if field.type == Type[IntEnum]:
                value = value.__name__, [e.name for e in value]
            out.append(value)
        return tuple(out)

    @classmethod
    def from_tuple(cls, *args):
        new_args = []
        for field, arg in zip(fields(cls), args):
            if field.type == Type[IntEnum]:
                name, values = arg
                arg = IntEnum(name, map(reversed, enumerate(values)))
            new_args.append(arg)
        return cls(*new_args)

However, as the dataclasses couldn't be serialized, we then tried to work with flat tuples instead (pytorch's JIT has similar limitations), so we lost our custom __reduce__ and ran into the above issues with serializing sparse tensors and IntEnums.
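As a sanity check, here is a hypothetical round trip of EdgeMeta through plain pickle, which (unlike Ray's serialization path at the time) works with these __reduce__ methods; EdgeType is an example enum, not from the project:

import pickle

class EdgeType(IntEnum):
    BOND = 0
    ANGLE = 1

meta = EdgeMeta(edge_types=EdgeType)
restored = pickle.loads(pickle.dumps(meta))
# from_tuple rebuilds an equivalent IntEnum from the stored (name, members).
assert [e.name for e in restored.edge_types] == ["BOND", "ANGLE"]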

@robertnishihara
Collaborator

@JessicaSchrouff @kwohlfahrt does registering a custom serializer like @pcmoritz mentioned work for you?

See the documentation at https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer.

@pcmoritz
Contributor

pcmoritz commented Feb 4, 2019

Thanks for sharing the example!

This is indeed currently blocked on cloudpipe/cloudpickle#223. While ray.register_custom_serializer would work in principle, it needs the type (not the value) of the object to be serializable with cloudpickle (so it can be shipped to other workers). @JessicaSchrouff @kwohlfahrt It wouldn't hurt if you commented on the linked cloudpickle issue to make them aware that this is a problem people are running into. After that's resolved and my pyarrow fix is merged, serializing EdgeData will work out of the box; the same goes for EdgeMeta once cloudpipe/cloudpickle#244 is fixed.

For future reference, here is a complete example:

from dataclasses import dataclass
from itertools import starmap
from typing import List

import torch
import ray

ray.init()

def decompose_coo_tensor(tensor):
    return tensor._indices(), tensor._values(), tensor.shape

@dataclass
class EdgeData:
    edges: List[torch.sparse.ByteTensor]

    @classmethod
    def from_dense(cls, data):
        edges = list(starmap(torch.sparse_coo_tensor, data))
        return cls(edges)

    def __reduce__(self):
        data = list(map(decompose_coo_tensor, self.edges))
        return (self.from_dense, (data,))

i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.ByteTensor([3,      4,      5    ])
tensor = torch.sparse.ByteTensor(i.t(), v, torch.Size([2,3]))

data = EdgeData(edges=[tensor])  # edges is a list of sparse tensors
x = ray.put(data)
ray.get(x)
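Hypothetically, once cloudpipe/cloudpickle#223 and apache/arrow#3542 land, the round trip should preserve the tensor's contents; a quick check:

result = ray.get(x)
# Compare dense views, since sparse tensors don't support direct equality.
assert torch.equal(result.edges[0].to_dense(), tensor.to_dense())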

@pcmoritz
Contributor

pcmoritz commented Feb 5, 2019

I created a PR to fix the dataclass pickling issue upstream: cloudpipe/cloudpickle#245

@JessicaSchrouff
Author

Thank you Philipp for all your help!
I'll look into custom serialization while the issues are being fixed and the PRs merged.

@pcmoritz
Contributor

pcmoritz commented Feb 8, 2019

@JessicaSchrouff @kwohlfahrt On the latest master, you can now do the following:

from dataclasses import dataclass

import ray

ray.init()

class CustomClass(object):
    def __init__(self, value):
        self.value = value

@dataclass
class DataClass1:
    custom: CustomClass

    @classmethod
    def from_custom(cls, data):
        custom = CustomClass(data)
        return cls(custom)

    def __reduce__(self):
        # Must reference self.custom; a bare `custom` would raise NameError.
        return (self.from_custom, (self.custom.value,))

data = DataClass1(custom=CustomClass(43))
x = ray.put(data)
assert ray.get(x).custom.value == 43

That should fix both of the cases above if you use your __reduce__ methods from before.

Once cloudpipe/cloudpickle#244 is fixed and apache/arrow#3542 is merged into Ray, it will also work without the __reduce__ methods.

@JessicaSchrouff
Author

Hi @pcmoritz ,

Thank you so much!

I have now installed the latest version and am no longer running into the mappingproxy error. However, I did not manage to use our own __reduce__ methods for the IntEnum or for the sparse tensors.

from dataclasses import dataclass
from enum import IntEnum

import ray

ray.init()

class CustomClass(object):
    def __init__(self, key, fields):
        self.vals = IntEnum(key, fields)

@dataclass
class DataClass1:
    custom: CustomClass

    @classmethod
    def from_custom(cls, key, fields):
        custom = CustomClass(key, fields)
        return cls(custom)

    def __reduce__(self):
        # Reference self.custom (the original had a bare `custom` here).
        key = self.custom.vals.__name__
        fields = [e.name for e in self.custom.vals]
        return (self.from_custom, (key, fields))

data = DataClass1(custom=CustomClass("bla", "foo bar baz"))
x = ray.put(data)
v = ray.get(x)

Gives the error related to IntEnum on the get:

Traceback (most recent call last):
File "", line 1, in
File "/home/jessica/ray/python/ray/worker.py", line 2235, in get
value = worker.get_object([object_ids])[0]
File "/home/jessica/ray/python/ray/worker.py", line 460, in get_object
final_results = self.retrieve_and_deserialize(plain_object_ids, 0)
File "/home/jessica/ray/python/ray/worker.py", line 395, in retrieve_and_deserialize
self.get_serialization_context(self.task_driver_id))
File "/home/jessica/ray/python/ray/utils.py", line 442, in _wrapper
return orig_attr(*args, **kwargs)
File "pyarrow/_plasma.pyx", line 531, in pyarrow._plasma.PlasmaClient.get
File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
File "pyarrow/serialization.pxi", line 171, in pyarrow.lib.SerializationContext._deserialize_callback
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/enum.py", line 135, in new
enum_members = {k: classdict[k] for k in classdict._member_names}
AttributeError: 'dict' object has no attribute '_member_names'

Similarly, I get the error related to sparse tensors when using the "EdgeData" inside another dataclass.

I managed to get ray working on my data structures by using the custom serialization with use_pickle=True. We put a print statement inside our __reduce__ method, but it never got displayed, which suggests that our __reduce__ methods do not get called.
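For reference, the registration in question is just (a sketch, assuming the same ray.register_custom_serializer API as above):

# Serialize DataClass1 with pickle instead of pyarrow's default path.
ray.register_custom_serializer(DataClass1, use_pickle=True)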

Thank you for the help!
Best,
Jessica

@robertnishihara
Collaborator

This is fixed now.
