
Serialization issues with dataclasses and IntEnum #3917

Closed
JessicaSchrouff opened this issue Jan 31, 2019 · 16 comments
@JessicaSchrouff

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Mint 19
  • Ray installed from (source or binary): binary
  • Ray version: 0.6.2
  • Python version: 3.6.5

Problems

We encountered multiple issues with serialization. The first involved a complex object that refers to dataclasses (with the @dataclass decorator). This object could be pickled and unpickled manually, but a "mappingproxy" error appeared when trying with ray. We had to 'flatten' our object to a dictionary of tuples, which required quite some work and leads to significant divergence between our repo branches.
Even defining a minimal dataclass:

@dataclass
class Foo:
    pass

and calling Foo() in the _setup of the trainable class failed.

The second issue is related to the IntEnum type from the enum package. Any IntEnum instance passed to the trainable class would lead to a failure. We tried to write a __reduce__ method, but it didn't help. We were able to pickle our IntEnum instances, but not to use them with ray. We have temporarily changed them to dictionaries, but again this is not sustainable in our code base.

Source code / logs

Issue 1: Mapping Proxy Error
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle mappingproxy objects

Issue 2: IntEnum (note: ray.get(ray.put()) did not work)
Traceback (most recent call last):
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
result = self.trial_executor.fetch_result(trial)
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
result = ray.get(trial_future[0])
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/worker.py", line 2211, in get
raise value
ray.worker.RayTaskError: ray_MainArgs:train() (pid=21123, host=jessica-Z370P-D3)
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/utils.py", line 437, in _wrapper
return orig_attr(*args, **kwargs)
File "pyarrow/_plasma.pyx", line 531, in pyarrow._plasma.PlasmaClient.get
File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
File "pyarrow/serialization.pxi", line 171, in pyarrow.lib.SerializationContext._deserialize_callback
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/enum.py", line 135, in new
enum_members = {k: classdict[k] for k in classdict._member_names}
AttributeError: 'dict' object has no attribute '_member_names'

Thank you!

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

Hey Jessica,

I can reproduce the second problem like this:

import ray
from enum import IntEnum

class Color(IntEnum):
    RED = 1

ray.init()

ray.get(ray.put(Color))

Do you have a minimal example to reproduce the first problem? Just creating

from dataclasses import dataclass

@dataclass
class Foo:
    pass

f = Foo()

and calling ray.put/ray.get on it works for me. Does it work for you?

-- Philipp.

@JessicaSchrouff
Author

Hi Philipp,

If you put something in the dataclass, then it fails. In our setting, we have nested dataclasses (inheritance, but also items of one class being another dataclass), which might make things trickier.

from dataclasses import dataclass

@dataclass
class Foo:
    pass

@dataclass
class Bar(Foo):
    y: int

@dataclass
class Baz:
    z: Foo

This will work for Foo() with ray.get(ray.put()), but not for Bar or Baz.
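Concretely, the calls described (a sketch):

import ray

ray.init()

ray.get(ray.put(Foo()))          # works
ray.get(ray.put(Bar(y=1)))       # fails with the mappingproxy error
ray.get(ray.put(Baz(z=Foo())))   # fails likewise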

Thank you for the help!
Best,
Jessica

@JessicaSchrouff
Author

Hi Philipp,
We have also noticed that we cannot use ray with torch sparse tensors. These cannot be pickled by default, but we wrote our own __reduce__ method to make them picklable (like we did for our dataclass classes).

Can we configure ray to use custom serialization, so that it uses the __reduce__ method of the object we are trying to pickle?

Thank you!
Best,
Jessica

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

Thanks for reporting, I can reproduce the torch problem like so:

import ray
import torch
ray.init()
i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.FloatTensor([3,      4,      5    ])
tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3]))
x = ray.put(tensor)
ray.get(x)

In principle, you can register custom serializers with https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer, and this works for custom-defined Python classes; you can also specify that you want to use pickle as a fallback. I couldn't get it to work for sparse torch tensors yet; I think it is clashing with our custom serializers for dense torch tensors (similar story for the IntEnum type, which seems to be treated differently from other types, though IntEnum values seem to work). I will look into the problems and keep you updated.

Did the pickle workaround work for you? If you could share your __reduce__ method with us, that would be very helpful, so we can integrate it into ray and register it as a custom serializer. That will be more efficient than pickle going forward.
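For illustration, a minimal sketch of that registration on a plain custom class (assuming the serializer/deserializer keyword form of ray.register_custom_serializer from the linked docs; Foo here is a hypothetical example class):

import ray

ray.init()

class Foo:
    def __init__(self, value):
        self.value = value

# Hypothetical registration: Ray ships the serializer/deserializer pair to
# workers so Foo instances can be reconstructed remotely.
ray.register_custom_serializer(
    Foo,
    serializer=lambda obj: obj.value,
    deserializer=lambda value: Foo(value))

f = ray.get(ray.put(Foo(42)))
assert f.value == 42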

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

(Note that ray already tries to fall back to pickle if it cannot handle the type, but in your case this may or may not work, since torch uses the same type for dense and sparse tensors, and our custom handler only works for dense tensors so far.)

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

@JessicaSchrouff Concerning the sparse pytorch tensor serialization, I have an upstream fix in apache/arrow#3542 (issue in https://issues.apache.org/jira/projects/ARROW/issues/ARROW-4452). Do you want to review that and see if it would fix the problem for you (i.e. compare with your pickling code)?

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

The IntEnum problem is an upstream problem in cloudpickle, I filed an issue here: cloudpipe/cloudpickle#244
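For reference, a sketch of that upstream failure: round-tripping the IntEnum class itself (not one of its values) through cloudpickle reportedly raises the same _member_names AttributeError as above.

import cloudpickle
from enum import IntEnum

class Color(IntEnum):
    RED = 1

# Pickling the class (not a member like Color.RED) is what triggers the bug.
cloudpickle.loads(cloudpickle.dumps(Color))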

@pcmoritz
Contributor

pcmoritz commented Feb 1, 2019

The dataclass issue is actually a known problem with cloudpickle, see cloudpipe/cloudpickle#223. It might make sense to support these natively with the pyarrow serialization if they contain only data and no complex Python objects. What's your use case for these, i.e. what do your dataclasses look like? If you don't feel comfortable sharing, you can anonymize the field names.

@kwohlfahrt

Hi Philipp, I'm working with Jessica on the same project; here are the relevant bits of the dataclasses we are using. Note that they contain IntEnum and torch.sparse.Tensor properties and implement their own __reduce__:

from dataclasses import dataclass, fields
from enum import IntEnum
from itertools import starmap
from typing import List, Type

import torch

def decompose_coo_tensor(tensor):
    return tensor._indices(), tensor._values(), tensor.shape

@dataclass
class EdgeData:
    edges: List[torch.sparse.ByteTensor]

    @classmethod
    def from_dense(cls, data):
        edges = list(starmap(torch.sparse_coo_tensor, data))
        return cls(edges)

    def __reduce__(self):
        data = list(map(decompose_coo_tensor, self.edges))
        return (self.from_dense, (data,))

@dataclass
class EdgeMeta:
    edge_types: Type[IntEnum]

    def __reduce__(self):
        return self.from_tuple, self.as_tuple()

    # For pickle
    def as_tuple(self):
        out = []
        for field in fields(self):
            value = getattr(self, field.name)
            if field.type == Type[IntEnum]:
                value = value.__name__, [e.name for e in value]
            out.append(value)
        return tuple(out)

    @classmethod
    def from_tuple(cls, *args):
        new_args = []
        for field, arg in zip(fields(cls), args):
            if field.type == Type[IntEnum]:
                name, values = arg
                arg = IntEnum(name, map(reversed, enumerate(values)))
            new_args.append(arg)
        return cls(*new_args)

However, as the dataclasses couldn't be serialized, we then tried to work with flat tuples instead (pytorch's JIT has similar limitations), so we lost our custom __reduce__ and ran into the above issues with serializing sparse tensors and IntEnums.
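As a sanity check, here is a hypothetical round trip of EdgeMeta through plain pickle, which (unlike Ray's serialization path at the time) works with these __reduce__ methods; EdgeType is an example enum, not from the project:

import pickle

class EdgeType(IntEnum):
    BOND = 0
    ANGLE = 1

meta = EdgeMeta(edge_types=EdgeType)
restored = pickle.loads(pickle.dumps(meta))
# from_tuple rebuilds an equivalent IntEnum from the stored (name, members).
assert [e.name for e in restored.edge_types] == ["BOND", "ANGLE"]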

@robertnishihara
Collaborator

@JessicaSchrouff @kwohlfahrt does registering a custom serializer like @pcmoritz mentioned work for you?

See the documentation at https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer.

@pcmoritz
Contributor

pcmoritz commented Feb 4, 2019

Thanks for sharing the example!

This is indeed currently blocked on cloudpipe/cloudpickle#223. While ray.register_custom_serializer would work in principle, it needs the type (not the value) of the object to be serializable with cloudpickle (so it can be shipped to other workers). @JessicaSchrouff @kwohlfahrt It wouldn't hurt if you commented on the linked cloudpickle issue to make them aware that this is a problem people are running into. After that's resolved and my pyarrow fix is merged, serializing EdgeData will work out of the box; the same goes for EdgeMeta once cloudpipe/cloudpickle#244 is fixed.

For future reference, here is a complete example:

from dataclasses import dataclass
from itertools import starmap
from typing import List

import torch
import ray

ray.init()

def decompose_coo_tensor(tensor):
    return tensor._indices(), tensor._values(), tensor.shape

@dataclass
class EdgeData:
    edges: List[torch.sparse.ByteTensor]

    @classmethod
    def from_dense(cls, data):
        edges = list(starmap(torch.sparse_coo_tensor, data))
        return cls(edges)

    def __reduce__(self):
        data = list(map(decompose_coo_tensor, self.edges))
        return (self.from_dense, (data,))

i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.ByteTensor([3,      4,      5    ])
tensor = torch.sparse.ByteTensor(i.t(), v, torch.Size([2,3]))

data = EdgeData(edges=[tensor])  # edges is a list of sparse tensors
x = ray.put(data)
ray.get(x)
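Hypothetically, once cloudpipe/cloudpickle#223 and apache/arrow#3542 land, the round trip should preserve the tensor's contents; a quick check:

result = ray.get(x)
# Compare dense views, since sparse tensors don't support direct equality.
assert torch.equal(result.edges[0].to_dense(), tensor.to_dense())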

@pcmoritz
Contributor

pcmoritz commented Feb 5, 2019

I created a PR to fix the dataclass pickling issue upstream: cloudpipe/cloudpickle#245

@JessicaSchrouff
Author

Thank you Philipp for all your help!
I'll look into custom serialization while the issues are being fixed and the PRs merged.

@pcmoritz
Contributor

pcmoritz commented Feb 8, 2019

@JessicaSchrouff @kwohlfahrt On the latest master, you can now do the following:

from dataclasses import dataclass

import ray

ray.init()

class CustomClass(object):
    def __init__(self, value):
        self.value = value

@dataclass
class DataClass1:
    custom: CustomClass

    @classmethod
    def from_custom(cls, data):
        custom = CustomClass(data)
        return cls(custom)

    def __reduce__(self):
        # Must reference self.custom; a bare `custom` would raise NameError.
        return (self.from_custom, (self.custom.value,))

data = DataClass1(custom=CustomClass(43))
x = ray.put(data)
assert ray.get(x).custom.value == 43

That should fix both of the cases above if you use your __reduce__ methods from before.

Once cloudpipe/cloudpickle#244 is fixed and apache/arrow#3542 is merged into Ray, it will also work without the __reduce__ methods.

@JessicaSchrouff
Author

Hi @pcmoritz ,

Thank you so much!

I have now installed the latest version and am no longer running into the mappingproxy error. However, I did not manage to use our own __reduce__ methods for the IntEnum or for the sparse tensors.

from dataclasses import dataclass
from enum import IntEnum

import ray

ray.init()

class CustomClass(object):
    def __init__(self, key, fields):
        self.vals = IntEnum(key, fields)

@dataclass
class DataClass1:
    custom: CustomClass

    @classmethod
    def from_custom(cls, key, fields):
        custom = CustomClass(key, fields)
        return cls(custom)

    def __reduce__(self):
        # Reference self.custom (the original had a bare `custom` here).
        key = self.custom.vals.__name__
        fields = [e.name for e in self.custom.vals]
        return (self.from_custom, (key, fields))

data = DataClass1(custom=CustomClass("bla", "foo bar baz"))
x = ray.put(data)
v = ray.get(x)

Gives the error related to IntEnum on the get:

Traceback (most recent call last):
File "", line 1, in
File "/home/jessica/ray/python/ray/worker.py", line 2235, in get
value = worker.get_object([object_ids])[0]
File "/home/jessica/ray/python/ray/worker.py", line 460, in get_object
final_results = self.retrieve_and_deserialize(plain_object_ids, 0)
File "/home/jessica/ray/python/ray/worker.py", line 395, in retrieve_and_deserialize
self.get_serialization_context(self.task_driver_id))
File "/home/jessica/ray/python/ray/utils.py", line 442, in _wrapper
return orig_attr(*args, **kwargs)
File "pyarrow/_plasma.pyx", line 531, in pyarrow._plasma.PlasmaClient.get
File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
File "pyarrow/serialization.pxi", line 171, in pyarrow.lib.SerializationContext._deserialize_callback
File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/enum.py", line 135, in new
enum_members = {k: classdict[k] for k in classdict._member_names}
AttributeError: 'dict' object has no attribute '_member_names'

Similarly, I get the error related to sparse tensors when using the "EdgeData" inside another dataclass.

I managed to get ray working on my data structures by using the custom serialization with use_pickle=True. We put a print statement inside our __reduce__ method, but it never got displayed, which suggests that our __reduce__ methods do not get called.
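For reference, the registration in question is just (a sketch, assuming the same ray.register_custom_serializer API as above):

# Serialize DataClass1 with pickle instead of pyarrow's default path.
ray.register_custom_serializer(DataClass1, use_pickle=True)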

Thank you for the help!
Best,
Jessica

@robertnishihara
Collaborator

This is fixed now.
