Serialization issues with dataclasses and IntEnum #3917
Hey Jessica, I can reproduce the second problem like this:

```python
import ray
from enum import IntEnum

class Color(IntEnum):
    RED = 1

ray.init()
ray.get(ray.put(Color))
```

Do you have a minimal example to reproduce the first problem? Just creating

```python
@dataclass
class Foo:
    pass

f = Foo()
```

and calling ray.put/ray.get on it is working for me. Does it work for you?

-- Philipp.
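As background (my addition, not from the thread): the standard pickle module serializes an enum class by reference (module plus qualified name) rather than by value, which is one reason manual pickling can succeed where cloudpickle-based paths that serialize the class body itself run into trouble. A minimal stdlib-only sketch:

```python
import pickle
from enum import IntEnum

class Color(IntEnum):
    RED = 1

# Plain pickle stores only a reference to the class, so the round-trip
# returns the very same class object, not a reconstructed copy.
blob = pickle.dumps(Color)
assert pickle.loads(blob) is Color

# Members round-trip too, again by reference through the class.
assert pickle.loads(pickle.dumps(Color.RED)) is Color.RED
```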
Hi Philipp,

If you put something in the dataclass, then it fails. In our setting, we have nested dataclasses (inheritance, but also fields of one class being another dataclass), which might make things more tricky.

```python
@dataclass
class Foo:
    pass

@dataclass
class Bar(Foo):
    y: int

@dataclass
class Baz:
    z: Foo
```

This will work for Foo() with ray.get(ray.put(...)), but not for Bar or Baz. Thank you for the help!
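As a point of comparison (my addition, not from the thread), the same nested dataclasses round-trip fine through the standard pickle module, since it serializes the classes by reference; the failure here is specific to serializers that pickle the class itself by value:

```python
import pickle
from dataclasses import dataclass

@dataclass
class Foo:
    pass

@dataclass
class Bar(Foo):
    y: int

@dataclass
class Baz:
    z: Foo

# stdlib pickle only records instance fields plus a class reference, so it
# never touches the class __dict__ (a mappingproxy) and does not fail here.
bar = pickle.loads(pickle.dumps(Bar(y=3)))
baz = pickle.loads(pickle.dumps(Baz(z=Bar(y=5))))
assert bar.y == 3 and baz.z.y == 5
```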
Hi Philipp, Can we configure ray to use custom serialization such that it would use the reduce method of the object we are trying to pickle? Thank you!
Thanks for reporting, I can reproduce the torch problem like so:

```python
import ray
import torch

ray.init()
i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.FloatTensor([3, 4, 5])
tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2, 3]))
x = ray.put(tensor)
ray.get(x)
```

In principle, you can register custom serializers with https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer, and this works for custom-defined Python classes; you can also specify that you want to use pickle as a fallback. I couldn't get it to work for sparse torch tensors yet; I think it is clashing with our custom dense torch serializers (similar story for the IntEnum type: it seems to be treated differently from other types, although IntEnum values seem to work). I will look into the problems and keep you updated. Did the pickle workaround work for you? If you could share your reduce method with us, that would be very helpful so we can integrate it into ray and register it as a custom serializer; it will be more efficient than pickle going forward.
(Note that ray already tries to fall back to pickle if it cannot handle the type, but in your case this may or may not work, since torch uses the same type for dense and sparse tensors and our custom handler only works for dense tensors so far.)
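The reduce-based approach asked about above can be sketched independently of ray or torch with a toy stand-in for a sparse tensor (the class and field names here are illustrative, not from any library):

```python
import pickle

class SparseLike:
    """Toy stand-in for a sparse tensor: an (indices, values, shape) triplet."""
    def __init__(self, indices, values, shape):
        self.indices, self.values, self.shape = indices, values, shape

    def __reduce__(self):
        # Decompose into plain picklable pieces and name a reconstructor;
        # on load, pickle calls SparseLike(indices, values, shape).
        return (SparseLike, (self.indices, self.values, self.shape))

t = SparseLike(indices=[(0, 2), (1, 0), (1, 2)], values=[3, 4, 5], shape=(2, 3))
t2 = pickle.loads(pickle.dumps(t))
assert t2.values == [3, 4, 5] and t2.shape == (2, 3)
```

The same decompose-and-rebuild shape is what a custom serializer for a real sparse tensor would use, with the indices and values tensors as the picklable pieces.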
@JessicaSchrouff Concerning the sparse pytorch tensor serialization, I have an upstream fix in apache/arrow#3542 (issue: https://issues.apache.org/jira/projects/ARROW/issues/ARROW-4452). Do you want to review it and see if it would fix the problem for you (i.e. compare it with your pickling code)?
The IntEnum problem is an upstream problem in cloudpickle; I filed an issue here: cloudpipe/cloudpickle#244
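For reference, an IntEnum class can be decomposed into its name and member names and rebuilt with the enum functional API, which is the same trick the reduce-method workarounds in this thread rely on:

```python
from enum import IntEnum

# Functional API: member values start at 1 by default.
Color = IntEnum("Color", ["RED", "GREEN", "BLUE"])

# Decompose to plain picklable data, then rebuild an equivalent class.
name, members = Color.__name__, [e.name for e in Color]
Rebuilt = IntEnum(name, members)

# IntEnum members compare equal by integer value across the two classes.
assert Rebuilt.GREEN == Color.GREEN == 2
```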
The dataclass issue is actually a known problem with cloudpickle, see cloudpipe/cloudpickle#223. It might make sense to support these natively with the pyarrow serialization if they contain only data and no complex Python objects. What's your use case for these, i.e. what do your dataclasses look like? If you don't feel comfortable sharing them, you can anonymize the field names.
Hi Philipp, I'm working with Jessica on the same project. Here are the relevant bits of the dataclasses we are using. Note they contain …

However, as dataclasses can't be serialized, we then tried to work with flat tuples instead (pytorch's JIT has similar limitations), so we lost our custom …
@JessicaSchrouff @kwohlfahrt does registering a custom serializer like @pcmoritz mentioned work for you? See the documentation at https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer.
Thanks for sharing the example! This is indeed currently blocked on cloudpipe/cloudpickle#223. While ray.register_custom_serializer would work in principle, it needs the type (not the value) of the object to be serializable with cloudpickle (so it can be shipped to other workers). @JessicaSchrouff @kwohlfahrt It wouldn't hurt if you commented on the linked cloudpickle issue to make them aware that this is a problem people are running into. After that's resolved, serializing EdgeData will work out of the box once my pyarrow fix is merged; same for EdgeMeta and the issue cloudpipe/cloudpickle#244. For future reference, here is a complete example:

```python
from dataclasses import dataclass
from itertools import starmap
from typing import List

import torch
import ray

ray.init()

def decompose_coo_tensor(tensor):
    return tensor._indices(), tensor._values(), tensor.shape

@dataclass
class EdgeData:
    edges: List[torch.sparse.ByteTensor]

    @classmethod
    def from_dense(cls, data):
        edges = list(starmap(torch.sparse_coo_tensor, data))
        return cls(edges)

    def __reduce__(self):
        data = list(map(decompose_coo_tensor, self.edges))
        return (self.from_dense, (data,))

i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.ByteTensor([3, 4, 5])
tensor = torch.sparse.ByteTensor(i.t(), v, torch.Size([2, 3]))
data = EdgeData(edges=[tensor])  # edges is a list of sparse tensors
x = ray.put(data)
ray.get(x)
```
I created a PR to fix the dataclass pickling issue upstream: cloudpipe/cloudpickle#245

Thank you Philipp for all your help!
@JessicaSchrouff @kwohlfahrt On the latest master, you can now do the following:

```python
from dataclasses import dataclass

import ray

ray.init()

class CustomClass(object):
    def __init__(self, value):
        self.value = value

@dataclass
class DataClass1:
    custom: CustomClass

    @classmethod
    def from_custom(cls, data):
        custom = CustomClass(data)
        return cls(custom)

    def __reduce__(self):
        return (self.from_custom, (self.custom.value,))

data = DataClass1(custom=CustomClass(43))
x = ray.put(data)
assert ray.get(x).custom.value == 43
```

That should fix both of the cases from above if you use your reduce methods from above. Once cloudpipe/cloudpickle#244 is fixed and apache/arrow#3542 is merged into Ray, it will also work without the reduce methods.
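The classmethod-plus-reduce pattern in that snippet can be exercised with plain stdlib pickle (no ray needed) to check the reconstruction logic in isolation; this mirrors the example above rather than adding anything new:

```python
import pickle
from dataclasses import dataclass

class CustomClass(object):
    def __init__(self, value):
        self.value = value

@dataclass
class DataClass1:
    custom: CustomClass

    @classmethod
    def from_custom(cls, data):
        return cls(CustomClass(data))

    def __reduce__(self):
        # Decompose down to the plain int and rebuild via the classmethod.
        # Note: self.custom.value, not custom.value.
        return (self.from_custom, (self.custom.value,))

restored = pickle.loads(pickle.dumps(DataClass1(custom=CustomClass(43))))
assert restored.custom.value == 43
```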
Hi @pcmoritz, thank you so much! I have now installed the latest version and am no longer running into the MappingProxy error. However, I did not manage to use our own reduce methods for the IntEnum or for the sparse tensors.

```python
from dataclasses import dataclass
from enum import IntEnum

import ray

ray.init()

class CustomClass(object):
    def __init__(self, key, fields):
        self.vals = IntEnum(key, fields)

@dataclass
class DataClass1:
    custom: CustomClass

    @classmethod
    def from_custom(cls, key, fields):
        custom = CustomClass(key, fields)
        return cls(custom)

    def __reduce__(self):
        key = self.custom.vals.__name__
        fields = [e.name for e in self.custom.vals]
        return (self.from_custom, (key, fields))

data = DataClass1(custom=CustomClass("bla", "foo bar baz"))
x = ray.put(data)
v = ray.get(x)
```

This gives the error related to IntEnum on the get. Similarly, I get the error related to sparse tensors when using EdgeData inside another dataclass. I managed to get ray working on my data structures by using custom serialization with use_pickle=True. We put a print statement inside our reduce method, but it never got displayed, which suggests that our reduce methods do not get called. Thank you for the help!
This is fixed now. |
System information
Problems
We encountered multiple issues with serialization. The first involved a complex object that referred to dataclasses (with the @dataclass decorator). This object could be pickled and unpickled when performing the operation manually, but a "MappingProxy" error appeared when trying it with ray. We had to 'flatten' our object to a dictionary of tuples, which required quite some work and leads to strong divergences between repo branches.

Even just defining:

```python
@dataclass
class Foo:
    pass
```

and calling Foo() in the _setup of the trainable class failed.

The second issue is related to the IntEnum type from the enum package. Any IntEnum instance passed to the trainable class would lead to a failure. We tried to write a reduce method, but it didn't help. We were able to pickle our IntEnum instances but were not able to use them with ray. We have temporarily changed them to dictionaries, but again this is not sustainable in our code base.
Source code / logs
Issue 1: MappingProxy error

```
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
TypeError: can't pickle mappingproxy objects
```
Issue 2: IntEnum (note: ray.get(ray.put()) did not work)

```
Traceback (most recent call last):
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/worker.py", line 2211, in get
    raise value
ray.worker.RayTaskError: ray_MainArgs:train() (pid=21123, host=jessica-Z370P-D3)
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/utils.py", line 437, in _wrapper
    return orig_attr(*args, **kwargs)
  File "pyarrow/_plasma.pyx", line 531, in pyarrow._plasma.PlasmaClient.get
  File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
  File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
  File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
  File "pyarrow/serialization.pxi", line 171, in pyarrow.lib.SerializationContext._deserialize_callback
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/enum.py", line 135, in __new__
    enum_members = {k: classdict[k] for k in classdict._member_names}
AttributeError: 'dict' object has no attribute '_member_names'
```
Thank you!