
Ray serialization problems #557

Closed · pcmoritz opened this issue May 16, 2017 · 22 comments

@pcmoritz (Contributor) commented May 16, 2017:

We should try to understand better what we can serialize and what we cannot, so I'm making this issue to collect problems with our serialization.

There are five categories of problems:

  1. Things that not even Python pickle can serialize. These are mostly out of scope for us.
  2. Things that Python pickle can serialize but cloudpickle cannot. We'd like to be aware of these and potentially fix them or report them upstream.
  3. Things that cloudpickle can serialize but we cannot. We'd like to be aware of these and fix them.
  4. Things that we serialize with Arrow but deserialize incorrectly. We'd like to fix these.
  5. Things where serialization is slower than expected. We'd like to know about these problems and potentially fix them.

If you run into problems like the above, please post a minimal example here or create a new issue for it.

@pcmoritz (Contributor, Author) commented:

Here is one that fits into category 3:

import re
b = re.compile(r"\d+\.\d*")

@robertnishihara (Collaborator) commented:

We actually can handle this case. If you try ray.get(ray.put(b)), it works. The thing that is failing when you pass b into a remote function is something else. I'm looking into it.

@robertnishihara (Collaborator) commented:

Ok, the issue here is that pickle.dumps(b) succeeds, but pickle.dumps(type(b)) fails, so we can't do the usual register_class approach because that involves pickling the type.

That said, it seems like we shouldn't actually need to do register_class if we're pickling things.
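For reference, a minimal sketch of that asymmetry (the failure on type(b) reflects the Python versions current at the time, where the compiled-pattern type was _sre.SRE_Pattern and couldn't be looked up by module and name):

import pickle
import re

b = re.compile(r"\d+\.\d*")
pickle.dumps(b)        # succeeds: pattern objects pickle by recompiling the source pattern
pickle.dumps(type(b))  # failed at the time: the pattern type wasn't importable by name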

@pcmoritz (Contributor, Author) commented:

More resources: https://github.com/jsonpickle/jsonpickle/tree/master/tests and https://github.com/jsonpickle/jsonpickle/issues

Ideally we would port all the tests to Ray.

@robertnishihara (Collaborator) commented:

Is category (2) a real thing? Cloudpickle is supposed to be more general. It is true that pickle "succeeds" at serializing some things that cloudpickle fails on, e.g.,

import pickle
import cloudpickle

class Foo(object):
  def __init__(self):
    super

cloudpickle.dumps(Foo)  # PicklingError: Could not pickle object as excessively deep recursion required.
pickle.dumps(Foo)  # b'\x80\x03c__main__\nFoo\nq\x00.'

However, this is misleading: pickle only "succeeds" because it serializes the class by reference and doesn't capture the class definition (you couldn't unpickle it in another process).

After #550, I think category (3) should more or less be nonexistent. I'm probably wrong about this, but we'll see.

Category (4) is a big problem. E.g., #319.

@robertnishihara (Collaborator) commented:

I just remembered an important class of types that fall into category (3): subtypes of standard types like lists/dicts/tuples (see #512 for an example).

For example, if we serialize a collections.defaultdict, it is deserialized as a plain dictionary. This presumably happens for a lot of types.
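A minimal sketch of the symptom described above:

import collections
import ray

ray.init()
d = collections.defaultdict(list)
d["key"].append(1)
restored = ray.get(ray.put(d))
print(type(restored))  # reportedly <class 'dict'>, so the default_factory is lost
# restored["missing"] would then raise KeyError instead of creating an empty list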

@wesm commented May 22, 2017:

I opened https://issues.apache.org/jira/browse/ARROW-1059 with the goal of making it easier to expand the kinds of objects that can be serialized using the Arrow IPC machinery. Objects that don't fit the set of "fast storage" types (tensors and the other supported primitive types) could then be stored as pickles, and pyarrow could register a pickle deserializer to understand this user "message" type.

@robertnishihara (Collaborator) commented Jul 19, 2017:

Ray currently fails to serialize xarray objects #748 (this is category (3) above).

@Wanjun0511 commented:
Ray also fails to serialize TensorFlow-related objects, like Tensor.
Do you have a plan to fix this?
I'm working on reinforcement learning algorithms, and sometimes I need to broadcast a TF model or Tensor, and it fails.

@mitar (Member) commented Aug 18, 2017:

This is already possible with a helper function; see the docs.

@robertnishihara (Collaborator) commented:

@ProgrammerLWJ the underlying problem is that TensorFlow objects can't easily be serialized (e.g., with pickle). The preferred solution is to ship only the neural net weights or the gradients (e.g., as a list or dictionary of numpy arrays).
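A minimal sketch of that weight-shipping approach (hypothetical task and parameter names; the weights travel as a plain dict of numpy arrays):

import numpy as np
import ray

ray.init()

@ray.remote
def apply_update(weights, lr=0.01):
    # weights is a plain dict of numpy arrays, not a TensorFlow object,
    # so Ray can serialize it efficiently
    return {name: w - lr * np.ones_like(w) for name, w in weights.items()}

weights = {"w1": np.zeros((4, 4)), "b1": np.zeros(4)}
new_weights = ray.get(apply_update.remote(weights))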

@rueberger (Contributor) commented May 17, 2018:

This is a bit of a doozy, but I would really like to be able to use Ray with the excellent sacred library; they belong together like peanut butter and jelly. (I'm specifically referring to Ray Tune here.)

The sacred.Experiment object is a nightmare to serialize, though. pickle fails to serialize it in its simplest incarnation, and real-world experiments are usually significantly more complicated.

from sacred.experiment import Experiment
test_exp = Experiment('test')

import pickle
pickle.dumps(test_exp) # nope

import cloudpickle
cloudpickle.dumps(test_exp) # nope

import dill
dill.dumps(test_exp) # also nope

Is there any hope of ever serializing Experiment?

If not, there is an early-stage discussion about an OO-based interface to sacred that might provide a path forward.

Update: I have managed to get Tune running without having to serialize the Experiment, and I'm no longer sure that is required.

@JessicaSchrouff commented:
We encountered many difficulties with serialization; see #3917.
In all cases, we wrote our own __reduce__ method to pickle our objects for other purposes. Can we configure Ray to use custom serialization?
Thank you!
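For context, a minimal sketch of the __reduce__ approach we use (hypothetical class):

class Handle:
    def __init__(self, path):
        self.path = path
        self.file = open(path)  # unpicklable state

    def __reduce__(self):
        # Rebuild the object from its constructor arguments instead of its raw state
        return (Handle, (self.path,))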

@robertnishihara (Collaborator) commented:

@JessicaSchrouff yes, definitely. Can you see if ray.register_custom_serializer does what you want? https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer
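A minimal sketch, assuming the register_custom_serializer signature documented at that link (per-class serializer/deserializer callables):

import ray

class MyClass:
    def __init__(self, value):
        self.value = value

ray.init()
ray.register_custom_serializer(
    MyClass,
    serializer=lambda obj: obj.value,
    deserializer=lambda value: MyClass(value))

restored = ray.get(ray.put(MyClass(42)))  # round-trips through the custom hooks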

@alaameloh commented:
@pcmoritz
I'm having some trouble serializing spaCy objects (NLP models): pickle cannot serialize them, but cloudpickle and dill can.
Would you have a recommendation, or should I go with a custom serialization function?

Great work!

@pcmoritz (Contributor, Author) commented Apr 5, 2019:

@alaameloh Can you share an example that you would like to get working?

@alaameloh commented:
custom_model = spacy.load('path_to_model')
for doc in documents:
    result_ids.append(processing_function.remote(doc, custom_model))

Here Ray falls back to using pickle to serialize the spaCy model. However, with the spaCy version I'm working with (2.0.16), pickle doesn't work (but cloudpickle does). It gets stuck afterwards and crashes after some time (due to memory, I believe).

Depending on the model, the loading time for spaCy would be an inconvenience if I simply loaded the model inside processing_function and executed it with every call.
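For illustration, a sketch of that load-inside-the-function pattern (the thing I'd like to avoid, since the model is reloaded on every call):

import ray
import spacy

@ray.remote
def processing_function(text):
    custom_model = spacy.load('path_to_model')  # reloaded on every call: slow
    return [token.text for token in custom_model(text)]  # return plain data, not a Doc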

@robertnishihara (Collaborator) commented:

@alaameloh hm, when we say "falling back to pickle" I think we actually mean "cloudpickle". Can you provide a runnable code snippet that we can use to reproduce the issue?

@alaameloh commented:
@robertnishihara

import spacy
import ray

@ray.remote
def processing(nlp, text):
    processed = nlp(text)
    return processed

ray.init()
texts_list = ['this is just a random text to illustrate the serializing use case for ray',
              'Ray is a flexible, high-performance distributed execution framework.To launch a Ray cluster, \
              either privately, on AWS, or on GCP, follow these instructions.',
              'This document describes what Python objects Ray can and cannot serialize into the object store. \
              Once an object is placed in the object store, it is immutable.']

processed_id = []
nlp = spacy.load('en')
for text in texts_list:
    processed_id.append(processing.remote(nlp, text))

processed_list = ray.get(processed_id)
Output:

WARNING: Falling back to serializing objects of type <class 'pathlib.PosixPath'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.vocab.Vocab'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.tokenizer.Tokenizer'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.pipeline.DependencyParser'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.pipeline.EntityRecognizer'> by using pickle. This may be inefficient.
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0408 14:25:37.543133 10242 node_manager.cc:245] Last heartbeat was sent 961 ms ago 
Traceback (most recent call last):

  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/remote_function.py", line 71, in remote
    return self._remote(args=args, kwargs=kwargs)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/remote_function.py", line 121, in _remote
    resources=resources)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 591, in submit_task
    args_for_local_scheduler.append(put(arg))
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 2233, in put
    worker.put_object(object_id, value)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 359, in put_object
    self.store_and_register(object_id, value)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 293, in store_and_register
    self.task_driver_id))
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/utils.py", line 437, in _wrapper
    return orig_attr(*args, **kwargs)
  File "pyarrow/_plasma.pyx", line 493, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 345, in pyarrow.lib.serialize
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowMemoryError: malloc of size 134217728 failed

@1beb (Contributor) commented May 25, 2019:

I wanted to add a couple of notes here about serialization problems I am experiencing.

  1. Database connections or pooled objects cannot be passed; connections must be opened and closed inside the remote function (see the sketch at the end of this comment).
  2. Unit testing Ray code is very challenging, and I suspect it's due to serialization. Example:
import ray
ray.init()

def manager_fun_list(x_list, fun):
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
    return res

@ray.remote
def manager_fun_one(x, fun):
    return fun(x)

Now let's test this code:

import unittest
from .dummy import manager_fun_list

class TestDummy(unittest.TestCase):
    x_list = [1,2,3,4,5]
    def myfun(self, x):
        return x*2

    def test_managed_myfun(self):
        manager_fun_list(self.x_list, self.myfun)

Running the test results in TypeError: Cannot serialize socket object, likely because of the self/super problem discussed above.

Now, here's something interesting that I can't seem to get past when trying to create unit tests for your own recommended solution (https://ray.readthedocs.io/en/latest/serialization.html#last-resort-workaround):

import ray
import pickle

ray.init()

def manager_fun_list(x_list, fun):
    fun = pickle.dumps(fun)
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
    return res

@ray.remote
def manager_fun_one(x, fun):
    fun = pickle.loads(fun)
    return fun(x)

And the test:

import unittest
from .dummy import manager_fun_list

def myfun(x):
    return x*2

class TestDummy(unittest.TestCase):
    x_list = [1,2,3,4,5]

    def test_managed_myfun(self):
        res = manager_fun_list(self.x_list, myfun)
        self.assertEqual(res, [2,4,6,8,10])

And the result is... huh?

ERROR: test_managed_myfun (tests.test_manager_fun_list.TestDummy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspaces/.../.../tests/test_manager_fun_list.py", line 12, in test_managed_myfun
    res = manager_fun_list(self.x_list, myfun)
  File "/workspaces/.../.../dummy.py", line 7, in manager_fun_list
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
  File "/usr/local/lib/python3.7/site-packages/ray/worker.py", line 2197, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=11502, host=662e97277f24)
  File "/workspaces/.../.../dummy.py", line 12, in manager_fun_one
    fun = pickle.loads(fun)
ModuleNotFoundError: No module named 'tests'
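For reference, a minimal sketch of the open/close-inside-the-remote-function pattern from point 1, using stdlib sqlite3 purely for illustration:

import sqlite3
import ray

@ray.remote
def query(db_path, sql):
    conn = sqlite3.connect(db_path)  # opened inside the task, never serialized
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()  # closed before the task returns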

@robertnishihara (Collaborator) commented:

These are all fixed, I think.

@durgeksh commented:
I am sending an open file as an argument to a function and trying to run that function as a remote task using Ray, but I am getting the errors below. The call looks like this:

id_1 = my_func.remote(checkFile)

Error:

  File "/venv/lib/python3.9/site-packages/ray/remote_function.py", line 129, in _remote_proxy
    return self._remote(args=args, kwargs=kwargs, **self._default_options)
  File "/venv/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 307, in _invocation_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/venv/lib/python3.9/site-packages/ray/remote_function.py", line 412, in _remote
    return invocation(args, kwargs)
  File "/venv/lib/python3.9/site-packages/ray/remote_function.py", line 387, in invocation
    object_refs = worker.core_worker.submit_task(
  File "python/ray/_raylet.pyx", line 1896, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 1900, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 407, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 398, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 441, in ray._raylet.prepare_args_internal
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 450, in serialize
    return self._serialize_to_msgpack(value)
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 428, in _serialize_to_msgpack
    pickle5_serialized_object = self._serialize_to_pickle5(
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 390, in _serialize_to_pickle5
    raise e
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 385, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/venv/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/venv/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 627, in dump
    return Pickler.dump(self, obj)
  File "/venv/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 327, in _file_reduce
    raise pickle.PicklingError(
_pickle.PicklingError: Cannot pickle files that are not opened for reading: w
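A minimal sketch of the usual workaround (hypothetical names): pass the path rather than the open file handle, and open the file inside the task.

import ray

@ray.remote
def my_func(check_path):
    with open(check_path) as check_file:  # opened for reading inside the worker
        return check_file.read()

ray.init()
id_1 = my_func.remote('check.txt')  # 'check.txt' is a hypothetical path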
