
Ray serialization problems #557

Closed · pcmoritz opened this issue May 16, 2017 · 22 comments

@pcmoritz (Contributor) commented May 16, 2017:

We should try to understand better what we can serialize and what we cannot, so I'm making this issue to collect problems with our serialization.

There are five categories of problems:

  1. Things that not even Python pickle can serialize. These are mostly out of scope for us.
  2. Things that Python pickle can serialize but cloudpickle cannot. We'd like to be aware of these and potentially fix them or report them upstream.
  3. Things that cloudpickle can serialize but we cannot. We'd like to be aware of these and fix them.
  4. Things that we serialize with Arrow but deserialize incorrectly. We'd like to fix these.
  5. Things where serialization is slower than expected. We'd like to know about these problems and potentially fix them.

If you run into problems like the above, please post a minimal example here or create a new issue for it.

@pcmoritz (Contributor, Author) commented:

Here is one that fits into category 3:

import re
b = re.compile(r"\d+\.\d*")

@robertnishihara (Collaborator) commented:

We actually can handle this case. If you try ray.get(ray.put(b)), it works. The thing that is failing when you pass b into a remote function is something else. I'm looking into it.

@robertnishihara (Collaborator) commented:

Ok, the issue here is that pickle.dumps(b) succeeds, but pickle.dumps(type(b)) fails, so we can't do the usual register_class approach because that involves pickling the type.

That said, it seems like we shouldn't actually need to do register_class if we're pickling things.
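For reference, a minimal sketch of that asymmetry (the failure on type(b) reflects the Python versions current at the time, where the compiled-pattern type was _sre.SRE_Pattern and couldn't be looked up by module and name):

import pickle
import re

b = re.compile(r"\d+\.\d*")
pickle.dumps(b)        # succeeds: pattern objects pickle by recompiling the source pattern
pickle.dumps(type(b))  # failed at the time: the pattern type wasn't importable by name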

@pcmoritz (Contributor, Author) commented:

More resources: https://github.com/jsonpickle/jsonpickle/tree/master/tests and https://github.com/jsonpickle/jsonpickle/issues

Ideally we would port all the tests to Ray.

@robertnishihara (Collaborator) commented:

Is category (2) a real thing? Cloudpickle is supposed to be more general. It is true that pickle "succeeds" at serializing some things that cloudpickle fails on, e.g.,

import pickle
import cloudpickle

class Foo(object):
  def __init__(self):
    super

cloudpickle.dumps(Foo)  # PicklingError: Could not pickle object as excessively deep recursion required.
pickle.dumps(Foo)  # b'\x80\x03c__main__\nFoo\nq\x00.'

However, this is misleading: pickle only "succeeds" because it serializes the class by reference and doesn't capture the class definition (you couldn't unpickle it in another process).

After #550, I think category (3) should more or less be nonexistent. I'm probably wrong about this, but we'll see.

Category (4) is a big problem. E.g., #319.

@robertnishihara (Collaborator) commented:

I just remembered an important class of types that fall into category (3): subtypes of standard types like lists/dicts/tuples (see #512 for an example).

For example, if we serialize a collections.defaultdict, it is deserialized as a plain dictionary. This presumably happens for a lot of types.
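A minimal sketch of the symptom described above:

import collections
import ray

ray.init()
d = collections.defaultdict(list)
d["key"].append(1)
restored = ray.get(ray.put(d))
print(type(restored))  # reportedly <class 'dict'>, so the default_factory is lost
# restored["missing"] would then raise KeyError instead of creating an empty list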

@wesm commented May 22, 2017:

I opened https://issues.apache.org/jira/browse/ARROW-1059 with the goal of making it easier to expand the kinds of objects that can be serialized using the Arrow IPC machinery. Objects that don't fit the set of "fast storage" types (tensors and the other supported primitive types) could then be stored as pickles, and pyarrow could register a pickle deserializer to understand this user "message" type.

@robertnishihara (Collaborator) commented Jul 19, 2017:

Ray currently fails to serialize xarray objects #748 (this is category (3) above).

@Wanjun0511 commented:
Ray also fails to serialize TensorFlow-related objects, like Tensor.
Do you have a plan to fix this?
I'm working on reinforcement learning algorithms, and sometimes I need to broadcast a TF model or Tensor, and it fails.

@mitar (Member) commented Aug 18, 2017:

This is already possible with a helper function; see the docs.

@robertnishihara (Collaborator) commented:

@ProgrammerLWJ the underlying problem is that TensorFlow objects can't easily be serialized (e.g., with pickle). The preferred solution is to ship only the neural net weights or the gradients (e.g., as a list or dictionary of numpy arrays).
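A minimal sketch of that weight-shipping approach (hypothetical task and parameter names; the weights travel as a plain dict of numpy arrays):

import numpy as np
import ray

ray.init()

@ray.remote
def apply_update(weights, lr=0.01):
    # weights is a plain dict of numpy arrays, not a TensorFlow object,
    # so Ray can serialize it efficiently
    return {name: w - lr * np.ones_like(w) for name, w in weights.items()}

weights = {"w1": np.zeros((4, 4)), "b1": np.zeros(4)}
new_weights = ray.get(apply_update.remote(weights))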

@rueberger (Contributor) commented May 17, 2018:

This is a bit of a doozy, but I would really like to be able to use Ray with the excellent sacred library; they belong together like peanut butter and jelly. (I'm specifically referring to Ray Tune here.)

The sacred.Experiment object is a nightmare to serialize, though. pickle fails to serialize it in its simplest incarnation, and real-world experiments are usually significantly more complicated.

from sacred.experiment import Experiment
test_exp = Experiment('test')

import pickle
pickle.dumps(test_exp) # nope

import cloudpickle
cloudpickle.dumps(test_exp) # nope

import dill
dill.dumps(test_exp) # also nope

Is there any hope of ever serializing Experiment?

If not, there is an early-stage discussion about an OO-based interface to sacred that might provide a path forward.

Update: I have managed to get Tune running without having to serialize the Experiment, and I'm no longer sure that is required.

@JessicaSchrouff commented:
We encountered many difficulties with serialization; see #3917.
In all cases, we wrote our own __reduce__ method to pickle our objects for other purposes. Can we configure Ray to use custom serialization?
Thank you!
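For context, a minimal sketch of the __reduce__ approach we use (hypothetical class):

class Handle:
    def __init__(self, path):
        self.path = path
        self.file = open(path)  # unpicklable state

    def __reduce__(self):
        # Rebuild the object from its constructor arguments instead of its raw state
        return (Handle, (self.path,))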

@robertnishihara (Collaborator) commented:

@JessicaSchrouff yes, definitely. Can you see if ray.register_custom_serializer does what you want? https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer
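A minimal sketch, assuming the register_custom_serializer signature documented at that link (per-class serializer/deserializer callables):

import ray

class MyClass:
    def __init__(self, value):
        self.value = value

ray.init()
ray.register_custom_serializer(
    MyClass,
    serializer=lambda obj: obj.value,
    deserializer=lambda value: MyClass(value))

restored = ray.get(ray.put(MyClass(42)))  # round-trips through the custom hooks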

@alaameloh commented:
@pcmoritz
I'm having some trouble serializing spaCy objects (NLP models): pickle cannot serialize them, but cloudpickle and dill can.
Would you have a recommendation, or should I go with a custom serialization function?

Great work!

@pcmoritz (Contributor, Author) commented Apr 5, 2019:

@alaameloh Can you share an example that you would like to get working?

@alaameloh commented:
custom_model = spacy.load('path_to_model')
for doc in documents:
    result_ids.append(processing_function.remote(doc, custom_model))

Here Ray falls back to using pickle to serialize the spaCy model. However, with the spaCy version I'm working with (2.0.16), pickle doesn't work (but cloudpickle does). It gets stuck afterwards and crashes after some time (due to memory, I believe).

Depending on the model, the loading time for spaCy would be an inconvenience if I simply loaded the model inside processing_function and executed it with every call.
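For illustration, a sketch of that load-inside-the-function pattern (the thing I'd like to avoid, since the model is reloaded on every call):

import ray
import spacy

@ray.remote
def processing_function(text):
    custom_model = spacy.load('path_to_model')  # reloaded on every call: slow
    return [token.text for token in custom_model(text)]  # return plain data, not a Doc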

@robertnishihara (Collaborator) commented:

@alaameloh hm, when we say "falling back to pickle" I think we actually mean "cloudpickle". Can you provide a runnable code snippet that we can use to reproduce the issue?

@alaameloh commented:
@robertnishihara

import spacy
import ray

@ray.remote
def processing(nlp, text):
    processed = nlp(text)
    return processed

ray.init()
texts_list = ['this is just a random text to illustrate the serializing use case for ray',
              'Ray is a flexible, high-performance distributed execution framework.To launch a Ray cluster, \
              either privately, on AWS, or on GCP, follow these instructions.',
              'This document describes what Python objects Ray can and cannot serialize into the object store. \
              Once an object is placed in the object store, it is immutable.']

processed_id = []
nlp = spacy.load('en')
for text in texts_list:
    processed_id.append(processing.remote(nlp, text))

processed_list = ray.get(processed_id)
Output:

WARNING: Falling back to serializing objects of type <class 'pathlib.PosixPath'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.vocab.Vocab'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.tokenizer.Tokenizer'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.pipeline.DependencyParser'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.pipeline.EntityRecognizer'> by using pickle. This may be inefficient.
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0408 14:25:37.543133 10242 node_manager.cc:245] Last heartbeat was sent 961 ms ago 
Traceback (most recent call last):

  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/remote_function.py", line 71, in remote
    return self._remote(args=args, kwargs=kwargs)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/remote_function.py", line 121, in _remote
    resources=resources)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 591, in submit_task
    args_for_local_scheduler.append(put(arg))
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 2233, in put
    worker.put_object(object_id, value)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 359, in put_object
    self.store_and_register(object_id, value)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 293, in store_and_register
    self.task_driver_id))
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/utils.py", line 437, in _wrapper
    return orig_attr(*args, **kwargs)
  File "pyarrow/_plasma.pyx", line 493, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 345, in pyarrow.lib.serialize
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowMemoryError: malloc of size 134217728 failed

@1beb (Contributor) commented May 25, 2019:

I wanted to add a couple of notes here about serialization problems I am experiencing.

  1. Database connections or pooled objects cannot be passed; connections must be opened and closed inside the remote function (see the sketch at the end of this comment).
  2. Unit testing Ray code is very challenging, and I suspect it's due to serialization. Example:
import ray
ray.init()

def manager_fun_list(x_list, fun):
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
    return res

@ray.remote
def manager_fun_one(x, fun):
    return fun(x)

Now let's test this code:

import unittest
from .dummy import manager_fun_list

class TestDummy(unittest.TestCase):
    x_list = [1,2,3,4,5]
    def myfun(self, x):
        return x*2

    def test_managed_myfun(self):
        manager_fun_list(self.x_list, self.myfun)

Running the test results in TypeError: Cannot serialize socket object, likely because of the self/super problem discussed above.

Now, here's something interesting that I can't seem to get past when trying to create unit tests for your own recommended solution (https://ray.readthedocs.io/en/latest/serialization.html#last-resort-workaround):

import ray
import pickle

ray.init()

def manager_fun_list(x_list, fun):
    fun = pickle.dumps(fun)
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
    return res

@ray.remote
def manager_fun_one(x, fun):
    fun = pickle.loads(fun)
    return fun(x)

And the test:

import unittest
from .dummy import manager_fun_list

def myfun(x):
    return x*2

class TestDummy(unittest.TestCase):
    x_list = [1,2,3,4,5]

    def test_managed_myfun(self):
        res = manager_fun_list(self.x_list, myfun)
        self.assertEqual(res, [2,4,6,8,10])

And the result is... huh?

ERROR: test_managed_myfun (tests.test_manager_fun_list.TestDummy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspaces/.../.../tests/test_manager_fun_list.py", line 12, in test_managed_myfun
    res = manager_fun_list(self.x_list, myfun)
  File "/workspaces/.../.../dummy.py", line 7, in manager_fun_list
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
  File "/usr/local/lib/python3.7/site-packages/ray/worker.py", line 2197, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=11502, host=662e97277f24)
  File "/workspaces/.../.../dummy.py", line 12, in manager_fun_one
    fun = pickle.loads(fun)
ModuleNotFoundError: No module named 'tests'
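For reference, a minimal sketch of the open/close-inside-the-remote-function pattern from point 1, using stdlib sqlite3 purely for illustration:

import sqlite3
import ray

@ray.remote
def query(db_path, sql):
    conn = sqlite3.connect(db_path)  # opened inside the task, never serialized
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()  # closed before the task returns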

@robertnishihara (Collaborator) commented:

These are all fixed, I think.

@durgeksh commented:
I am sending an open file as an argument to a function and trying to run that function as a remote task using Ray, but I am getting the errors below. The call looks like this:

id_1 = my_func.remote(checkFile)

Error:

  File "/venv/lib/python3.9/site-packages/ray/remote_function.py", line 129, in _remote_proxy
    return self._remote(args=args, kwargs=kwargs, **self._default_options)
  File "/venv/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 307, in _invocation_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/venv/lib/python3.9/site-packages/ray/remote_function.py", line 412, in _remote
    return invocation(args, kwargs)
  File "/venv/lib/python3.9/site-packages/ray/remote_function.py", line 387, in invocation
    object_refs = worker.core_worker.submit_task(
  File "python/ray/_raylet.pyx", line 1896, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 1900, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 407, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 398, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 441, in ray._raylet.prepare_args_internal
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 450, in serialize
    return self._serialize_to_msgpack(value)
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 428, in _serialize_to_msgpack
    pickle5_serialized_object = self._serialize_to_pickle5(
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 390, in _serialize_to_pickle5
    raise e
  File "/venv/lib/python3.9/site-packages/ray/_private/serialization.py", line 385, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/venv/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/venv/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 627, in dump
    return Pickler.dump(self, obj)
  File "/venv/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 327, in _file_reduce
    raise pickle.PicklingError(
_pickle.PicklingError: Cannot pickle files that are not opened for reading: w
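A minimal sketch of the usual workaround (hypothetical names): pass the path rather than the open file handle, and open the file inside the task.

import ray

@ray.remote
def my_func(check_path):
    with open(check_path) as check_file:  # opened for reading inside the worker
        return check_file.read()

ray.init()
id_1 = my_func.remote('check.txt')  # 'check.txt' is a hypothetical path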
