Increase flexibility around Python versions and compression #4011

Closed · mrocklin opened this issue on Aug 3, 2020 · 9 comments

@mrocklin (Member) commented Aug 3, 2020

Currently Dask fails to communicate when we try to cross Python versions or have mismatched compression libraries. The Python version mismatch is new-ish, dating from when we added pickle protocol 5; the compression mismatch is a long-standing issue. In principle we ask that users provide consistent versions across all machines; however, we may still be able to accommodate mismatches with a little bit of work on our end.

When establishing a connection between two endpoints, we could have each side publish its capabilities: which compression libraries it has available, whether it supports pickle protocol 5, and whether it has a GPU. This would make communication in heterogeneous situations a bit easier.

We already have some work here. There is a context= keyword that is passed down from Comm.write to to_frames, which in turn handles serialization. We could start populating that with more information and using it in serialization/compression (a rough sketch follows the list below).

This would allow the following situations:

  1. More relaxation around Python versions
  2. More relaxation around compression libraries
  3. Downgrading GPU results smoothly to CPU clients
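
To make that concrete, here is a rough sketch of what each side might advertise when a connection is established. Everything below (the helper name, the dict layout) is hypothetical illustration, not an existing API:

```python
import pickle

def local_capabilities():
    # Hypothetical helper: advertise what this process can handle.
    caps = {
        "pickle-protocol": pickle.HIGHEST_PROTOCOL,  # >= 5 means pickle5 works
        "compression": [],
        "GPU": False,
    }
    # Record which optional compression libraries are importable
    for lib in ("lz4", "snappy", "zstandard"):
        try:
            __import__(lib)
            caps["compression"].append(lib)
        except ImportError:
            pass
    # Record whether a usable GPU is present
    try:
        import numba.cuda
        caps["GPU"] = numba.cuda.is_available()
    except ImportError:
        pass
    return caps
```

Each side could send this once on connect, cache the peer's answer, and thread it into serialization via the existing context= keyword, e.g. context={"recipient": remote_caps}.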

cc @quasiben @jakirkham

@jakirkham (Member) commented

Messaging capabilities makes sense. Though I would think we would want to cache these somewhere (just to avoid sending this info repeatedly).

As all GPU things implement "cuda" and "dask" serialization, it should be a simple matter to serialize them with either as appropriate. This is how things like TCP and spilling work today.

Though currently compression is not that configurable (or at least, if it is, it's not clear to me how). How would you envision compression being handled here?

As for pickle protocol 5, I wonder if we shouldn't just give users a hook to enable or disable this (#4012). Seems pretty lightweight.
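
For example, such a hook could be as small as a config knob that caps the protocol. The key below is hypothetical, just to show the shape; #4012 has the actual proposal:

```python
import dask

# Hypothetical config key: cap pickling at protocol 4 so that a sender
# with pickle5 support can still talk to a receiver without it.
dask.config.set({"distributed.comm.pickle-protocol": 4})
```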

@jakirkham (Member) commented

That said, one thing I'm curious about is in what cases you envision sending GPU objects through a comm to a worker that lacks a GPU. In particular, what would that worker do with that message? Or am I misunderstanding something here?

@mrocklin (Member, Author) commented Aug 4, 2020

> Messaging capabilities makes sense. Though I would think we would want to cache these somewhere (just to avoid sending this info repeatedly).

Agreed, I think that we would send this information when first connecting.

> That said, one thing I'm curious about is in what cases you envision sending GPU objects through a comm to a worker that lacks a GPU. In particular, what would that worker do with that message? Or am I misunderstanding something here?

My main motivation is graceful degradation in performance when Python versions or LZ4 availability are mismatched. The GPU stuff is secondary for me, but maybe also interesting. One situation where this comes up is executing computations on a remote GPU cluster from a non-GPU machine (like a laptop). In that situation I may want x.compute() to give me back a NumPy array, even though the result is actually, say, a CuPy array. This would require participation by the serialization functions though (they can accept a context= keyword today, I think).

> Though currently compression is not that configurable (or at least, if it is, it's not clear to me how). How would you envision compression being handled here?

Before calling maybe_compress one would check the recipient's capabilities to make sure it had the compression library you were going to use. If it didn't, then you might go down a list until you found something that both sides support.
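
A sketch of that fallback, assuming the recipient's advertised capabilities arrive via context (the dict layout and helper name are hypothetical):

```python
from distributed.protocol.compression import compressions  # installed compressors

def pick_compression(context, preferred=("lz4", "snappy", "zstandard")):
    # Hypothetical negotiation: intersect what we have installed with what
    # the recipient advertised, then walk our preference list.
    offered = set(context.get("recipient", {}).get("compression", ()))
    for name in preferred:
        if name in compressions and name in offered:
            return name
    return None  # nothing in common: send frames uncompressed

# The chosen name would then be handed to maybe_compress in place of the
# locally-configured default.
```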

@jakirkham (Member) commented

Thanks for fleshing out those ideas a bit. Generally that makes sense.

When it comes to the CPU/GPU bit, I don't think we have enough of the scaffolding to approach that today. Not to say it shouldn't be done, just that it may require a bit more thought and work first. For example, "dask" serialization as implemented today is just a delta away from "cuda" serialization: it moves all frames to host. This ends up being handy for TCP (when UCX is not an option), spilling, etc., so it's not really designed to allow CPU deserialization of an object. Though we could imagine a new mechanism where we ask workers to move an object to host prior to serialization, which might allow this kind of interaction. I can certainly appreciate how this might be useful :)
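
Roughly, that delta looks like this. A simplified sketch (the real registration lives in distributed/protocol/cupy.py and is a bit more careful about buffer handling):

```python
import cupy
from distributed.protocol.cupy import cuda_serialize_cupy_ndarray
from distributed.protocol.serialize import dask_serialize

@dask_serialize.register(cupy.ndarray)
def dask_serialize_cupy_ndarray(x):
    # Reuse the "cuda" serializer, then copy each device frame into host
    # memory so the payload can travel over TCP or be spilled to disk.
    header, frames = cuda_serialize_cupy_ndarray(x)
    return header, [cupy.asnumpy(f) for f in frames]
```

Note the header still describes a device array, so deserialization lands the data back on a GPU; nothing here hands a NumPy array to a GPU-less client.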

@mrocklin (Member, Author) commented Aug 5, 2020

For GPU-CPU serialization I'm thinking of the following:

```diff
--- a/distributed/protocol/cupy.py
+++ b/distributed/protocol/cupy.py
@@ -15,7 +15,10 @@ except ImportError:
 
 
 @cuda_serialize.register(cupy.ndarray)
-def cuda_serialize_cupy_ndarray(x):
+def cuda_serialize_cupy_ndarray(x, context=None):
+    if context and not context["recipient"].get("GPU"):
+        return serialize(x.get())
+
     # Making sure `x` is behaving
     if not (x.flags["C_CONTIGUOUS"] or x.flags["F_CONTIGUOUS"]):
         x = cupy.array(x, copy=True)
```

If for some reason we've chosen to move a CuPy array to a machine without a GPU, let's convert it to a NumPy array first. I'm not saying that we should do this, merely that it's easy to do if we want to.

@jakirkham (Member) commented

Yeah, I'm not sure why that needs to live in the CuPy serializer, though. It would be much cleaner as its own higher-level API through a Dispatch object.

Independently, we probably want to discuss how this would be used, to understand how to design it. Certainly one case is fetching the result. What other use cases are there?

@mrocklin (Member, Author) commented Aug 5, 2020

> Yeah, I'm not sure why that needs to live in the CuPy serializer, though.

It seems like the simplest place to put it to me.

> Independently, we probably want to discuss how this would be used, to understand how to design it. Certainly one case is fetching the result. What other use cases are there?

That's the main one that comes to my mind and that I deal with.

@jakirkham (Member) commented

Well, the other thing would be the meta, but I don't know if the CPU implementations of these things are a good enough proxy for the GPU ones. They might be, but there may be subtle differences. IDK if we're getting off topic, happy to drop it if so.

@jakirkham (Member) commented

Is this solved by PR #4019, or is there more still to do here? If the latter, it might be worth briefly listing what is still needed.
