Increase flexibility around Python versions and compression #4011

Closed · mrocklin opened this issue on Aug 3, 2020 · 9 comments

@mrocklin (Member) commented Aug 3, 2020

Currently Dask fails to communicate when we try to cross Python versions or have mismatched compression libraries. The Python version mismatch is new-ish, dating from when we added pickle protocol 5; the compression mismatch is a long-standing issue. In principle we ask that users provide consistent versions across all machines; however, we may still be able to accommodate mismatches with a little bit of work on our end.

When establishing a connection between two endpoints, we could have each side publish its capabilities: which compression libraries it has available, whether it supports pickle protocol 5, and whether it has a GPU. This would make communication in heterogeneous situations a bit easier.

We already have some work here. There is a context= keyword that is passed down from Comm.write to to_frames, which in turn handles serialization. We could start populating that with more information and using it in serialization/compression (a rough sketch follows the list below).

This would allow the following situations:

  1. More relaxation around Python versions
  2. More relaxation around compression libraries
  3. Downgrading GPU results smoothly to CPU clients
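
To make that concrete, here is a rough sketch of what each side might advertise when a connection is established. Everything below (the helper name, the dict layout) is hypothetical illustration, not an existing API:

```python
import pickle

def local_capabilities():
    # Hypothetical helper: advertise what this process can handle.
    caps = {
        "pickle-protocol": pickle.HIGHEST_PROTOCOL,  # >= 5 means pickle5 works
        "compression": [],
        "GPU": False,
    }
    # Record which optional compression libraries are importable
    for lib in ("lz4", "snappy", "zstandard"):
        try:
            __import__(lib)
            caps["compression"].append(lib)
        except ImportError:
            pass
    # Record whether a usable GPU is present
    try:
        import numba.cuda
        caps["GPU"] = numba.cuda.is_available()
    except ImportError:
        pass
    return caps
```

Each side could send this once on connect, cache the peer's answer, and thread it into serialization via the existing context= keyword, e.g. context={"recipient": remote_caps}.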

cc @quasiben @jakirkham

@jakirkham (Member) commented

Messaging capabilities makes sense. Though I would think we would want to cache these somewhere (just to avoid sending this info repeatedly).

As all GPU things implement "cuda" and "dask" serialization, it should be a simple matter to serialize them with either as appropriate. This is how things like TCP and spilling work today.

Though currently compression is not that configurable (or at least, if it is, it's not clear to me how). How would you envision compression being handled here?

As for pickle protocol 5, I wonder if we shouldn't just give users a hook to enable or disable this (#4012). Seems pretty lightweight.
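
For example, such a hook could be as small as a config knob that caps the protocol. The key below is hypothetical, just to show the shape; #4012 has the actual proposal:

```python
import dask

# Hypothetical config key: cap pickling at protocol 4 so that a sender
# with pickle5 support can still talk to a receiver without it.
dask.config.set({"distributed.comm.pickle-protocol": 4})
```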

@jakirkham (Member) commented

That said, one thing I'm curious about is in what cases you envision sending GPU objects through a comm to a worker that lacks a GPU. In particular, what would that worker do with that message? Or am I misunderstanding something here?

@mrocklin (Member, Author) commented Aug 4, 2020

> Messaging capabilities makes sense. Though I would think we would want to cache these somewhere (just to avoid sending this info repeatedly).

Agreed, I think that we would send this information when first connecting.

> That said, one thing I'm curious about is in what cases you envision sending GPU objects through a comm to a worker that lacks a GPU. In particular, what would that worker do with that message? Or am I misunderstanding something here?

My main motivation is graceful degradation in performance when Python versions or LZ4 availability are mismatched. The GPU stuff is secondary for me, but maybe also interesting. One situation where this comes up is executing computations on a remote GPU cluster from a non-GPU machine (like a laptop). In that situation I may want x.compute() to give me back a NumPy array, even though the result is actually, say, a CuPy array. This would require participation by the serialization functions though (they can accept a context= keyword today, I think).

> Though currently compression is not that configurable (or at least, if it is, it's not clear to me how). How would you envision compression being handled here?

Before calling maybe_compress one would check the recipient's capabilities to make sure it had the compression library you were going to use. If it didn't, then you might go down a list until you found something that both sides support.
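
A sketch of that fallback, assuming the recipient's advertised capabilities arrive via context (the dict layout and helper name are hypothetical):

```python
from distributed.protocol.compression import compressions  # installed compressors

def pick_compression(context, preferred=("lz4", "snappy", "zstandard")):
    # Hypothetical negotiation: intersect what we have installed with what
    # the recipient advertised, then walk our preference list.
    offered = set(context.get("recipient", {}).get("compression", ()))
    for name in preferred:
        if name in compressions and name in offered:
            return name
    return None  # nothing in common: send frames uncompressed

# The chosen name would then be handed to maybe_compress in place of the
# locally-configured default.
```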

@jakirkham (Member) commented

Thanks for fleshing out those ideas a bit. Generally that makes sense.

When it comes to the CPU/GPU bit, I don't think we have enough of the scaffolding to approach that today. Not to say it shouldn't be done, just that it may require a bit more thought and work first. For example, "dask" serialization as implemented today is just a delta away from "cuda" serialization: it moves all frames to host. This ends up being handy for TCP (when UCX is not an option), spilling, etc., so it's not really designed to allow CPU deserialization of an object. Though we could imagine a new mechanism where we ask workers to move an object to host prior to serialization, which might allow this kind of interaction. I can certainly appreciate how this might be useful :)
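
Roughly, that delta looks like this. A simplified sketch (the real registration lives in distributed/protocol/cupy.py and is a bit more careful about buffer handling):

```python
import cupy
from distributed.protocol.cupy import cuda_serialize_cupy_ndarray
from distributed.protocol.serialize import dask_serialize

@dask_serialize.register(cupy.ndarray)
def dask_serialize_cupy_ndarray(x):
    # Reuse the "cuda" serializer, then copy each device frame into host
    # memory so the payload can travel over TCP or be spilled to disk.
    header, frames = cuda_serialize_cupy_ndarray(x)
    return header, [cupy.asnumpy(f) for f in frames]
```

Note the header still describes a device array, so deserialization lands the data back on a GPU; nothing here hands a NumPy array to a GPU-less client.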

@mrocklin (Member, Author) commented Aug 5, 2020

For GPU-CPU serialization I'm thinking of the following:

```diff
--- a/distributed/protocol/cupy.py
+++ b/distributed/protocol/cupy.py
@@ -15,7 +15,10 @@ except ImportError:
 
 
 @cuda_serialize.register(cupy.ndarray)
-def cuda_serialize_cupy_ndarray(x):
+def cuda_serialize_cupy_ndarray(x, context=None):
+    if context and not context["recipient"].get("GPU"):
+        return serialize(x.get())
+
     # Making sure `x` is behaving
     if not (x.flags["C_CONTIGUOUS"] or x.flags["F_CONTIGUOUS"]):
         x = cupy.array(x, copy=True)
```

If for some reason we've chosen to move a CuPy array to a machine without a GPU, let's convert it to a NumPy array first. I'm not saying that we should do this, merely that it's easy to do if we want to.

@jakirkham (Member) commented

Yeah, I'm not sure why that needs to live in the CuPy serializer, though. It would be much cleaner as its own higher-level API through a Dispatch object.

Independently, we probably want to discuss how this would be used, to understand how to design it. Certainly one case is fetching the result. What other use cases are there?

@mrocklin (Member, Author) commented Aug 5, 2020

> Yeah, I'm not sure why that needs to live in the CuPy serializer, though.

It seems like the simplest place to put it to me.

> Independently, we probably want to discuss how this would be used, to understand how to design it. Certainly one case is fetching the result. What other use cases are there?

That's the main one that comes to my mind and that I deal with.

@jakirkham (Member) commented

Well, the other thing would be the meta, but I don't know if the CPU implementations of these things are a good enough proxy for the GPU ones. They might be, but there may be subtle differences. IDK if we're getting off topic, happy to drop it if so.

@jakirkham (Member) commented

Is this solved by PR #4019, or is there more still to do here? If the latter, it might be worth briefly listing what is still needed.
