Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling serialization registration externally #3831

Open
jakirkham opened this issue May 27, 2020 · 10 comments
Open

Handling serialization registration externally #3831

jakirkham opened this issue May 27, 2020 · 10 comments
Labels
enhancement Improve existing functionality or make things work better help wanted

Comments

@jakirkham
Copy link
Member

Currently custom serialization registration must be done within Distributed itself. However this is a bit tricky to manage (particularly for newcomers).

First they need to implement serialization for their libraries objects like so. Second they need to add code to distributed to register their serializers. Third they need to test this somehow using a development install of their library and Distributed.

It would be useful if this registration (second) step could be done in external libraries. This would allow all of the changes in one place without requiring some coordinations of changes, PRs, and releases so as not to break anything.

@jakirkham
Copy link
Member Author

Maybe this can be done with entrypoints like what is being done with sizeof ( dask/dask#7647 ) ( dask/dask#7688 )

@jrbourbeau
Copy link
Member

Agreed that entrypoints would be well suited for this. For reference, distributed already supports entrypoints for registering third-party communication backends https://distributed.dask.org/en/latest/communications.html#extending-the-communication-layer

@jakirkham
Copy link
Member Author

Right this would be for custom object serialization that comes up occasionally. For example I think ITK ran into this issue not long ago. Periodically users come by with their own custom objects that need serialization and don't necessarily know the right way to hook in. This was also an issue for RAPIDS early on and we would likely have used such a system if it were available. IOW very much the same conversation with sizeof

@jrbourbeau
Copy link
Member

Yeah, that all makes sense. I was mostly just pointing out the existing comms entrypoint to say we had a similar need elsewhere in distributed (for third-party library authors to register some custom code) and added entrypoint support, which has, I think, been successful. We should repeat that process for serialization and sizeof registration

@fjetter
Copy link
Member

fjetter commented May 25, 2021

I don't get what's missing, yet. Why is cuda_serialize even a dedicated function? With the dispatch in distributed.protocol.serialize.dask_serialize (maybe there is an easier import) we can already register serializers for custom classes. is this a comm-specific registration or am I missing something more fundamental?

@jakirkham
Copy link
Member Author

It’s needed because some frames need to stay on device, which means they don’t play well with Blosc and cannot be coerced to memoryviews. Dask serialization is only used with device objects that need to be copied to host. Though that’s not really the subject of this issue

Yes users can register their own objects for serialization. The issue is Dask won’t know whether they can be serialized unless they register their serializers in Distributed. Or alternatively manually load them on all workers with a preload script or Client.run, both of which are kind of hacks. If Dask doesn’t know that these custom objects can be serialized, Workers will either try pickle or crash if that’s not an option. So the goal would be to make this registration easier on users

@fjetter
Copy link
Member

fjetter commented May 25, 2021

Sorry if I'm being a bit complicated but I'm still having trouble to understand since the way I understand this issue, this is already possible.

I see that there are many types registered in distributed but I don't really understand why. Can't the libraries themselves register this?

Let's take the example for the cudf below

@cuda_serialize.register_lazy("cudf")
@cuda_deserialize.register_lazy("cudf")
@dask_serialize.register_lazy("cudf")
@dask_deserialize.register_lazy("cudf")
def _register_cudf():
from cudf.comm import serialize

I don't really understand why this must be part of distributed. Couldn't this just be part of the cudf package? Why is the package relevant for this? The dispatch is handled by dask.utils.Dispatch which should allow registration outside of distributed during runtime.

Or alternatively manually load them on all workers with a preload script or Client.run, both of which are kind of hacks.

Is this about a dynamic registration? Such that when I register a (de)serializer, the client replicates this code to the cluster?

@jakirkham
Copy link
Member Author

I don't really understand why this must be part of distributed. Couldn't this just be part of the cudf package? Why is the package relevant for this? The dispatch is handled by dask.utils.Dispatch which should allow registration outside of distributed during runtime.

Well that's the point of this issue. To make this not a requirement.

Even if we import this on the Client and send it to the Worker, the Worker won't know what to do with the message unless cudf was already imported. It will look for a way to deserialize and not find one. So will raise an error instead.

Or alternatively manually load them on all workers with a preload script or Client.run, both of which are kind of hacks.

Is this about a dynamic registration? Such that when I register a (de)serializer, the client replicates this code to the cluster?

Not exactly. Dask makes sure this is always imported when to_frames/from_frames are. This is what makes things work currently.

Dask isn't doing anything magical. Just that import.

@jakirkham
Copy link
Member Author

It may be worthwhile to just play with an example. Issue ( #4562 ) shows a user trying to do custom serialization with their own object and the issue they encounter

@fjetter
Copy link
Member

fjetter commented May 25, 2021

the Worker won't know what to do with the message unless cudf was already imported.

The import order was the missing piece to (my/the) puzzle. I understand now, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improve existing functionality or make things work better help wanted
Projects
None yet
Development

No branches or pull requests

4 participants