Handling serialization registration externally #3831
Comments
Maybe this can be done with entrypoints like what is being done with sizeof (dask/dask#7647) (dask/dask#7688)
Agreed that entrypoints would be well suited for this. For reference,
Right, this would be for custom object serialization that comes up occasionally. For example, I think ITK ran into this issue not long ago. Periodically users come by with their own custom objects that need serialization and don't necessarily know the right way to hook in. This was also an issue for RAPIDS early on, and we would likely have used such a system if it were available. IOW very much the same conversation as with sizeof
Yeah, that all makes sense. I was mostly just pointing out the existing comms entrypoint to say we had a similar need elsewhere in
I don't get what's missing, yet. Why is
It’s needed because some frames need to stay on device, which means they don’t play well with Blosc and cannot be coerced to Yes, users can register their own objects for serialization. The issue is Dask won’t know whether they can be serialized unless they register their serializers in Distributed. Or alternatively manually load them on all workers with a preload script or
Sorry if I'm being a bit complicated, but I'm still having trouble understanding, since the way I read this issue, this is already possible. I see that there are many types registered in distributed, but I don't really understand why. Can't the libraries themselves register this? Let's take the cudf example in distributed/distributed/protocol/__init__.py, lines 104 to 109 in 43ad2f8.
I don't really understand why this must be part of
Is this about dynamic registration, such that when I register a (de)serializer, the client replicates this code to the cluster?
Well that's the point of this issue. To make this not a requirement. Even if we
Not exactly. Dask makes sure this is always Dask isn't doing anything magical. Just that
It may be worthwhile to just play with an example. Issue (#4562) shows a user trying to do custom serialization with their own object and the issue they encounter
The import order was the missing piece to (my/the) puzzle. I understand now, thanks!
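To make the import-order point concrete, here is a minimal, self-contained sketch of a type-based dispatcher with lazy registration. It is loosely modelled on the idea behind `register_lazy` but is not Distributed's actual implementation; `Dispatch`, `toy_serialize`, `mylib`, and `Frame` are all hypothetical names invented for illustration. The serializer follows the `(header, frames)` shape used for custom serialization.

```python
import types


class Dispatch:
    """Toy type-based dispatcher with lazy registration (illustrative only)."""

    def __init__(self):
        self._lookup = {}  # type -> serializer function
        self._lazy = {}    # top-level module name -> deferred registration callback

    def register(self, typ):
        def decorator(func):
            self._lookup[typ] = func
            return func
        return decorator

    def register_lazy(self, toplevel, func):
        # Remember how to register serializers for a library, without importing it.
        self._lazy[toplevel] = func

    def dispatch(self, typ):
        for cls in typ.__mro__:
            if cls in self._lookup:
                return self._lookup[cls]
        toplevel = typ.__module__.split(".")[0]
        if toplevel in self._lazy:
            # First time we see an object from this library: run the
            # deferred registration once, then look the type up again.
            self._lazy.pop(toplevel)()
            return self.dispatch(typ)
        raise TypeError(f"No serializer registered for {typ.__name__}")

    def __call__(self, obj):
        return self.dispatch(type(obj))(obj)


toy_serialize = Dispatch()

# Hypothetical third-party library, built in-process so the sketch runs anywhere.
mylib = types.ModuleType("mylib")


class Frame:
    def __init__(self, payload: bytes):
        self.payload = payload


Frame.__module__ = "mylib"
mylib.Frame = Frame


def _register_mylib():
    @toy_serialize.register(Frame)
    def serialize_frame(obj):
        header = {"type": "mylib.Frame"}
        frames = [obj.payload]
        return header, frames


# If nothing on the receiving process has run this line (or the direct
# registration) before a Frame arrives, the lookup below raises TypeError.
toy_serialize.register_lazy("mylib", _register_mylib)

header, frames = toy_serialize(Frame(b"abc"))
```

The sketch shows why registration has to live somewhere every process imports: the dispatcher can only find a serializer if the registering code (or a lazy hook pointing at it) has already been executed in that process.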
Currently, custom serialization registration must be done within Distributed itself. However, this is a bit tricky to manage, particularly for newcomers.
First, they need to implement serialization for their library's objects. Second, they need to add code to Distributed to register their serializers. Third, they need to test this somehow using a development install of both their library and Distributed.
It would be useful if this registration (second) step could be done in external libraries. This would keep all of the changes in one place, without having to coordinate changes, PRs, and releases across projects so as not to break anything.
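One way external registration might look, following the entrypoints suggestion in the comments, is sketched below using only the standard library. This is an assumption about a possible design, not an existing Distributed feature: the group name `distributed.serializers`, the function `load_external_serializers`, and the packaging snippet in the comments are all hypothetical.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group; Distributed does not necessarily define this.
GROUP = "distributed.serializers"

# A library would advertise a registration hook in its packaging metadata,
# for example (hypothetical names):
#
#   [project.entry-points."distributed.serializers"]
#   mylib = "mylib.dask_hooks:register"
#
# where `register()` runs the library's own serializer registrations.


def load_external_serializers(group=GROUP):
    """Find and run every registration hook advertised under `group`.

    Returns the names of the entry points that were loaded.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ selection API
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group
        selected = eps.get(group, [])

    loaded = []
    for ep in selected:
        register = ep.load()  # imports the module and fetches the hook
        register()            # the hook performs the actual registration
        loaded.append(ep.name)
    return loaded
```

With something like this called at import time on every process, a library's serializers would be picked up wherever the library is installed, without a change to Distributed for each new type.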