-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serializing CUDA objects in iterables #110
Comments
For now I think that your idea of implementing cuda_serialize on standard collections ( |
But doing the way you're suggesting, the case I mentioned at the beginning (tuple of cuDF dataframes) would be serialized using pickle, which is slow. Is that what you're saying should happen? Just to make it clear in case my initial comment wasn't, what I meant was a more complex handling that would serialize each of the elements of the tuple using the CUDA serializer, but it would be more complex and probably wouldn't solve for all possible object combinations (e.g., cuDF dataframes inside a tuple, inside another tuple). |
Ignore my previous comment, I had misunderstood your suggestion. It does make sense to handle it the way you're saying, I'll try doing that. |
This has been fixed in dask/distributed#2948 and #118, closing. |
We have improved serialization performance in #98, but it introduced an issue: serializing tuples (and potentially all other iterable types) with CUDA objects fails now. The issue is
distributed.protocol.serialize
will only serialize tuples when the serializer ispickle
, for instance,serialize((np_array1, np_array2), serializers=["dask"])
will also fail.I've found out about this issue when trying to run the sample from #57, more specifically, it happens during the
df = df.sort_values(['A','B','C'])
call, where it tries to serialize a tuple of(<cudf.DataFrame ncols=11 nrows=2759411 >, <cudf.DataFrame ncols=11 nrows=2759411 >)
.@mrocklin maybe you have an idea on how to solve this? I thought of checking for iterables on
device_to_host
and serializing each object individually, returning perhaps an iterable ofDeviceSerialized
objects, but then we could have tuples of tuples, and so on, that would make things more complicated, so it doesn't seem like a resilient solution. Also, I'm not sure if the solution should be here or in dask-distributed, or maybe there's one already that I just don't know of.The text was updated successfully, but these errors were encountered: