-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Dask serializers for cuDF objects #4153
Conversation
In the event that serializing CUDA objects directly is not possible, this performs the next best thing, which is to serialize objects using Dask's serialization protocol. The protocol requires that data be on the host in 1-D contiguous `memoryviews`. So we perform serialization as we otherwise would. As a last step we perform a device-to-host transfer of the frames. Then we hand this off to Dask to serialize. When deserializing the data, all of the deserializers already work as frames are turned into `Buffer`s, which perform a host-to-device transfer if needed. This provides us an option that avoids pickling. As a result we are able to serialize things with Dask more efficiently using this protocol.
Codecov Report
@@ Coverage Diff @@
## branch-0.13 #4153 +/- ##
===============================================
- Coverage 86.74% 86.67% -0.08%
===============================================
Files 50 50
Lines 9810 9818 +8
===============================================
Hits 8510 8510
- Misses 1300 1308 +8 Continue to review full report at Codecov.
|
Would like someone more familiar with dask serialization dispatching to review as well before merging 😄 |
@quasiben, would you be able to take a look? 🙂 |
When would serializing cuda objects directly not be possible ? As I understand it, if we are transporting over TCP we will be calling pickled on the objects which should trigger a device to host copy. Has this now changed with the addition Buffer over numba ? |
Right this is the TCP case. So this would be useful for people who either lack UCX, or the hardware to really take advantage of it. This could come up on some cloud service providers or for our users that are not currently using UCX. Should also make for more realistic comparisons between UCX and TCP. Particularly as operations on the host Dask won't pickle in the first place. Also would be more realistic when comparing to other language implementations (Java, C++, etc.) that don't have this overhead. |
So currently the TCP case is not supported? Or it is but not performant/well understood ? |
It's supported by falling back to pickle, which adds unnecessary performance overhead. This avoids that overhead. |
Ah, that makes sense. Thanks @jakirkham |
Thanks for the reviews! 😄 |
Added PR ( dask/distributed#3478 ) to make sure we are registering this support with Dask. Things behave the same for cuDF versions lacking this support (in other words we fallback to pickle), but cuDF versions with this feature will perform more efficient serialization. |
In the event that serializing CUDA objects directly is not possible, this performs the next best thing, which is to serialize objects using Dask's serialization protocol. The protocol requires that data be on the host in 1-D contiguous
memoryviews
. So we perform serialization as we otherwise would. As a last step we perform a device-to-host transfer of the frames. Then we hand this off to Dask to serialize. When deserializing the data, all of the deserializers already work as frames are turned intoBuffer
s, which perform a host-to-device transfer if needed. This provides us an option that avoids pickling. As a result we are able to serialize things with Dask more efficiently using this protocol.