Add Dask serializers for cuDF objects #4153

jakirkham · 2020-02-14T02:28:26Z

When CUDA objects cannot be serialized directly, this does the next best thing: it serializes them using Dask's serialization protocol. That protocol requires the data to be on the host in 1-D contiguous `memoryview`s. So we serialize as we otherwise would and, as a last step, perform a device-to-host transfer of the frames before handing them off to Dask. Deserialization needs no changes: all of the existing deserializers already work, because frames are turned into `Buffer`s, which perform a host-to-device transfer if needed. This gives us an option that avoids pickling, so we are able to serialize things with Dask more efficiently.

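The flow described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `FakeDeviceBuffer`, `FakeColumn`, `host_serialize`, and `host_deserialize` are hypothetical stand-ins, and no GPU or cuDF is involved. It assumes a header-plus-frames serialization scheme and shows the device-to-host copy as the last step of serialization, with the host-to-device copy happening when frames are wrapped back into buffers.

```python
# Illustrative sketch only: FakeDeviceBuffer and FakeColumn are hypothetical
# stand-ins, not cuDF classes, and no GPU memory is involved.


class FakeDeviceBuffer:
    """Pretends to own device memory; real code would wrap a GPU allocation."""

    def __init__(self, data):
        self._data = bytes(data)

    def to_host(self):
        # Stand-in for a device-to-host copy; yields 1-D contiguous host memory.
        return memoryview(self._data)

    @classmethod
    def from_host(cls, frame):
        # Stand-in for a host-to-device copy when a host frame is received.
        return cls(frame)


class FakeColumn:
    """Hypothetical object with a header-plus-frames serialization scheme."""

    def __init__(self, buffer, dtype):
        self.buffer = buffer
        self.dtype = dtype

    def serialize(self):
        # The usual serialization: metadata in the header, data in the frames.
        return {"dtype": self.dtype}, [self.buffer]

    @classmethod
    def deserialize(cls, header, frames):
        # Frames that arrive as host memory are turned back into buffers.
        buffers = [
            f if isinstance(f, FakeDeviceBuffer) else FakeDeviceBuffer.from_host(f)
            for f in frames
        ]
        return cls(buffers[0], header["dtype"])


def host_serialize(obj):
    """Serialize as usual, then move every frame to the host as a last step."""
    header, frames = obj.serialize()
    return header, [f.to_host() for f in frames]


def host_deserialize(header, frames):
    """Rebuild the object; the host-to-device copy happens inside deserialize."""
    return FakeColumn.deserialize(header, frames)


col = FakeColumn(FakeDeviceBuffer(b"\x01\x02\x03\x04"), "int8")
header, frames = host_serialize(col)
restored = host_deserialize(header, frames)
```

In this sketch only the serializing side needs a new function; the deserializing side reuses the existing `deserialize` path, which mirrors the point made in the description.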

codecov · 2020-02-14T04:34:41Z

Codecov Report

Merging #4153 into branch-0.13 will decrease coverage by 0.07%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           branch-0.13    #4153      +/-   ##
===============================================
- Coverage        86.74%   86.67%   -0.08%     
===============================================
  Files               50       50              
  Lines             9810     9818       +8     
===============================================
  Hits              8510     8510              
- Misses            1300     1308       +8

Last update 85983b0...e4b488c.

kkraus14 · 2020-02-14T05:18:32Z

Would like someone more familiar with dask serialization dispatching to review as well before merging 😄

jakirkham · 2020-02-14T05:47:58Z

@quasiben, would you be able to take a look? 🙂

quasiben · 2020-02-14T16:56:17Z

When would serializing CUDA objects directly not be possible? As I understand it, if we are transporting over TCP we will be calling pickle on the objects, which should trigger a device-to-host copy. Has this now changed with the addition of Buffer over Numba?

jakirkham · 2020-02-14T17:14:41Z

Right, this is the TCP case.

So this would be useful for people who lack either UCX or the hardware to really take advantage of it. This could come up on some cloud service providers or for our users that are not currently using UCX.

It should also make for more realistic comparisons between UCX and TCP, particularly since, for operations on the host, Dask won't pickle in the first place. It would also be more realistic when comparing to other language implementations (Java, C++, etc.) that don't have this overhead.

quasiben · 2020-02-14T17:27:21Z

So currently the TCP case is not supported? Or it is, but not performant/well understood?

jakirkham · 2020-02-14T17:37:02Z

It's supported by falling back to pickle, which adds unnecessary performance overhead. This avoids that overhead.

quasiben · 2020-02-14T17:51:01Z

Ah, that makes sense. Thanks @jakirkham

jakirkham · 2020-02-14T18:06:50Z

Thanks for the reviews! 😄

jakirkham · 2020-02-14T19:36:06Z

Added a PR (dask/distributed#3478) to make sure we are registering this support with Dask. Things behave the same for cuDF versions lacking this support (in other words, we fall back to pickle), but cuDF versions with this feature will perform more efficient serialization.
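The fallback behavior mentioned here can be pictured with a small stand-alone sketch. It is an assumption-laden illustration, not Dask's dispatch code: the `_serializers` registry and the `register`, `serialize`, and `deserialize` functions below are hypothetical names written for this example. The idea is that a type without a registered serializer is pickled, while a registered type hands over its data as frames.

```python
import pickle

# Hypothetical registry for illustration; Dask's real dispatch lives in
# distributed.protocol and is not reproduced here.
_serializers = {}


def register(cls, dumps, loads):
    """Record a (dumps, loads) pair for a type, keyed by its name."""
    _serializers[cls.__name__] = (dumps, loads)


def serialize(obj):
    entry = _serializers.get(type(obj).__name__)
    if entry is None:
        # No dedicated serializer registered: fall back to pickle.
        return {"serializer": "pickle"}, [pickle.dumps(obj)]
    dumps, _ = entry
    header, frames = dumps(obj)
    # Tag the header so the receiver knows which loader to use.
    header = dict(header, serializer="dask", type=type(obj).__name__)
    return header, frames


def deserialize(header, frames):
    if header["serializer"] == "pickle":
        return pickle.loads(frames[0])
    _, loads = _serializers[header["type"]]
    return loads(header, frames)


payload = bytearray(b"abc")

# Before registration the object goes through pickle.
h1, f1 = serialize(payload)

# After registration the data is passed along as a host memoryview frame.
register(
    bytearray,
    lambda obj: ({}, [memoryview(obj)]),
    lambda header, frames: bytearray(frames[0]),
)
h2, f2 = serialize(payload)
```

Both paths round-trip to an equal object; the difference is only in how the data travels.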

jakirkham requested a review from a team as a code owner February 14, 2020 02:28

jakirkham added the 4 - Needs cuDF (Python) Reviewer and dask labels Feb 14, 2020

kkraus14 approved these changes Feb 14, 2020

kkraus14 removed the 4 - Needs cuDF (Python) Reviewer label Feb 14, 2020

quasiben approved these changes Feb 14, 2020

quasiben merged commit 16ebc35 into rapidsai:branch-0.13 Feb 14, 2020

jakirkham deleted the add_dask_serializers branch February 14, 2020 18:05

jakirkham mentioned this pull request Feb 14, 2020

Register Dask cuDF serializers dask/distributed#3478

Merged

jakirkham mentioned this pull request Feb 22, 2020

Using "dask" serialization protocol in spilling rapidsai/dask-cuda#242

Closed

jakirkham mentioned this pull request Jul 1, 2020

Evaluate further serialization performance improvements rapidsai/dask-cuda#106

Closed

vyasr added the 4 - Needs Review label and removed the 4 - Needs Dask Reviewer label Feb 23, 2024
