Use KvikIO in Dask-CUDA #925

jakirkham · 2022-06-02T03:15:48Z

Fixes #844

This changes the spilling implementation in Dask-CUDA to use KvikIO.

jakirkham

Made a few observations below based on trying to use KvikIO here. Also raised upstream issues/PRs where relevant.

dask_cuda/disk_io.py

codecov-commenter · 2022-06-02T03:43:40Z

Codecov Report

Patch coverage has no change and project coverage change: -63.91 ⚠️

Comparison is base (83c6476) 63.90% compared to head (2ef69b3) 0.00%.

❗ Current head 2ef69b3 differs from pull request most recent head 000b896. Consider uploading reports for the commit 000b896 to get more accurate results

Additional details and impacted files

@@               Coverage Diff                @@
##           branch-23.08    #925       +/-   ##
================================================
- Coverage         63.90%   0.00%   -63.91%     
================================================
  Files                25      16        -9     
  Lines              3211    2286      -925     
================================================
- Hits               2052       0     -2052     
- Misses             1159    2286     +1127

Impacted Files	Coverage Δ
dask_cuda/disk_io.py	`0.00% <0.00%> (-56.20%)`	⬇️

... and 25 files with indirect coverage changes

github-actions · 2022-07-02T16:03:10Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

jakirkham · 2022-07-29T21:54:43Z

Bumping to 22.10

github-actions · 2022-09-04T16:03:15Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-10-04T17:14:33Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

madsbk

Looks good to me, but I think we can do the IO concurrently by calling .get() on the IOFuture after the for-loop?

madsbk · 2023-06-27T05:57:44Z

dask_cuda/disk_io.py

            for frame, length in zip(frames, frame_lengths):
-                f.pwrite(buf=frame, count=length, file_offset=0, buf_offset=0)
-
+                f.pwrite(buf=frame, count=length, file_offset=0, buf_offset=0).get()


I think you can use the async nature of pwrite() by delaying the .get() until after the for-loop?

Yeah, I originally tried this and think it may work. However, I was a little unsure whether this could introduce race conditions, as there is currently no good way to chain these, so I decided to punt on that. Maybe this can be handled in a follow-up?

Do we need to chain them?

They are writing to the same file

That is supported, as long as the IO doesn't overlap.
In fact, by default KvikIO reads and writes to the same file concurrently using a thread pool.

Interesting, thanks for the clarification! 🙏

Maybe we can follow up on this in a subsequent PR?

Switching to KvikIO in Dask-CUDA will simplify the CUDA 12 migration effort (as Dask-CUDA won't need to wait on cuCIM)

As a follow-up, submitted PR #1205. It's possible more work is still needed there.
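
For readers following this thread, here is a minimal sketch of the pattern being discussed: submit every write first, then wait on all of the futures after the loop. It does not use KvikIO. A standard-library `ThreadPoolExecutor` and `os.pwrite` stand in for `CuFile.pwrite()` and its `IOFuture`, and the helper name `write_frames_concurrently` is hypothetical. Each frame gets its own cumulative file offset so the writes never overlap, which is the condition mentioned above for concurrent IO on a single file. POSIX only, since it relies on `os.pwrite`.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def write_frames_concurrently(path, frames, max_workers=4):
    """Write each frame to its own byte range of `path`, waiting only at the end.

    Stand-in for the KvikIO pattern (assumption, not the Dask-CUDA code):
    pool.submit() plays the role of pwrite() returning a future, and
    future.result() plays the role of the deferred .get().
    """
    lengths = [len(frame) for frame in frames]
    # Cumulative offsets: frame i starts where frame i-1 ends, so no overlap.
    offsets = [0, *accumulate(lengths)][:-1]

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Submit every write before waiting on any of them.
            futures = [
                pool.submit(os.pwrite, fd, frame, offset)
                for frame, offset in zip(frames, offsets)
            ]
            # Only now block, once per future.
            written = [future.result() for future in futures]
    finally:
        os.close(fd)

    if written != lengths:
        raise OSError(f"short write: wrote {written}, expected {lengths}")
    return offsets, written


if __name__ == "__main__":
    demo_frames = [b"abc", b"de", b"fghi"]
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "spill.bin")
        write_frames_concurrently(demo_path, demo_frames)
        with open(demo_path, "rb") as f:
            assert f.read() == b"".join(demo_frames)
```

Because every write targets a disjoint byte range, the order in which the worker threads finish does not affect the file contents, so no chaining between the futures is needed in this sketch.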

madsbk · 2023-06-27T05:59:10Z

dask_cuda/disk_io.py

+                f.pread(
+                    buf=buf, count=length, file_offset=file_offset, buf_offset=0
+                ).get()


And delaying the .get() until after the for-loop here as well?

Responded above ( #925 (comment) ). For simplicity would suggest discussing in that thread and then (once concluded) updating both accordingly

madsbk

Thanks @jakirkham !

pentschev

Thanks @jakirkham and @madsbk for the work here!

bdice

Looks fine to me — I reviewed because of the relevance for CUDA 12. Do we have plans to distribute kvikio wheels with GDS support in the future?

pentschev · 2023-06-27T13:36:22Z

/merge

jakirkham · 2023-06-27T15:48:39Z

Thanks all! 🙏

> Do we have plans to distribute kvikio wheels with GDS support in the future?

Let's raise an issue on KvikIO to discuss 🙂

xref: rapidsai/kvikio#250

Follow up to this discussion ( #925 (comment) )

* Preallocates buffers before reading
* Uses NumPy `uint8` arrays for all host memory (benefits from hugepages on transfers)
* Handles IO asynchronously with KvikIO and waits at the end
* Uses vectorized IO for host reads & writes

Authors:
- https://github.com/jakirkham
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Peter Andreas Entschev (https://github.com/pentschev)

URL: #1205
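
Two of the points in the #1205 description, preallocating buffers before reading and using vectorized IO for host reads and writes, can be illustrated with a rough sketch. This is not the code from #1205: it uses only the standard library (`os.writev` and `os.readv`, POSIX only), `bytearray` objects stand in for the NumPy `uint8` arrays mentioned there, and the helper names `host_write_vectorized` and `host_read_vectorized` are hypothetical.

```python
import os
import tempfile


def host_write_vectorized(path, frames):
    """Write all host frames back to back with a single writev() call.

    Note: the OS limits how many buffers one call accepts (IOV_MAX), so a
    real implementation would need to batch long frame lists.
    """
    expected = sum(len(frame) for frame in frames)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.writev(fd, frames)
    finally:
        os.close(fd)
    if written != expected:
        raise OSError(f"short write: {written} of {expected} bytes")
    # The frame lengths are what a reader needs to preallocate its buffers.
    return [len(frame) for frame in frames]


def host_read_vectorized(path, frame_lengths):
    """Preallocate one buffer per frame, then fill them all with one readv()."""
    # bytearray is a stand-in (assumption) for preallocated NumPy uint8 arrays.
    buffers = [bytearray(length) for length in frame_lengths]
    fd = os.open(path, os.O_RDONLY)
    try:
        read = os.readv(fd, buffers)
    finally:
        os.close(fd)
    if read != sum(frame_lengths):
        raise OSError(f"short read: {read} of {sum(frame_lengths)} bytes")
    return buffers


if __name__ == "__main__":
    demo_frames = [b"abc", b"de", b"fghi"]
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "host.bin")
        demo_lengths = host_write_vectorized(demo_path, demo_frames)
        demo_buffers = host_read_vectorized(demo_path, demo_lengths)
        assert [bytes(buf) for buf in demo_buffers] == demo_frames
```

Allocating the destination buffers up front means the read fills memory that already exists rather than creating new objects per frame, and a single vectorized call replaces one system call per frame.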

github-actions bot added the python label Jun 2, 2022

jakirkham commented Jun 2, 2022

jakirkham mentioned this pull request Jun 2, 2022

[FEA] Use KvikIO #844

Closed

pentschev added the 2 - In Progress, feature request, improvement, and non-breaking labels and removed the feature request label Jun 2, 2022

github-actions bot added inactive-30d and removed inactive-30d labels Jul 2, 2022

jakirkham changed the base branch from branch-22.08 to branch-22.10 July 29, 2022 21:54

github-actions bot added inactive-30d and removed inactive-30d labels Sep 4, 2022

github-actions bot added inactive-30d and removed inactive-30d labels Oct 4, 2022

jakirkham changed the base branch from branch-22.10 to branch-23.08 June 26, 2023 18:48

jakirkham force-pushed the use_kvikio branch 2 times, most recently from 9c7b4c5 to ec32d43 Compare June 26, 2023 19:00

github-actions bot added the ci label Jun 26, 2023

jakirkham force-pushed the use_kvikio branch 2 times, most recently from 78270f5 to b36dfbe Compare June 26, 2023 21:37

jakirkham changed the title ~~[WIP] Use KvikIO in Dask-CUDA~~ Use KvikIO in Dask-CUDA Jun 26, 2023

jakirkham marked this pull request as ready for review June 26, 2023 22:12

jakirkham requested review from a team as code owners June 26, 2023 22:12

Use KvikIO in Dask-CUDA

000b896

jakirkham force-pushed the use_kvikio branch from b36dfbe to 000b896 Compare June 27, 2023 04:57

jakirkham added the 4 - Needs Reviewer and conda labels and removed the 2 - In Progress label Jun 27, 2023

madsbk reviewed Jun 27, 2023

madsbk approved these changes Jun 27, 2023

pentschev approved these changes Jun 27, 2023

bdice approved these changes Jun 27, 2023

raydouglass approved these changes Jun 27, 2023

rapids-bot bot merged commit 5a3c57f into rapidsai:branch-23.08 Jun 27, 2023

jakirkham deleted the use_kvikio branch June 27, 2023 15:47

jakirkham mentioned this pull request Jun 28, 2023

Aggregate reads & writes in disk_io #1205

Merged
