Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use KvikIO in Dask-CUDA #925

Merged
merged 1 commit into from
Jun 27, 2023
Merged

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Jun 2, 2022

Fixes #844

This changes the spilling implementation in Dask-CUDA to use KvikIO.

@github-actions github-actions bot added the python python code needed label Jun 2, 2022
Copy link
Member Author

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few observations below based on trying to use KvikIO here. Also raised upstream issues/PRs where relevant

dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
@jakirkham jakirkham mentioned this pull request Jun 2, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2022

Codecov Report

Patch coverage has no change and project coverage change: -63.91 ⚠️

Comparison is base (83c6476) 63.90% compared to head (2ef69b3) 0.00%.

❗ Current head 2ef69b3 differs from pull request most recent head 000b896. Consider uploading reports for the commit 000b896 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.08    #925       +/-   ##
================================================
- Coverage         63.90%   0.00%   -63.91%     
================================================
  Files                25      16        -9     
  Lines              3211    2286      -925     
================================================
- Hits               2052       0     -2052     
- Misses             1159    2286     +1127     
Impacted Files Coverage Δ
dask_cuda/disk_io.py 0.00% <0.00%> (-56.20%) ⬇️

... and 25 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@pentschev pentschev added 2 - In Progress Currently a work in progress feature request New feature or request improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed feature request New feature or request labels Jun 2, 2022
@github-actions
Copy link

github-actions bot commented Jul 2, 2022

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@jakirkham jakirkham changed the base branch from branch-22.08 to branch-22.10 July 29, 2022 21:54
@jakirkham
Copy link
Member Author

Bumping to 22.10

@github-actions
Copy link

github-actions bot commented Sep 4, 2022

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions
Copy link

github-actions bot commented Oct 4, 2022

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@jakirkham jakirkham changed the base branch from branch-22.10 to branch-23.08 June 26, 2023 18:48
@jakirkham jakirkham force-pushed the use_kvikio branch 2 times, most recently from 9c7b4c5 to ec32d43 Compare June 26, 2023 19:00
@github-actions github-actions bot added the ci label Jun 26, 2023
@jakirkham jakirkham force-pushed the use_kvikio branch 2 times, most recently from 78270f5 to b36dfbe Compare June 26, 2023 21:37
@jakirkham jakirkham changed the title [WIP] Use KvikIO in Dask-CUDA Use KvikIO in Dask-CUDA Jun 26, 2023
@jakirkham jakirkham marked this pull request as ready for review June 26, 2023 22:12
@jakirkham jakirkham requested review from a team as code owners June 26, 2023 22:12
@jakirkham jakirkham added 4 - Needs Reviewer Waiting for reviewer to review or respond conda conda issue and removed 2 - In Progress Currently a work in progress labels Jun 27, 2023
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but I thknk we can do the IO concurrently by calling .get() on the IOFuture after the for-loop?

for frame, length in zip(frames, frame_lengths):
f.pwrite(buf=frame, count=length, file_offset=0, buf_offset=0)

f.pwrite(buf=frame, count=length, file_offset=0, buf_offset=0).get()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use the async nature of pwrite() by delaying the .get() to after the for-loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah originally tried this and think it may work. However was a little unsure whether this could introduce race conditions as there is not a good way to chain these currently. So decided to punt on that. Maybe this can be handled in a follow up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to chain them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are writing to the same file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is support, as long as the IO doesn't overlap.
In fact, by default KvikIO reads and writes to the same file concurrently using a thread pool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thanks for the clarification! 🙏

Maybe we can follow up on this in a subsequent PR?

Switching to KvikIO in Dask-CUDA will simplify the CUDA 12 migration effort (as Dask-CUDA won't need to wait on cuCIM)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As follow up submitted PR ( #1205 ). It's possible more work is still needed there

Comment on lines +214 to +216
f.pread(
buf=buf, count=length, file_offset=file_offset, buf_offset=0
).get()
Copy link
Member

@madsbk madsbk Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And delaying the .get() to after the for-loop here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded above ( #925 (comment) ). For simplicity would suggest discussing in that thread and then (once concluded) updating both accordingly

Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakirkham !

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakirkham and @madsbk for the work here!

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me — I reviewed because of the relevance for CUDA 12. Do we have plans to distribute kvikio wheels with GDS support in the future?

@pentschev
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 5a3c57f into rapidsai:branch-23.08 Jun 27, 2023
@jakirkham jakirkham deleted the use_kvikio branch June 27, 2023 15:47
@jakirkham
Copy link
Member Author

jakirkham commented Jun 27, 2023

Thanks all! 🙏

Do we have plans to distribute kvikio wheels with GDS support in the future?

Let's raise an issue on KvikIO to discuss 🙂

xref: rapidsai/kvikio#250

rapids-bot bot pushed a commit that referenced this pull request Jun 29, 2023
Follow up to this discussion ( #925 (comment) )

* Preallocates buffers before reading
* Uses NumPy `uint8` arrays for all host memory (benefits from hugepages on transfers)
* Handles IO asynchronously with KvikIO and waits at the end
* Uses vectorized IO for host reads & writes

Authors:
  - https://github.com/jakirkham
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1205
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Needs Reviewer Waiting for reviewer to review or respond ci conda conda issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python python code needed
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] Use KvikIO
6 participants