Use KvikIO in Dask-CUDA #925
Made a few observations below based on trying to use KvikIO here. Also raised upstream issues/PRs where relevant
Codecov Report
Patch coverage has no change and project coverage change:
Additional details and impacted files
@@ Coverage Diff @@
## branch-23.08 #925 +/- ##
================================================
- Coverage 63.90% 0.00% -63.91%
================================================
Files 25 16 -9
Lines 3211 2286 -925
================================================
- Hits 2052 0 -2052
- Misses 1159 2286 +1127
☔ View full report in Codecov by Sentry.
This PR has been labeled
Bumping to 22.10
This PR has been labeled
This PR has been labeled
9c7b4c5 to ec32d43
78270f5 to b36dfbe
Looks good to me, but I think we can do the IO concurrently by calling `.get()` on the `IOFuture` after the for-loop?
for frame, length in zip(frames, frame_lengths):
    f.pwrite(buf=frame, count=length, file_offset=0, buf_offset=0)

    f.pwrite(buf=frame, count=length, file_offset=0, buf_offset=0).get()
I think you can use the async nature of `pwrite()` by delaying the `.get()` to after the for-loop?
Yeah, I originally tried this and think it may work. However, I was a little unsure whether this could introduce race conditions, as there is currently no good way to chain these. So I decided to punt on that. Maybe this can be handled in a follow-up?
Do we need to chain them?
They are writing to the same file
That is supported, as long as the IO doesn't overlap.
In fact, by default KvikIO reads and writes to the same file concurrently using a thread pool.
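The pattern suggested in this thread (start every write, then wait only after the loop) can be sketched with the standard library alone. This is an analogy, not KvikIO's implementation: a `ThreadPoolExecutor` and `os.pwrite` (POSIX only) stand in for `pwrite()` and its future, and `write_frames` is a hypothetical helper name. Each frame gets its own file offset so the writes never overlap.

```python
import itertools
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_frames(path, frames, max_workers=4):
    """Hypothetical helper: write frames back to back, waiting only at the end."""
    # Running sum of frame sizes gives each frame a distinct, non-overlapping offset.
    offsets = itertools.accumulate((len(f) for f in frames), initial=0)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Submit everything first (analogous to calling pwrite() without .get()).
            futures = [
                pool.submit(os.pwrite, fd, frame, offset)
                for frame, offset in zip(frames, offsets)
            ]
            # Wait only after the loop (analogous to the delayed .get()).
            return sum(fut.result() for fut in futures)
    finally:
        os.close(fd)


frames = [b"aaaa", b"bb", b"cccccc"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.bin")
    total_written = write_frames(path, frames)
    with open(path, "rb") as fh:
        contents = fh.read()
```

Because the offsets are computed up front, the order in which the worker threads finish does not affect the file contents, which is why no chaining between the writes is needed in this sketch.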
Interesting thanks for the clarification! 🙏
Maybe we can follow up on this in a subsequent PR?
Switching to KvikIO in Dask-CUDA will simplify the CUDA 12 migration effort (as Dask-CUDA won't need to wait on cuCIM)
Sure
As a follow-up, submitted PR #1205. It's possible more work is still needed there.
f.pread(
    buf=buf, count=length, file_offset=file_offset, buf_offset=0
).get()
And delaying the `.get()` to after the for-loop here as well?
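The read side of the same idea might look like the following sketch. It is a standard-library stand-in rather than KvikIO code: `os.preadv` (available on Linux with Python 3.7+) plays the role of `pread()`, `read_frames` is a hypothetical helper, and `bytearray` buffers are an assumption used here in place of device or NumPy buffers. All buffers are allocated before any read is issued, and the waiting happens after the loop.

```python
import itertools
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_frames(path, frame_lengths, max_workers=4):
    """Hypothetical helper: read consecutive frames into preallocated buffers."""
    # Preallocate one buffer per frame before any IO starts.
    buffers = [bytearray(n) for n in frame_lengths]
    # Each frame starts where the previous one ended.
    offsets = itertools.accumulate(frame_lengths, initial=0)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Issue every read first; each fills its own buffer in place.
            futures = [
                pool.submit(os.preadv, fd, [buf], offset)
                for buf, offset in zip(buffers, offsets)
            ]
            # Wait after the loop and check that each read was complete.
            for fut, expected in zip(futures, frame_lengths):
                if fut.result() != expected:
                    raise OSError("short read")
    finally:
        os.close(fd)
    return buffers


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.bin")
    with open(path, "wb") as fh:
        fh.write(b"aaaabbcccccc")
    read_back = read_frames(path, [4, 2, 6])
```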
Responded above ( #925 (comment) ). For simplicity would suggest discussing in that thread and then (once concluded) updating both accordingly
Thanks @jakirkham !
Thanks @jakirkham and @madsbk for the work here!
Looks fine to me — I reviewed because of the relevance for CUDA 12. Do we have plans to distribute kvikio wheels with GDS support in the future?
/merge
Thanks all! 🙏
Let's raise an issue on KvikIO to discuss 🙂 xref: rapidsai/kvikio#250
Follow up to this discussion ( #925 (comment) )

* Preallocates buffers before reading
* Uses NumPy `uint8` arrays for all host memory (benefits from hugepages on transfers)
* Handles IO asynchronously with KvikIO and waits at the end
* Uses vectorized IO for host reads & writes

Authors:
- https://github.com/jakirkham
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Peter Andreas Entschev (https://github.com/pentschev)

URL: #1205
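For the host-memory path, a minimal sketch of vectorized IO is shown below, assuming the term refers to POSIX scatter/gather calls (`os.writev` and `os.readv` in Python). The `bytearray` buffers are an assumption standing in for the NumPy `uint8` arrays mentioned above, and the file name is made up for the example. One call writes all frames, and one call fills all preallocated buffers.

```python
import os
import tempfile

# Stand-ins for host frames; the real buffers would be NumPy uint8 arrays.
frames = [bytearray(b"head"), bytearray(b"er"), bytearray(b"payload")]
lengths = [len(f) for f in frames]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spill.bin")  # hypothetical file name

    # Gather write: all frames go out back to back in a single system call.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.writev(fd, frames)
    finally:
        os.close(fd)

    # Preallocate the destination buffers, then scatter-read into them in one call.
    restored = [bytearray(n) for n in lengths]
    fd = os.open(path, os.O_RDONLY)
    try:
        read = os.readv(fd, restored)
    finally:
        os.close(fd)
```

Both calls return the number of bytes transferred and may transfer fewer than requested, so a robust version would loop until every buffer is fully written or filled.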
Fixes #844
This changes the spilling implementation in Dask-CUDA to use KvikIO.