Implement a chunked_pack API (#13260)
This PR introduces a "chunked pack" API built on top of `contiguous_split`. This API is used when we want to copy a cuDF `table_view` over the wire or to the host in a contiguous layout (aka `contiguous_split`), but with user-provided memory. As a result, this API does not allocate any buffers for GPU data; instead, it uses a user-provided buffer to perform the `contiguous_split` in chunks.

Luckily, `contiguous_split` already had a subdivision of work that we are now calling "batches". Each batch is up to 1MB of data from the source table. As such, one can think of this function as copying as many batches as will fit in a user buffer (or "chunking the batches").

The API follows other chunked interfaces in cuDF with a `has_next` and a `next`, with the difference that in this case `next` takes a `device_span`. The user can provide any `device_span` as long as its size is the same as the size specified during construction.

When Thrust or scratch space needs GPU memory, this PR uses the memory resource passed in, even to the point of passing it as the second argument of `exec_policy`. I found while testing that `exec_policy` defaults to the per-device resource, and in this case I really wanted to pass a pooled memory resource, separate from our normal async memory resource, so that this memory is set aside ahead of time.

Most of the changes are about moving things around. You'll see 3 structs with the name "packed" (e.g. `packed_split_indices_and_src_buf_info`). These group together state that `contiguous_split` needs in order to work, and that `chunked_pack` now also needs to keep around for the subsequent calls to `next`. These structs are also packed in memory, an optimization `contiguous_split` had already made to reduce the number of d2h/h2d copies.

This PR did need to add a state object (`contiguous_split_state`) that `contiguous_split` now leverages. It also makes use of the `metadata_builder`, which we added in a prior PR.

This PR does not include the JNI changes needed for this to work on the Java side; I'll post those separately. I figured this was too big already (and if people have suggestions on "chunking" this PR up, I am happy to do that).

@nvdbaranec spent a great deal of time documenting `contiguous_split` for me, and he suggested a path to get this done that I just followed (thank you!!)

Authors:
- Alessandro Bellina (https://github.com/abellina)

Approvers:
- https://github.com/nvdbaranec
- Nghia Truong (https://github.com/ttnghia)
- David Wendt (https://github.com/davidwendt)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #13260
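
To make the "chunking the batches" idea concrete, here is a minimal host-only sketch of the `has_next`/`next` loop described above. It is not the cuDF implementation and does not use the cuDF API: `chunked_packer_model`, `pack_result`, and `pack_all` are hypothetical names, plain host memory stands in for device memory, a raw pointer plus size stands in for `device_span`, and the requirement that the user buffer hold at least one batch is an assumption of this model made so that every call to `next` makes progress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-only model of a chunked packer; NOT the cuDF API.
class chunked_packer_model {
 public:
  chunked_packer_model(std::vector<std::uint8_t> source,
                       std::size_t user_buffer_size,
                       std::size_t batch_size = std::size_t{1} << 20)  // 1MB batches
    : source_(std::move(source)),
      user_buffer_size_(user_buffer_size),
      batch_size_(batch_size)
  {
    // Model assumption: the buffer must fit at least one whole batch.
    if (batch_size_ == 0 || user_buffer_size_ < batch_size_) {
      throw std::invalid_argument("user buffer must hold at least one batch");
    }
  }

  // True while there is source data that has not been copied yet.
  bool has_next() const { return offset_ < source_.size(); }

  // Copies as many whole batches as fit into the user buffer and returns
  // the number of bytes written. Any buffer may be passed, as long as its
  // size matches the size given at construction.
  std::size_t next(std::uint8_t* user_buffer, std::size_t size)
  {
    if (size != user_buffer_size_) {
      throw std::invalid_argument("buffer size differs from construction size");
    }
    std::size_t written = 0;
    while (offset_ < source_.size()) {
      std::size_t const batch = std::min(batch_size_, source_.size() - offset_);
      if (written + batch > user_buffer_size_) { break; }  // next batch does not fit
      std::memcpy(user_buffer + written, source_.data() + offset_, batch);
      written += batch;
      offset_ += batch;
    }
    assert(written <= user_buffer_size_);
    return written;
  }

 private:
  std::vector<std::uint8_t> source_;
  std::size_t user_buffer_size_;
  std::size_t batch_size_;
  std::size_t offset_ = 0;
};

// What a caller might collect: the reassembled bytes and the size of each chunk.
struct pack_result {
  std::vector<std::uint8_t> bytes;
  std::vector<std::size_t> chunk_sizes;
};

// Drives the has_next/next loop while reusing a single bounce buffer.
pack_result pack_all(std::vector<std::uint8_t> const& source,
                     std::size_t user_buffer_size,
                     std::size_t batch_size)
{
  chunked_packer_model packer(source, user_buffer_size, batch_size);
  std::vector<std::uint8_t> bounce(user_buffer_size);
  pack_result result;
  while (packer.has_next()) {
    std::size_t const n = packer.next(bounce.data(), bounce.size());
    result.bytes.insert(result.bytes.end(), bounce.begin(), bounce.begin() + n);
    result.chunk_sizes.push_back(n);
  }
  return result;
}
```

In this model the caller allocates the bounce buffer once and reuses it for every chunk, which mirrors the motivation above: the packer itself never allocates memory for the data being copied.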
Showing 4 changed files with 2,109 additions and 604 deletions.