
Replace direct cudaMemcpyAsync calls with utility functions (limited to cudf::io) #17132

Merged
merged 13 commits into rapidsai:branch-24.12 on Oct 23, 2024

Conversation

@vuule (Contributor) commented Oct 18, 2024

Description

Issue #15620

Replaced the calls to cudaMemcpyAsync with the new cuda_memcpy/cuda_memcpy_async utility, which optionally avoids using the copy engine. Changes are limited to cuIO to make the PR easier to review (repetitive enough as-is!).

Also took the opportunity to use cudf::detail::host_vector and its factories to enable wider pinned memory use.

Skipped a few instances of cudaMemcpyAsync; a few are under io::comp, where we don't want to invest further (if possible). The other cudaMemcpyAsync instances are D2D copies, which cuda_memcpy/cuda_memcpy_async don't support. Perhaps they should, just to make their use ubiquitous.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Oct 18, 2024
@vuule vuule self-assigned this Oct 18, 2024
@vuule vuule added Performance Performance related issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 18, 2024
@@ -218,7 +218,7 @@ void generate_depth_remappings(
*/
[[nodiscard]] std::future<void> read_column_chunks_async(
std::vector<std::unique_ptr<datasource>> const& sources,
std::vector<std::unique_ptr<datasource::buffer>>& page_data,
cudf::host_span<rmm::device_buffer> page_data,
@vuule (Contributor, author) commented:
simplified outdated complexity

@vuule vuule changed the title Replace cudaMemcpyAsync calls with cuda_memcpy_async Replace direct cudaMemcpyAsync calls with utility functions Oct 18, 2024
@vuule vuule changed the title Replace direct cudaMemcpyAsync calls with utility functions Replace direct cudaMemcpyAsync calls with utility functions (limited to cudf::io) Oct 18, 2024
std::pair(source_ptr->device_read_async(
read_info.offset, read_info.length, dst_base + read_info.dst_pos, _stream),
read_info.length));
device_read_tasks.emplace_back(
@vuule (Contributor, author) commented:
Unrelated change; noticed clang-tidy complaining that we used to make an unnecessary move here :)

@vuule vuule marked this pull request as ready for review October 22, 2024 22:11
@vuule vuule requested a review from a team as a code owner October 22, 2024 22:11
@ttnghia (Contributor) commented Oct 23, 2024

Have you run any benchmarks to make sure there is no regression?

@vuule (Contributor, author) commented Oct 23, 2024

Have you run any benchmarks to make sure there is no regression?

I haven't, because we currently don't do anything differently - we still end up calling cudaMemcpyAsync on a pageable buffer.
I'll run all benchmarks once we actually move toward setting allocate_host_as_pinned_threshold and/or kernel_pinned_copy_threshold.

@@ -87,8 +87,10 @@ class datasource_chunk_reader : public data_chunk_reader {
_source->host_read(_offset, read_size, reinterpret_cast<uint8_t*>(h_ticket.buffer.data()));

// copy the host-pinned data on to device
CUDF_CUDA_TRY(cudaMemcpyAsync(
chunk.data(), h_ticket.buffer.data(), read_size, cudaMemcpyDefault, stream.value()));
cudf::detail::cuda_memcpy_async<char>(
@pmattione-nvidia (Contributor) commented Oct 23, 2024:
There are a number of places where the template argument (char) is given explicitly ... is the compiler really not able to deduce it from the inputs?

@vuule (Contributor, author) commented:
Hm, maybe it wasn't required here.
In general, the compiler can't combine template type deduction with an implicit conversion. So passing a container that gets implicitly converted to a span requires spelling out the template type for cuda_memcpy_async.

@vuule (Contributor, author) commented Oct 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit deb9af4 into rapidsai:branch-24.12 Oct 23, 2024
122 checks passed