[BUG] cudaMalloc and cudaFree are being called during aggregations #10080
Labels: bug, libcudf, Performance, Spark
Describe the bug
While examining a recent trace, I noticed that within the libcudf aggregate range there are calls to cudaMalloc and cudaFree, the latter of which causes a synchronization on the default stream. I attached gdb, put a breakpoint on cudaMalloc, and found it was being triggered by cudf::detail::is_relationally_comparable<cudf::table_device_view> because it calls thrust::all_of without passing an execution policy. Without the RMM policy, Thrust uses the default CUDA allocator for its temporary storage. Ideally it should be using rmm::exec_policy(stream), but the stream is not available to this method and would need to be passed.
Steps/Code to reproduce bug
Attach a debugger to a process running a query that uses the RMM arena allocator and executes an aggregation. Place a breakpoint on cudaMalloc, run the query, and observe that the breakpoint is hit in a call stack that derives from cudf::detail::is_relationally_comparable.
Expected behavior
libcudf should not trigger calls to cudaMalloc or cudaFree.
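For illustration, here is a minimal sketch of the kind of change suggested in the description: thread a stream into the helper and hand thrust::all_of an rmm::exec_policy built from it. This is not the actual libcudf code. The function all_non_negative and the functor is_non_negative are hypothetical stand-ins for is_relationally_comparable and its per-column check, and the sketch assumes RMM and Thrust headers are available and the file is compiled with nvcc.

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/exec_policy.hpp>

#include <thrust/logical.h>
#include <thrust/sequence.h>

#include <cstddef>
#include <cstdio>

// Hypothetical predicate, standing in for the real per-column check.
struct is_non_negative {
  __device__ bool operator()(int v) const { return v >= 0; }
};

// Hypothetical helper. The stream is now an explicit parameter, so the
// Thrust call can run on it and route temporary allocations through RMM
// instead of Thrust's default cudaMalloc/cudaFree path.
bool all_non_negative(int const* d_data, std::size_t n, rmm::cuda_stream_view stream)
{
  return thrust::all_of(rmm::exec_policy(stream), d_data, d_data + n, is_non_negative{});
}

int main()
{
  // Assumption: the default stream view is good enough for a demo; a real
  // caller would pass the stream it already uses for the aggregation.
  rmm::cuda_stream_view stream = rmm::cuda_stream_default;

  // Device buffer allocated through the current RMM device resource.
  rmm::device_uvector<int> data(1024, stream);
  thrust::sequence(rmm::exec_policy(stream), data.begin(), data.end());

  bool const ok = all_non_negative(data.data(), data.size(), stream);
  std::printf("%s\n", ok ? "all non-negative" : "found a negative value");
  return 0;
}
```

Note that rmm::exec_policy(stream) allocates from the current RMM device resource. If that resource is the plain CUDA resource, cudaMalloc is still called underneath. The benefit described in this issue only appears when a pooling resource such as the arena allocator is installed.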