[BUG] 50% performance regression in `concatenate` for string columns observed in 21.08
#8960
Labels
- bug: Something isn't working
- libcudf: Affects libcudf (C++/CUDA) code.
- Performance: Performance related issue
- Spark: Functionality that helps Spark RAPIDS
In the Spark plugin we saw an issue where calls to cudf `concatenate` (NVIDIA/spark-rapids#3135) caused Q4 derived from TPCDS to be 50% slower (~40 seconds, up from 20 seconds observed with cudf 21.06). The issue is described in the ticket referenced above, but I repeat a couple of traces here for context:
21.08 before the fix:
Dave's development branch with the fix:
The main issue we saw was tiny (~4 byte) DtoH copies to pageable memory before the `concatenate` kernel really runs. The code change is fairly simple (https://github.com/nvdbaranec/cudf/tree/concat_performance), and we'd like to propose that it be included in cuDF 21.08 if at all possible.
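For context on why a copy of only a few bytes can matter: a device-to-host copy into pageable host memory blocks the calling thread until it completes, so each one pays a fixed overhead regardless of payload size. The sketch below is a back-of-the-envelope cost model only. `CopyCostModel`, both function names, and any numbers passed in are hypothetical assumptions for illustration, not measurements and not the change in the linked branch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cost model: both fields are assumed values, not measured ones.
struct CopyCostModel {
  double fixed_overhead_us;       // assumed per-copy issue + synchronization cost
  double bandwidth_bytes_per_us;  // assumed transfer bandwidth
};

// Estimated time for `num_copies` separate blocking copies of `bytes_each` bytes.
// Every copy pays the fixed overhead, so tiny payloads are overhead-dominated.
double separate_copies_us(CopyCostModel const& m, std::size_t num_copies, std::size_t bytes_each)
{
  assert(m.bandwidth_bytes_per_us > 0.0);
  double const per_copy =
    m.fixed_overhead_us + static_cast<double>(bytes_each) / m.bandwidth_bytes_per_us;
  return static_cast<double>(num_copies) * per_copy;
}

// Estimated time for one copy carrying the same total payload.
// The fixed overhead is paid once.
double batched_copy_us(CopyCostModel const& m, std::size_t num_copies, std::size_t bytes_each)
{
  assert(m.bandwidth_bytes_per_us > 0.0);
  double const total_bytes = static_cast<double>(num_copies) * static_cast<double>(bytes_each);
  return m.fixed_overhead_us + total_bytes / m.bandwidth_bytes_per_us;
}
```

With an assumed 10 µs fixed overhead and 1000 bytes/µs of bandwidth, 1000 separate 4-byte copies cost about 10 ms in this model, while a single copy of the same 4000 bytes costs about 14 µs. The real numbers depend on the hardware and driver; the traces in the linked ticket are the authoritative data.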