[FEA] improve performance of chunked parquet write by not using pinned memory #12186
Labels
- 0 - Backlog: In queue waiting for assignment
- cuIO: cuIO issue
- feature request: New feature or request
- libcudf: Affects libcudf (C++/CUDA) code.
- Performance: Performance related issue
- Spark: Functionality that helps Spark RAPIDS
Is your feature request related to a problem? Please describe.
I am not sure if this is a bug or a feature request, but I am leaning towards a feature request. In Spark we typically have multiple streams processing data at the same time, and some of these may be copying data to or from the GPU. We have noticed that during a chunked parquet write we can see some very long stalls when `cudaFreeHost` is called, because it synchronizes with the GPU. These stalls tend to correspond to being blocked behind a large data transfer to the GPU.

Part of this is interesting because the `cudf::io::data_sink` we use returns `true` for `supports_device_write`, so there should be very little need for pinned memory at all. I really would like to understand whether we actually need pinned memory. In all cases? Even in the case where the sink accepts device buffers for the write?