[FEA] improve performance of chunked parquet write by not using pinned memory #12186
Labels
- 0 - Backlog: In queue waiting for assignment
- cuIO: cuIO issue
- feature request: New feature or request
- libcudf: Affects libcudf (C++/CUDA) code.
- Performance: Performance related issue
- Spark: Functionality that helps Spark RAPIDS
Is your feature request related to a problem? Please describe.
I am not sure if this is a bug or a feature request, but I am leaning towards a feature request. In Spark we typically have multiple streams processing data at the same time, and some of these may be copying data to or from the GPU. We have noticed that during a chunked parquet write we can see some very long stalls when `cudaFreeHost` is called, because it synchronizes with the GPU. These stalls tend to correspond to being blocked behind a large data transfer to the GPU.

Part of this is interesting because the `cudf::io::data_sink` we use returns `true` for `supports_device_write`, so there should be very little need for pinned memory at all. I really would like to understand whether we actually need pinned memory. In all cases? Even in the case where the sink accepts device buffers for the write?