
[FEA] improve performance of chunked parquet write by not using pinned memory #12186

Closed

revans2 opened this issue Nov 17, 2022 · 2 comments

Labels: 0 - Backlog, cuIO, feature request, libcudf, Performance, Spark

Comments

revans2 (Contributor) commented Nov 17, 2022

Is your feature request related to a problem? Please describe.
I am not sure whether this is a bug or a feature request, but I am leaning towards a feature request. In Spark we typically have multiple streams processing data at the same time, and some of them may be copying data to or from the GPU. We have noticed that during a chunked parquet write we can get some really long stalls when cudaFreeHost is called, because it synchronizes with the GPU.

(Profiler screenshot: cudaHostFreeInWrite)
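To make the synchronization point concrete, here is a minimal standalone sketch (not from the issue; the buffer sizes and the device-to-device busy-work are arbitrary) showing how cudaFreeHost blocks the calling thread until queued GPU work drains:

```cpp
// Minimal repro sketch: cudaFreeHost implicitly synchronizes with the device,
// so freeing pinned memory while the GPU is busy stalls the calling thread.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
  constexpr size_t buf_bytes = size_t{256} << 20;  // 256 MiB of device busy-work
  void* pinned = nullptr;
  void* d_src  = nullptr;
  void* d_dst  = nullptr;
  cudaMallocHost(&pinned, 1 << 20);  // small pinned allocation, like a cuIO staging buffer
  cudaMalloc(&d_src, buf_bytes);
  cudaMalloc(&d_dst, buf_bytes);

  cudaStream_t stream;
  cudaStreamCreate(&stream);
  // Queue enough device-to-device copies that the GPU is still busy below.
  for (int i = 0; i < 50; ++i) {
    cudaMemcpyAsync(d_dst, d_src, buf_bytes, cudaMemcpyDeviceToDevice, stream);
  }

  auto t0 = std::chrono::steady_clock::now();
  cudaFreeHost(pinned);  // returns only after the queued device work has drained
  auto t1 = std::chrono::steady_clock::now();
  std::printf("cudaFreeHost blocked for %lld us\n",
              static_cast<long long>(
                  std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()));

  cudaFree(d_src);
  cudaFree(d_dst);
  cudaStreamDestroy(stream);
  return 0;
}
```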

These stalls tend to correspond with being blocked behind a large data transfer to the GPU. What makes this especially interesting is that the cudf::io::data_sink we use returns true from supports_device_write, so there should be very little need for pinned memory at all. I would really like to understand whether we actually need pinned memory. In all cases? Even when the sink accepts device buffers for the write?
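For reference, a sink that takes device buffers directly looks roughly like the sketch below. The class itself is hypothetical; the virtual methods follow the cudf::io::data_sink interface, though exact signatures can differ between libcudf versions.

```cpp
// Hypothetical sink that advertises device-write support, so the parquet writer
// can hand it device buffers directly instead of staging through pinned memory.
#include <cudf/io/data_sink.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <cstddef>

class device_capable_sink : public cudf::io::data_sink {
 public:
  void host_write(void const* data, size_t size) override {
    // Fallback path for host-side pieces (headers, footers, metadata).
    bytes_written_ += size;
  }

  bool supports_device_write() const override { return true; }

  void device_write(void const* gpu_data, size_t size, rmm::cuda_stream_view stream) override {
    // Consume the device buffer directly, e.g. via a GPU-aware transport;
    // no host-side (pinned or pageable) staging buffer is required here.
    bytes_written_ += size;
  }

  void flush() override {}
  size_t bytes_written() override { return bytes_written_; }

 private:
  size_t bytes_written_ = 0;
};
```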

@revans2 added the feature request, Needs Triage, Performance, and Spark labels on Nov 17, 2022
@GregoryKimball added the 0 - Backlog and cuIO labels and removed the Needs Triage label on Nov 19, 2022
rapids-bot bot pushed a commit that referenced this issue Dec 2, 2022
…vice_vector` (#12251)

Enables experimentation related to issue #12186

Frequent allocation of pinned memory becomes costly in a multi-GPU environment, where each pinned allocation synchronizes all devices and creates large bubbles in GPU utilization.

This PR adds an environment variable, `LIBCUDF_PREFER_PAGEABLE_MEMORY`, that controls whether `hostdevice_vector` uses pinned memory for its host-side buffer. `hostdevice_vector` accounts for most of the pinned memory allocations in cuIO, so this variable should have a large impact on execution when using multiple GPUs. (A usage sketch follows the PR details below.)

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #12251
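As a rough illustration of how one might exercise the new option in a chunked parquet write, here is a sketch; the value "ON" and the exact chunked-writer calls shown are assumptions that may not match every libcudf version.

```cpp
// Sketch: opt into pageable host memory before doing a chunked parquet write.
// The value "ON" is an assumption; check the libcudf version in use for the
// exact values LIBCUDF_PREFER_PAGEABLE_MEMORY accepts.
#include <cudf/io/parquet.hpp>
#include <cudf/io/data_sink.hpp>
#include <cudf/table/table_view.hpp>
#include <cstdlib>
#include <vector>

void write_chunks(std::vector<cudf::table_view> const& chunks, cudf::io::data_sink* sink)
{
  // Must be set before cuIO creates its first hostdevice_vector.
  setenv("LIBCUDF_PREFER_PAGEABLE_MEMORY", "ON", /*overwrite=*/1);

  auto opts = cudf::io::chunked_parquet_writer_options::builder(cudf::io::sink_info{sink}).build();
  cudf::io::parquet_chunked_writer writer(opts);
  for (auto const& chunk : chunks) { writer.write(chunk); }
  writer.close();
}
```

Whether pageable host memory actually helps depends on how much cuIO staging traffic overlaps with other transfers, which is what the follow-up question below asks to measure.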
vuule (Contributor) commented Jan 9, 2023

@revans2 have you measured the impact of the different options added in #12251?

@GregoryKimball added the libcudf label on Apr 2, 2023
GregoryKimball (Contributor) commented

Closed by #12251; follow-on work can be found in #14314.
