
[FEA] cuDF Parquet reader should use pinned memory to copy data from sysmem to GPU #6376

Closed
chirayuG-nvidia opened this issue Sep 30, 2020 · 6 comments
Labels
- cuIO (cuIO issue)
- feature request (New feature or request)
- libcudf (Affects libcudf (C++/CUDA) code)
- Performance (Performance related issue)

Comments

@chirayuG-nvidia

Is your feature request related to a problem? Please describe.
NVTabular is building an asynchronous dataloader to accelerate tabular data loading for DL frameworks like PyTorch and TensorFlow. The primary function of this async dataloader is to prepare training data by reading Parquet input files and producing input tensors for training. Currently, the Parquet reader in cuDF uses pageable memory to copy Parquet input to the GPU for decompression. A HostToDevice memcpy from pageable memory takes a lock in the CUDA context, which prevents the framework's training thread from submitting other CUDA API calls. The Nsight profiler image at the end illustrates the problem.

Describe the solution you'd like
Memory-mapped Parquet input files can first be staged in pinned system memory before issuing the H2D memcpy. This adds the overhead of an extra CPU memcpy, but it would be worthwhile to implement and measure the performance improvements (or regressions) from this approach.
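The staging path described above could look roughly like the following. This is a minimal hypothetical sketch (not cuDF's actual reader code, and `copy_via_pinned` is an illustrative name): data from the memory-mapped file is first copied into a page-locked buffer, from which the host-to-device copy can proceed asynchronously without blocking other CUDA API calls. It assumes a CUDA-capable device and the CUDA runtime.

```cpp
#include <cuda_runtime.h>
#include <cstring>

// Sketch: stage an mmapped (pageable) region through a pinned host buffer
// so the H2D copy is asynchronous with respect to other host threads.
void copy_via_pinned(const void* mapped_src, void* d_dst, size_t size,
                     cudaStream_t stream)
{
    void* pinned = nullptr;
    cudaMallocHost(&pinned, size);  // allocate page-locked staging buffer

    // The extra CPU memcpy mentioned in the issue: pageable -> pinned.
    std::memcpy(pinned, mapped_src, size);

    // From pinned memory, cudaMemcpyAsync does not take the lock that a
    // pageable-memory copy does.
    cudaMemcpyAsync(d_dst, pinned, size, cudaMemcpyHostToDevice, stream);

    // In real code the buffer would be pooled/reused and freed after an
    // event or stream callback; synchronizing here keeps the sketch simple.
    cudaStreamSynchronize(stream);
    cudaFreeHost(pinned);
}
```

In practice the staging buffer would come from a pool rather than a per-call `cudaMallocHost`, since pinned allocation itself is expensive.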

Additional context
[Nsight Systems screenshot from the original issue: the pageable-memory H2D copy blocking CUDA API calls from the training thread]

@jperez999 @benfred @EvenOldridge to help further define NVT priority for this.

@chirayuG-nvidia added the Needs Triage and feature request labels on Sep 30, 2020.
@harrism added the cuIO, Performance, and libcudf labels and removed Needs Triage on Oct 1, 2020.
@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@jperez999
Contributor

jperez999 commented Feb 18, 2021 via email

@devavret
Contributor

We are waiting on having a pool of pinned memory in RMM.
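To illustrate what the RMM dependency refers to: RMM provides a host-side pinned allocator (`rmm::mr::pinned_memory_resource`), and the comment above is about fronting it with a pool so that pinned staging buffers can be reused cheaply. A minimal sketch of direct pinned allocation through RMM (assuming the RMM headers are available; no pooling shown):

```cpp
#include <rmm/mr/host/pinned_memory_resource.hpp>

int main()
{
    rmm::mr::pinned_memory_resource pinned_mr;

    // Allocate a 1 MiB page-locked staging buffer.
    std::size_t bytes = 1 << 20;
    void* staging = pinned_mr.allocate(bytes);

    // ... fill `staging` from the mmapped file, issue cudaMemcpyAsync ...

    pinned_mr.deallocate(staging, bytes);
    return 0;
}
```

Pinned allocation is itself slow (it pins physical pages via the driver), which is why a reusable pool, rather than per-read allocation, is the feature being waited on.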

@EvenOldridge

@JohnZed @rjzamora Is this something we can look into?

@vuule
Contributor

vuule commented Jan 9, 2023

Once we switch to kvikIO for device reads, data from the input file will be transferred via pinned buffers in kvikIO's pool. I assume this will address the issue, since that accounts for the bulk of the data copied onto the GPU.
Expecting to start using kvikIO in the next release; will update this issue once we do.

@GregoryKimball
Contributor

I believe this is closed by #12574. Further improvements in data copying should be scoped within kvikIO.

7 participants