[FEA] explore faster data transitions #507
Labels
cudf_dependency: An issue or PR with this label depends on a new feature in cudf
epic: Issue that encompasses a significant feature or body of work
performance: A performance related task/issue
Comments
revans2 added the feature request (New feature or request) and ? - Needs Triage (Need team to review and classify) labels on Aug 4, 2020
sameerz added the epic label and removed the feature request label on Dec 18, 2020
sameerz added the cudf_dependency label on Feb 18, 2021
Removing this from the sprint milestones as this is an overarching feature with sub-tasks for each sprint.
pxLi added a commit to pxLi/spark-rapids that referenced this issue on May 12, 2022
Signed-off-by: Peixin Li <[email protected]>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue on Nov 30, 2023
Signed-off-by: spark-rapids automation <[email protected]>
When working with some of the cache/persist operations it has become very clear that moving data from CPU to GPU and back is a real performance problem. From past experience trying to optimize shuffle, part of the problem comes down to the number of buffers that need to be moved, something that will only get worse with nested data types. The rest of the problem has a lot to do with the actual data access pattern: going from row to column, or column to row, inherently forces one of the two operations to stride through memory, which is really bad for the CPU cache.
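To make the access-pattern point concrete, here is a minimal sketch (hypothetical data shapes, not spark-rapids code) of a row-to-column transpose: the row side is read sequentially, but each element is written to a different column buffer, so one side of the conversion always strides through memory.

```scala
// Minimal sketch (not spark-rapids code): transposing row-major data into
// per-column buffers. The read of rows(r) is sequential, but successive writes
// land in different column arrays, so one side of the conversion always strides.
object RowColumnStride {
  def rowsToColumns(rows: Array[Array[Long]], numCols: Int): Array[Array[Long]] = {
    val numRows = rows.length
    val cols = Array.ofDim[Long](numCols, numRows)
    var r = 0
    while (r < numRows) {
      var c = 0
      while (c < numCols) {
        cols(c)(r) = rows(r)(c) // sequential read, strided write
        c += 1
      }
      r += 1
    }
    cols
  }
}
```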
We have tried in the past to write a custom kernel to translate GPU columnar data into Spark's UnsafeRow format, and it did help some, but that memory format is really wasteful and resulted in bad performance because we could not allocate enough GPU memory to make it worthwhile.
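For reference, a rough sketch of the fixed per-row overhead in Spark's UnsafeRow layout (a null bitset padded to 8-byte words plus an 8-byte slot per field, with variable-length values appended afterwards) shows why a GPU buffer of UnsafeRows can be much larger than the equivalent columnar data. The sizing below is an approximation, not the plugin's actual accounting.

```scala
// Rough sketch of UnsafeRow's fixed-width region: a null bitset rounded up to
// 8-byte words plus an 8-byte slot per field. Variable-length values are
// appended after this region, so every row pays at least this much.
object UnsafeRowSizing {
  def fixedBytesPerRow(numFields: Int): Int = {
    val bitsetBytes = ((numFields + 63) / 64) * 8 // one bit per field, word aligned
    bitsetBytes + numFields * 8                   // plus 8 bytes per field
  }
  // Example: a single BOOLEAN column still costs 16 bytes per row in this
  // layout, versus roughly one byte per value (plus validity) in columnar form.
}
```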
I personally would like to see us work with the cudf team to develop a packed, row-based format that we could translate to/from on the GPU. A row-based to row-based translation is not that expensive for the CPU.
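As a thought experiment, a packed row format might look something like the sketch below: fixed, type-sized slots plus a small per-row validity bitmap, so the GPU can compute every row's offset directly and the CPU only has to do a cheap row-to-row copy into or out of UnsafeRow. The names and layout here are purely illustrative, not an existing cudf or spark-rapids API.

```scala
// Illustrative only: one possible "packed row" layout with type-sized slots
// and a per-row validity bitmap. A fixed row size keeps GPU addressing trivial
// (row r starts at r * rowSize) and keeps the CPU-side work a row-to-row copy.
final case class PackedRowSchema(fieldSizes: Array[Int]) {
  val validityBytes: Int = (fieldSizes.length + 7) / 8
  val rowSize: Int = validityBytes + fieldSizes.sum

  // Byte offset of field i within a packed row (validity bitmap comes first).
  def fieldOffset(i: Int): Int = validityBytes + fieldSizes.take(i).sum

  // Byte offset of row r within the packed buffer.
  def rowOffset(r: Long): Long = r * rowSize
}
```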
Tasks: