Add in retry for ORC writes [databricks] #7972

revans2 · 2023-03-29T18:12:28Z

This fixes #7341
This fixes #7960

I also included a fix for metrics because, for some reason, when writing data the metrics were not deserialized until the whole query was done. For now I just made the task-level metrics go on most GPU operators.

I did some performance testing with smaller and smaller amounts of memory. The issues that I saw were mostly with not being able to split inputs on a Parquet read (when I got down to 4 GiB of GPU memory and 4x parallelism). I also saw some issues with running out of memory when trying to read spilled data back in.

The performance drop-off looks roughly logarithmic, which is nice to see.

Signed-off-by: Robert (Bobby) Evans <[email protected]>

revans2 · 2023-03-29T18:12:45Z

build

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ColumnarOutputWriter.scala

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuOrcFileFormat.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTransitionOverrides.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetFileFormat.scala

revans2 · 2023-03-30T14:44:42Z

build

revans2 · 2023-03-30T15:56:54Z

build

revans2 · 2023-03-30T18:17:17Z

Looks like the OOM injection PR messed with some things for this PR, so I will spend some time to debug this...

revans2 · 2023-03-30T20:18:38Z

build

revans2 · 2023-03-30T20:20:02Z

@jlowe sorry about more test failures, but it should be fixed now. Please take another look.

@abellina please take a look at my latest patch, which fixes some issues with OOM injection.

jbrennan333

lgtm

Add in retry for ORC writes

98e1bb2

Signed-off-by: Robert (Bobby) Evans <[email protected]>

jlowe reviewed Mar 29, 2023

Review comments and some fixes

e674f5c

revans2 changed the title ~~Add in retry for ORC writes~~ Add in retry for ORC writes [databricks] Mar 30, 2023

revans2 added 2 commits March 30, 2023 09:47

Merge branch 'branch-23.04' into orc_retry

0e820b8

Missed one test

646355f

jlowe previously approved these changes Mar 30, 2023

Merge branch 'branch-23.04' into orc_retry

05bebde

Fix injection for ORC write tests

7d957aa

revans2 dismissed jlowe’s stale review via 7d957aa March 30, 2023 20:17

abellina approved these changes Mar 30, 2023

jlowe approved these changes Mar 30, 2023

jbrennan333 approved these changes Mar 30, 2023

revans2 merged commit 663c39a into NVIDIA:branch-23.04 Mar 31, 2023

revans2 deleted the orc_retry branch March 31, 2023 13:50

sameerz added the reliability Features to improve reliability or bugs that severely impact the reliability of the plugin label Apr 1, 2023

mattahrens assigned revans2 Apr 7, 2023
