Create non-shim specific version of ParquetCachedBatchSerializer #3473
Merged
Conversation
Commits
- Multiple commits, Signed-off-by: Raza Jafri <[email protected]> (one reverts commit cafaa08)
- Multiple commits, Signed-off-by: Thomas Graves <[email protected]>
tgravescs commented on Sep 14, 2021
 * @param conf the configuration for the job.
 * @return an RDD of the input cached batches transformed into the ColumnarBatch format.
 */
override def gpuConvertCachedBatchToColumnarBatch(
this is the only function not in the Spark CachedBatchSerializer
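The reviewer's point can be illustrated with a minimal sketch (names and types are simplified stand-ins, not the real Spark or plugin interfaces): the GPU serializer adds one method on top of the interface Spark itself defines.

```scala
// Simplified sketch: the base trait stands in for Spark's
// CachedBatchSerializer; the GPU subclass adds a method that has no
// counterpart in the Spark API. Seq[String] stands in for the real
// RDD-based types purely for illustration.
trait BaseCachedBatchSerializer {
  def convertCachedBatchToColumnarBatch(batches: Seq[String]): Seq[String]
}

class GpuCachedBatchSerializer extends BaseCachedBatchSerializer {
  override def convertCachedBatchToColumnarBatch(batches: Seq[String]): Seq[String] =
    batches

  // GPU-only entry point, not part of the base interface.
  def gpuConvertCachedBatchToColumnarBatch(batches: Seq[String]): Seq[String] =
    convertCachedBatchToColumnarBatch(batches)
}
```

Because the extra method exists only on the subclass, callers that need it must know they are talking to the GPU implementation rather than the generic Spark interface.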
revans2 reviewed on Sep 14, 2021
sql-plugin/src/main/311+-all/scala/com/nvidia/spark/ParquetCachedBatchSerializer.scala
revans2 approved these changes on Sep 14, 2021
gerashegalov added a commit that referenced this pull request on Nov 12, 2021:
- Upgrade to Scala 2.12.15
- Add `-Xfatal-warnings` to scalac params
- Add `nowarn` annotations to existing warnings

Closes #3473
This builds on #3390 (credit to @razajafri as well) to create a common, user-facing generic class for ParquetCachedBatchSerializer that then loads the shim-specific version of it as necessary.
Fixes #3314.
The user just has to set:
--conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
and loading the correct shim is handled automatically. Right now there is only one version, under the 311+-all directory.
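For illustration, the setting could be passed at submit time like this (a hedged sketch: the jar and application names below are placeholders, not the plugin's actual artifact names):

```shell
# Illustrative only: enable the serializer when submitting an application.
# Jar and application file names are placeholders.
spark-submit \
  --jars rapids-4-spark.jar \
  --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \
  my-app.jar
```

The same key can equally be set in spark-defaults.conf or on a SparkSession builder; it is an ordinary Spark SQL configuration.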
This includes the Spark 3.1.1 version of com.nvidia.spark.ParquetCachedBatchSerializer at the base of the jar so it can be loaded; after that, the shim version loaded will be the one under the Spark-specific directory (spark312/, spark311/, etc.).
I did not add the function to ShimLoader (like newDriverPlugin) because the code lives in the spark311+-all directory, since CachedBatchSerializer isn't available until Spark 3.1.0.
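The shim-selection idea described above can be sketched in isolation (class and method names here are illustrative, not the plugin's actual internals): a thin user-facing facade resolves a version-specific implementation by class name at runtime.

```scala
// Hypothetical sketch of the shim-loading pattern: a user-facing entry
// point reflectively instantiates a Spark-version-specific implementation.
trait SerializerShim {
  def shimId: String
}

// Stand-in for an implementation that would live under a
// version-specific directory such as spark311/.
class Spark311SerializerShim extends SerializerShim {
  override def shimId: String = "spark311"
}

object ShimResolver {
  // In the real plugin the class name would be derived from the detected
  // Spark version; here the caller supplies it directly for illustration.
  def load(className: String): SerializerShim =
    Class.forName(className)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[SerializerShim]
}
```

The reflective indirection is what lets a single configured class name work across Spark versions: only the facade is referenced by the user, and the concrete class is chosen after the runtime Spark version is known.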
Fixed the Databricks 8.2 build and ran tests. Ran tests on Spark 3.1.1, 3.1.2, 3.2.0, and Databricks 8.2. I was not able to validate the test script updates.