Force parallel world in Shim caller's classloader #3763
Conversation
- uprev spark320 breaking DiskManager changes
- use the Serializer instance to find the mutable classloader
- make the update logic oblivious to the executor/driver side

Signed-off-by: Gera Shegalov <[email protected]>
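A minimal sketch of the idea described above, assuming Spark's usual classloader layout; the object and method names are illustrative, not the exact plugin code. Starting from the Serializer held by SparkEnv (which exists on both the driver and the executors), walk up the classloader chain to a URLClassLoader and append the shim's parallel-world URL to it:

```scala
import java.net.{URL, URLClassLoader}

import org.apache.spark.SparkEnv

object ShimClassLoaderSketch {
  // Walk up the parent chain until a URLClassLoader is found.
  @annotation.tailrec
  private def findUrlLoader(cl: ClassLoader): Option[URLClassLoader] = cl match {
    case u: URLClassLoader => Some(u)
    case null => None
    case other => findUrlLoader(other.getParent)
  }

  // Append the shim's parallel-world URL to the classloader that loaded
  // the Serializer, so shimmed classes resolve in the caller's loader on
  // both the driver and the executor.
  def forceShimUrl(shimUrl: URL): Unit =
    Option(SparkEnv.get)
      .map(_.serializer.getClass.getClassLoader)
      .flatMap(findUrlLoader)
      .foreach { loader =>
        // URLClassLoader.addURL is protected; this reflective call into a
        // non-public API is the downside discussed later in the thread.
        val addUrl = classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
        addUrl.setAccessible(true)
        addUrl.invoke(loader, shimUrl)
      }
}
```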
…oaderHack_k8s_for_Andy
- making this option the default because it is equivalent to the old flat jar
- making it optional because we are still debugging how we miss the non-default classloader in NVIDIA#3704, and it is not the right behavior for addJar with userClassPathFirst (a sketch of such an opt-out guard follows below)

However, I think we should generally stop documenting --jars as the plugin deploy option.

Fixes NVIDIA#3704

Signed-off-by: Gera Shegalov <[email protected]>
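What such an opt-out guard could look like, assuming a boolean plugin config; the key name "spark.rapids.force.caller.classloader" and its default are assumptions for illustration, not a confirmed plugin API:

```scala
import org.apache.spark.SparkEnv

// Hypothetical guard: force the shim URL into the caller's classloader
// unless the user explicitly opts out. Defaulting to true matches the
// intent of making this behavior the default.
def shouldForceCallerClassLoader(): Boolean =
  Option(SparkEnv.get)
    .map(_.conf.getBoolean("spark.rapids.force.caller.classloader", defaultValue = true))
    .getOrElse(true)
```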
Why would we do this? That is generally how you are supposed to distribute jars on Spark; what is your alternative? This needs more explanation. For instance, on YARN you use --jars to get things into the distributed cache so they are sent to the nodes. Without this, do you expect people to install the plugin on every node?
@@ -135,7 +135,15 @@ object ShimLoader extends Logging {
// org/apache/spark/serializer/KryoSerializer.scala#L134
Option(SparkEnv.get)
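For context, the KryoSerializer line referenced in the comment above is where Spark picks the classloader used to deserialize task classes. Paraphrased as a standalone sketch, not an exact quote of the Spark source:

```scala
// KryoSerializer.newKryo() prefers the serializer's defaultClassLoader,
// falling back to the thread context classloader; intercepting the
// serializer's classloader therefore intercepts deserialization classloading.
def kryoClassLoader(defaultClassLoader: Option[ClassLoader]): ClassLoader =
  defaultClassLoader.getOrElse(Thread.currentThread.getContextClassLoader)
```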
Nit: if changes are made to this file, perhaps update the function name to be more generic.
Good point. My main point is that we should decide which of the multiple ways to deploy plugin jars we recommend and support, and make sure it works with all the features we provide, such as the shuffle manager, in all Spark deploy modes.
Taking a look at this patch with UCX |
While talking through the list of pros and cons with @tgravescs, we concluded that my concern is not an issue. So maybe having a reflection call into a non-public API is the only known downside at this moment.
It came up fine with and without
This PR codes up the approach that was tested on Databricks as an alternative while working on #3756
Fixes #3704
Signed-off-by: Gera Shegalov <[email protected]>