Force parallel world in Shim caller's classloader #3763
Conversation
- uprev spark320 breaking DiskManager changes
- use the Serializer instance to find the mutable classloader
- make the update logic oblivious to the executor/driver side

Signed-off-by: Gera Shegalov <[email protected]>
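A minimal sketch of the idea described above, assuming Spark's usual classloader layout; the object and method names are illustrative, not the exact plugin code. Starting from the Serializer held by SparkEnv (which exists on both the driver and the executors), walk up the classloader chain to a URLClassLoader and append the shim's parallel-world URL to it:

```scala
import java.net.{URL, URLClassLoader}

import org.apache.spark.SparkEnv

object ShimClassLoaderSketch {
  // Walk up the parent chain until a URLClassLoader is found.
  @annotation.tailrec
  private def findUrlLoader(cl: ClassLoader): Option[URLClassLoader] = cl match {
    case u: URLClassLoader => Some(u)
    case null => None
    case other => findUrlLoader(other.getParent)
  }

  // Append the shim's parallel-world URL to the classloader that loaded
  // the Serializer, so shimmed classes resolve in the caller's loader on
  // both the driver and the executor.
  def forceShimUrl(shimUrl: URL): Unit =
    Option(SparkEnv.get)
      .map(_.serializer.getClass.getClassLoader)
      .flatMap(findUrlLoader)
      .foreach { loader =>
        // URLClassLoader.addURL is protected; this reflective call into a
        // non-public API is the downside discussed later in the thread.
        val addUrl = classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
        addUrl.setAccessible(true)
        addUrl.invoke(loader, shimUrl)
      }
}
```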
…oaderHack_k8s_for_Andy
- making this option the default because it is equivalent to the old flat jar
- making it optional because we are still debugging how we miss the non-default classloader in NVIDIA#3704, and it is not the right behavior for addJar with userClassPathFirst (a sketch of such an opt-out guard follows below)

However, I think we should generally stop documenting --jars as the plugin deploy option.

Fixes NVIDIA#3704

Signed-off-by: Gera Shegalov <[email protected]>
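What such an opt-out guard could look like, assuming a boolean plugin config; the key name "spark.rapids.force.caller.classloader" and its default are assumptions for illustration, not a confirmed plugin API:

```scala
import org.apache.spark.SparkEnv

// Hypothetical guard: force the shim URL into the caller's classloader
// unless the user explicitly opts out. Defaulting to true matches the
// intent of making this behavior the default.
def shouldForceCallerClassLoader(): Boolean =
  Option(SparkEnv.get)
    .map(_.conf.getBoolean("spark.rapids.force.caller.classloader", defaultValue = true))
    .getOrElse(true)
```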
Why would we do this? That is generally how you are supposed to distribute jars on Spark; what is your alternative? This needs more explanation. For instance, on YARN you use --jars to get things into the distributed cache so they are sent to the nodes. Without this, do you expect people to install the plugin on every node?
@@ -135,7 +135,15 @@ object ShimLoader extends Logging {
// org/apache/spark/serializer/KryoSerializer.scala#L134
Option(SparkEnv.get)
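For context, the KryoSerializer line referenced in the comment above is where Spark picks the classloader used to deserialize task classes. Paraphrased as a standalone sketch, not an exact quote of the Spark source:

```scala
// KryoSerializer.newKryo() prefers the serializer's defaultClassLoader,
// falling back to the thread context classloader; intercepting the
// serializer's classloader therefore intercepts deserialization classloading.
def kryoClassLoader(defaultClassLoader: Option[ClassLoader]): ClassLoader =
  defaultClassLoader.getOrElse(Thread.currentThread.getContextClassLoader)
```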
Nit: if changes are made to this file, perhaps update the function name to be more generic.
Good point. My main point is that we should decide which of the multiple ways to deploy plugin jars we recommend and support, and make sure it works with all the features we provide, such as the shuffle manager, in all Spark deploy modes.
Taking a look at this patch with UCX |
While talking through the list of pros and cons with @tgravescs, we concluded that my concern is not an issue. So maybe having a reflection call into a non-public API is the only known downside at this moment.
It came up fine with and without
This PR codes up the approach that was tested on Databricks as an alternative while working on #3756
Fixes #3704
Signed-off-by: Gera Shegalov <[email protected]>