[ZEPPELIN-5760] fix passing configs to spark in k8s #4398
Conversation
I think most Spark configuration values can be included in the environment variable SPARK_SUBMIT_OPTIONS.
Hi @Reamer, I'm sorry, but I am not sure I understand you correctly. Also, this fix changes behavior for Spark only.
Hi @zlosim, see lines 323 to 325 in 8995ff2.
Your configuration values, like … (line 410 in 8995ff2).
You can also set … In general, I can recommend creating a ConfigMap that contains all Spark configurations and then including them in the containers via a custom interpreter-spec.yaml in your Spark Zeppelin interpreter.
Hi @Reamer, thanks for the clarification.
Yes, it is running in client mode, but the driver is not yet started, so there is still a chance to pass these options to the driver at startup so they can take effect. We do the same in SparkInterpreterLauncher.
I'm sorry, I was not aware of that. I know we can set a few params with env vars, but I can't find how to set an env var from the interpreter configuration.
I understand your motivation. I just don't like the list of Spark configurations at this point, because it can become very long. Spark has very many configuration options, and we can't map them all again here in Zeppelin. I am looking for a more generic approach.
@zlosim SparkInterpreterLauncher supports all the Spark configurations. It detects all the Spark configurations and concatenates them via the env variable ZEPPELIN_SPARK_CONF. The reason why we deprecated SPARK_SUBMIT_OPTIONS is …
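For illustration, a minimal sketch of this mechanism: a simple prefix test stands in for Zeppelin's real isSparkConf check, and the class name and property values are made up. The options are joined into one pipe-delimited string, which is how ZEPPELIN_SPARK_CONF carries them.

import java.util.Properties;
import java.util.StringJoiner;

public class SparkConfEnvSketch {
  // Hypothetical stand-in for Zeppelin's isSparkConf check; the real
  // implementation may be stricter than a simple prefix test.
  static boolean isSparkConf(String key, String value) {
    return key.startsWith("spark.") && value != null && !value.isEmpty();
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("spark.jars.packages", "org.example:demo:1.0");
    props.setProperty("spark.driver.extraJavaOptions", "-Dfoo=bar");
    props.setProperty("zeppelin.notebook.dir", "/notebook"); // skipped: not a Spark conf

    StringJoiner sparkConfSJ = new StringJoiner("|");
    for (String key : props.stringPropertyNames()) {
      String value = props.getProperty(key);
      if (isSparkConf(key, value)) {
        sparkConfSJ.add("--conf " + key + "=" + value);
      }
    }
    // The joined string is exported as ZEPPELIN_SPARK_CONF; interpreter.sh
    // splits it on '|' again before invoking spark-submit. Example output
    // (order of entries is not guaranteed):
    // --conf spark.jars.packages=org.example:demo:1.0|--conf spark.driver.extraJavaOptions=-Dfoo=bar
    System.out.println(sparkConfSJ);
  }
}

The pipe delimiter lets whole options containing spaces survive the trip through a single environment variable.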
Hi @zjffdu, thanks for the response. As for …
@zlosim How about updating buildSparkSubmitOptions?
Hi @zjffdu, sorry for the late response. I rewrote it so we are now using ZEPPELIN_SPARK_CONF. Please let me know if there is anything I can do better.
I think I have found a bug. Have you tested the changes locally?
@zlosim Any update?
Hi @zjffdu, I think this PR is ready to merge unless there are any other requests.
I would like to test this beforehand in my local K8s cluster, but I am still on vacation right now. Please have a little patience. |
I think I have found a mistake; let me check. The purpose of the pipe is to cleanly separate individual config values from one another. A separator inside a single config value is not desired.
if (isUserImpersonated() && !StringUtils.containsIgnoreCase(userName, "anonymous")) {
-  options.append(" --proxy-user ").append(userName);
+  sparkConfSJ.add("--proxy-user");
+  sparkConfSJ.add(userName);
Possible error: this creates "--proxy-user|user", but desired is "--proxy-user user".
Hi @Reamer, I'm not sure I can see the bug here. Right now the output of this snippet is --proxy-user|user, and the pipe is replaced with a space in interpreter.sh.
As far as I can see, the same code snippet can be found in SparkInterpreterLauncher.
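To make the two views concrete, here is a small self-contained sketch (class name hypothetical) of what the snippet above produces and what interpreter.sh turns it into, per the comment above:

import java.util.StringJoiner;

public class ProxyUserSeparatorDemo {
  public static void main(String[] args) {
    String userName = "user";

    // Flag and value are added as separate entries, so the joiner
    // puts a pipe between them:
    StringJoiner sparkConfSJ = new StringJoiner("|");
    sparkConfSJ.add("--proxy-user");
    sparkConfSJ.add(userName);
    System.out.println(sparkConfSJ); // --proxy-user|user

    // interpreter.sh later swaps pipes for spaces, so the final
    // spark-submit fragment is still well-formed:
    System.out.println(sparkConfSJ.toString().replace('|', ' ')); // --proxy-user user
  }
}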
for (String key : properties.stringPropertyNames()) {
  String propValue = properties.getProperty(key);
  if (isSparkConf(key, propValue)) {
    sparkConfSJ.add("--conf");
Here, the same error; desired is "--conf spark=123|--conf spark.1=123".
Similar to the previous one: in SparkInterpreterLauncher the output is --conf|spark=123|--conf|spark.1=123, but I can fix it to --conf spark=123|--conf spark.1=123 if that is really what we want. Please let me know.
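A short sketch contrasting the two variants under discussion; the class name is hypothetical and the values are the placeholders from the comments above:

import java.util.StringJoiner;

public class ConfSeparatorDemo {
  public static void main(String[] args) {
    // As in the PR (and SparkInterpreterLauncher): flag and value are
    // separate entries, so a pipe also lands between them.
    StringJoiner current = new StringJoiner("|");
    current.add("--conf").add("spark=123");
    current.add("--conf").add("spark.1=123");
    System.out.println(current); // --conf|spark=123|--conf|spark.1=123

    // As requested in the review: one entry per whole option, so the
    // pipe only separates complete options.
    StringJoiner desired = new StringJoiner("|");
    desired.add("--conf spark=123");
    desired.add("--conf spark.1=123");
    System.out.println(desired); // --conf spark=123|--conf spark.1=123
  }
}

Both forms survive the pipe-to-space replacement in interpreter.sh, which is why the first variant was ultimately accepted.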
Hi @zlosim …
Hi @Reamer …
LGTM.
Thank you for your pull request; with this, it is possible to remove the deprecated SPARK_SUBMIT_OPTIONS option in the K8s launcher.
I have successfully tested your patch in my cluster.
As indicated, the pipe between "--conf" and its value might cause problems in the future, but since the other launch command does the same, this should not block the merge.
* passing static arguments to spark-submit command so driver can pick them up
* fixed static names
* removed duplicate driver memory setting
* fixed driver extra java opts
* extend test
* use ZEPPELIN_SPARK_CONF env var to pass spark configurations
* fix import wildcard
* fix separator
* remove redundant concatenation
* remove redundant concatenation; fix tests
What is this PR for?
Some important Spark configs, including spark.jars.packages and spark.driver.extraJavaOptions, can't be passed via SparkConf in K8s client mode.
This fix checks whether these are set in the interpreter configuration and passes them to the spark-submit command.
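For background, a hedged sketch of why such settings must reach the spark-submit command line; the class name and property values are illustrative, not from the PR:

import java.util.Properties;
import java.util.StringJoiner;

public class DriverConfDemo {
  public static void main(String[] args) {
    // Interpreter settings as they might appear in Zeppelin (values made up).
    Properties props = new Properties();
    props.setProperty("spark.jars.packages", "org.postgresql:postgresql:42.5.0");
    props.setProperty("spark.driver.extraJavaOptions", "-XX:+UseG1GC");

    // Such settings only take effect if they reach spark-submit before
    // the driver JVM starts; in client mode, setting them on the
    // already-running driver's SparkConf is too late.
    StringJoiner submitArgs = new StringJoiner(" ");
    for (String key : props.stringPropertyNames()) {
      submitArgs.add("--conf").add(key + "=" + props.getProperty(key));
    }
    System.out.println("spark-submit " + submitArgs + " ...");
  }
}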
What type of PR is it?
Bug Fix
What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-5760
How should this be tested?
The existing test was updated to cover this fix.
Questions: