Pass K8sPluginConfig to spark driver and executor pods #patch #271
Conversation
cc @hamersaw
Codecov Report
@@ Coverage Diff @@
## master #271 +/- ##
==========================================
+ Coverage 62.97% 63.37% +0.39%
==========================================
Files 142 145 +3
Lines 8970 9324 +354
==========================================
+ Hits 5649 5909 +260
- Misses 2799 2872 +73
- Partials 522 543 +21
Filed an issue for this so we can better track it. Hopefully that will improve visibility.
@fg91 I think we discussed scoping this PR down to just applying the existing configuration? We can push upgrading the spark-on-k8s-operator once we have out-of-core plugins implemented (on the roadmap for the 1.3 release). Do you have any bandwidth to make these changes? Otherwise we may be able to contribute as well.
@hamersaw I removed the commits in which I tried to upgrade K8s and Spark-on-K8s. Then I compared the custom SparkPodSpec from the currently pinned version with Flyte’s K8sPluginConfig again and additionally carried over […]. I'm interested in your opinion on how you would treat […]:

In principle one could map them to […]. In a Spark task in Flyte one typically configures the resources for the ephemeral Spark cluster this way (source):

@task(
    task_config=Spark(
        # this configuration is applied to the spark cluster
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.driver.cores": "1",
        }
    ),
)
def hello_spark(partitions: int) -> float: ...

Considering […]:

@task(
    task_config=Spark(
        # this configuration is applied to the spark cluster
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.driver.cores": "1",
        }
    ),
    requests=Resources(  # <-- new
        mem="1G",
    ),
)
def hello_spark(partitions: int) -> float: ...

If you agree, I'd propose not to carry over the values mentioned above and to mark this PR ready for review.
I totally agree. The […]
It looks like in other k8s resources setting the task to […]
Thanks for correcting me on the current […]. I absolutely agree. Let's go ahead with the updates you proposed. I think this is the last issue, right?
Congrats on merging your first pull request! 🎉
* Pass default tolerations to spark driver and executor
* Test passing default tolerations to spark driver and executor
* Pass scheduler name to driver and executor SparkPodSpec
* Carry DefaultNodeSelector from k8s plugin config to SparkPodSpec
* Carry over EnableHostNetworkingPod
* Test carrying over of default env vars
* Carry over DefaultEnvVarsFromEnv
* Carry over DefaultAffinity
* Doc behaviour of default and interruptible NodeSelector and Tolerations
* Don't carry over default env vars from env and fix test
* Lint
* Apply node selector requirement to pod affinity

Co-authored-by: Fabio Grätz <[email protected]>
TL;DR
Currently, when running Spark tasks, some of the k8s plugin config values configured in the helm values (such as DefaultTolerations, NodeSelector, HostNetwork, SchedulerName, ...) are not carried over to the SparkApplication and, thus, not to the driver and executor pods.

In my specific case this is limiting because we run Flyte itself and the Spark operator on cheap nodes while giving workflows the ability to start high-powered nodes via default tolerations.
This PR fixes this issue.
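For illustration only, the snippet below is a minimal sketch of the kind of mapping this PR performs; it is not the code in this PR. The helper applyK8sPluginDefaults is hypothetical, and the import paths and field types assumed for K8sPluginConfig and SparkPodSpec are my assumptions; only the field names themselves come from the configuration values discussed here.

package spark

import (
	sparkOp "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/apis/sparkoperator.k8s.io/v1beta2"
	"github.com/flyteorg/flyteplugins/go/tasks/pluginmachinery/flytek8s/config"
)

// applyK8sPluginDefaults copies selected platform-wide defaults from the k8s
// plugin config onto a SparkPodSpec; the same helper would be applied to both
// the driver and the executor spec. Hypothetical helper, for illustration only.
// In the plugin, the config would normally be read from the loaded platform
// configuration rather than passed in explicitly.
func applyK8sPluginDefaults(podSpec *sparkOp.SparkPodSpec, cfg *config.K8sPluginConfig) {
	// Default tolerations let Spark pods land on tainted (e.g. high-powered) nodes.
	podSpec.Tolerations = append(podSpec.Tolerations, cfg.DefaultTolerations...)

	// Propagate the default scheduler name, if one is configured.
	if cfg.SchedulerName != "" {
		podSpec.SchedulerName = &cfg.SchedulerName
	}

	// Merge the default node selector without overriding task-level entries.
	if podSpec.NodeSelector == nil {
		podSpec.NodeSelector = map[string]string{}
	}
	for k, v := range cfg.DefaultNodeSelector {
		if _, ok := podSpec.NodeSelector[k]; !ok {
			podSpec.NodeSelector[k] = v
		}
	}

	// Enable host networking if the platform default requests it.
	if cfg.EnableHostNetworkingPod != nil {
		podSpec.HostNetwork = cfg.EnableHostNetworkingPod
	}
}

Appending (rather than replacing) tolerations and merging the node selector keeps any task-level settings intact in this sketch.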
Complete description
Comparing the custom SparkPodSpec with Flyte’s K8sPluginConfig shows that the following configurations are already carried over to the SparkPodSpec of the driver and the executor:

* DefaultAnnotations
* DefaultLabels
* InterruptibleTolerations
* DefaultNodeSelector and InterruptibleNodeSelector
* DefaultPodSecurityContext (called SecurityContext in SparkPodSpec)
* DefaultPodDNSConfig (called DNSConfig in SparkPodSpec)

This PR adds logic to carry over the following configurations (see the test sketch after this list):

* DefaultTolerations
* SchedulerName
* DefaultNodeSelector
* EnableHostNetworkingPod
* DefaultEnvVarsFromEnv
* DefaultAffinity
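To make the expected behavior concrete, here is a hedged sketch of how this carry-over could be unit-tested, reusing the hypothetical applyK8sPluginDefaults helper sketched above; the testify assertions are standard, but the K8sPluginConfig field types are again assumptions.

package spark

import (
	"testing"

	"github.com/stretchr/testify/assert"
	corev1 "k8s.io/api/core/v1"

	sparkOp "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/apis/sparkoperator.k8s.io/v1beta2"
	"github.com/flyteorg/flyteplugins/go/tasks/pluginmachinery/flytek8s/config"
)

func TestApplyK8sPluginDefaults(t *testing.T) {
	// Platform-wide defaults, as they would come from the helm values.
	cfg := &config.K8sPluginConfig{
		DefaultTolerations: []corev1.Toleration{
			{Key: "dedicated", Value: "spark", Effect: corev1.TaintEffectNoSchedule},
		},
		SchedulerName:       "custom-scheduler",
		DefaultNodeSelector: map[string]string{"node-pool": "spark"},
	}

	driver := &sparkOp.SparkPodSpec{}
	executor := &sparkOp.SparkPodSpec{}

	applyK8sPluginDefaults(driver, cfg)
	applyK8sPluginDefaults(executor, cfg)

	// Both the driver and the executor pod specs should receive the defaults.
	for _, spec := range []*sparkOp.SparkPodSpec{driver, executor} {
		assert.Equal(t, cfg.DefaultTolerations, spec.Tolerations)
		assert.Equal(t, "custom-scheduler", *spec.SchedulerName)
		assert.Equal(t, "spark", spec.NodeSelector["node-pool"])
	}
}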
Follow-up issue
The Spark operator passes the tolerations from the SparkApplication along to the pods only if the operator itself is installed with the --set webhook.enable=true value to activate Mutating Admission Webhooks. I feel I should document this somewhere. Should I make a PR to note this here or would you recommend another place?