[BUG] minResources of volcano podGroup didn't take into account dynamicAllocation and memoryOverheadFactor #2244

kaka-zb · 2024-10-14T04:57:41Z

Description

We've been using the Spark operator with Volcano in production for a long time. However, there are some problems with how resource usage is calculated for the Volcano podGroup when a SparkApplication is submitted.

The spark.dynamicAllocation.* and spark.kubernetes.memoryOverheadFactor Spark parameters are not taken into account when calculating the memory in minResources for the Volcano podGroup. As a result, the calculated minResources may be smaller than the real usage of the Spark application, and gang scheduling may fail.
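To illustrate the gap, here is a minimal Python sketch of the arithmetic, not the operator's actual code. It assumes Spark's documented defaults (an overhead of `max(factor * memory, 384 MiB)` with a factor of 0.1 for JVM jobs) and assumes that, with dynamic allocation enabled, the gang should cover the initial executor count, taken as the largest of the configured instances, `initialExecutors`, and `minExecutors`. The function names and the choice of which executor count to reserve are hypothetical.

```python
MIN_OVERHEAD_MIB = 384  # Spark's default minimum memory overhead per pod


def pod_memory_mib(memory_mib, overhead_factor=0.1, overhead_mib=None):
    """Memory a Spark pod requests: heap plus overhead, in MiB."""
    if overhead_mib is None:
        # An explicit memoryOverhead setting wins; otherwise derive it
        # from the factor, never going below the 384 MiB floor.
        overhead_mib = max(int(memory_mib * overhead_factor), MIN_OVERHEAD_MIB)
    return memory_mib + overhead_mib


def gang_executor_count(instances, dyn_enabled=False, dyn_initial=0, dyn_min=0):
    """Executors to reserve up front (assumption: the initial count)."""
    if not dyn_enabled:
        return instances
    return max(instances, dyn_initial, dyn_min)


def min_resources_memory_mib(driver_mib, executor_mib, instances,
                             overhead_factor=0.1, dyn_enabled=False,
                             dyn_initial=0, dyn_min=0):
    """Total memory for driver plus reserved executors, in MiB."""
    executors = gang_executor_count(instances, dyn_enabled, dyn_initial, dyn_min)
    return (pod_memory_mib(driver_mib, overhead_factor)
            + executors * pod_memory_mib(executor_mib, overhead_factor))


# Hypothetical app: 1 GiB driver, 4 GiB executors, 2 configured instances.
naive = 1024 + 2 * 4096  # ignores overhead and dynamic allocation
with_overhead = min_resources_memory_mib(1024, 4096, 2)
with_dynamic = min_resources_memory_mib(1024, 4096, 2, dyn_enabled=True,
                                        dyn_initial=3, dyn_min=1)
print(naive, with_overhead, with_dynamic)
```

In this hypothetical example the naive sum is lower than either adjusted figure, which is the kind of shortfall that can leave the podGroup's minResources below what the pods actually request.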

✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Expected behavior

Actual behavior

Environment & Versions

Spark Operator App version: 2.0.1
Helm Chart Version: 2.0.1
Kubernetes Version: 1.25.7
Apache Spark version: 3.4.3

Additional context

kaka-zb · 2024-10-14T04:58:02Z

BTW, I see that the resourceusage directory is implemented for Yunikorn. If you have no plan to support this for Volcano, I can contribute our code for Volcano, which has been verified across thousands of Spark task runs.

jacobsalway · 2024-10-16T11:14:09Z

Hey, I wrote the resourceusage module for the Yunikorn batch scheduler. When I implemented this initially, we discussed pulling these functions out into a more generic module for use across other batch schedulers. If you have code that also calculates the resulting pod resource fields, I'd be happy to review it, hopefully improve the existing solution, and update the existing Volcano batch scheduler.

kaka-zb · 2024-10-16T12:10:03Z

@jacobsalway Thanks for the reply. I will submit a draft PR for you to review, and we can see if that helps.

kaka-zb mentioned this issue Oct 18, 2024

Draft: fix the minResources calculation logic of podgroup for volcano #2262

