SUP-3258 - Implement .spec.job.activeDeadlineSeconds #497

petetomasik · 2025-02-06T14:59:58Z

Implements .spec.job.activeDeadlineSeconds as a means of setting runtime boundaries for Kubernetes jobs that are unable to complete within a "reasonable" amount of time.

This is configured with a default value of 21600 seconds (6 hours) in the controller configuration. This value can be overridden by the user via the command line --job-active-deadline-seconds 43200, directly in the controller configuration job-active-deadline-seconds: 43200, or on a per Buildkite job basis via the kubernetes plugin:

steps:
  - label: "12 hour nap, override default 6 hour job-active-deadline-seconds config"
    command: sleep 43200
    plugins:
      - kubernetes:
          jobActiveDeadlineSeconds: 46800

At this time .spec.job.activeDeadlineSeconds will not be able to be disabled, as we agree that a job runtime boundary should be defined at all times.

Fixes #480
Fixes #479
Fixes #463

DrJosh9000

LGTM

Other than one comment below, ideally this would also have a test. The easiest way currently is probably an integration test that uses a pipeline YAML fixture with a small activeDeadlineSeconds and a command that sleeps longer than it. But it may be tricky to take image pull time into account, so the deadline and sleep would probably be at least 30 seconds to avoid flakiness.

DrJosh9000 · 2025-02-11T22:50:19Z

internal/controller/scheduler/scheduler.go

@@ -75,6 +76,7 @@ func New(logger *zap.Logger, client kubernetes.Interface, cfg Config) *worker {
 }

 type KubernetesPlugin struct {
+	Job               *batchv1.Job           `json:"job,omitempty"`


Using *batchv1.Job here could get confusing - someone might try to write:

plugins: - kubernetes: job: template: spec: containers: ...

which wouldn't be rejected for any failure to parse, and the user might expect that defining containers that way would work but instead it would be silently ignored.

(Maybe one day the whole KubernetesPlugin should be changed to revolve around a *batchv1.Job instead of defining pieces of it. Needs thought.)

In the meantime I would suggest either just adding JobActiveDeadlineSeconds as a KubernetesPlugin field directly, or maybe wrapping it in a new Job struct that can be grown over time to include other job parameters.

This is step 1 of a long journey to fully transition to *batchv1.Job. I went back and forth on whether to implement it this way and possibly cause confusion for users that try to leverage the entire Job spec. There are additional tickets we have that will continue to build off of the Job spec (backoffLimit, podFailurePolicy, etc.) bit by bit. We could implement this (and the future additional fields) as fields in the plugin and eventually bring them back into the full Job spec when the time comes to more precisely control the feature set.

DrJosh9000

LGTM ✅

petetomasik requested a review from a team as a code owner February 6, 2025 14:59

petetomasik requested a review from DrJosh9000 February 11, 2025 19:01

DrJosh9000 approved these changes Feb 11, 2025

View reviewed changes

petetomasik marked this pull request as draft February 12, 2025 15:05

petetomasik added 2 commits February 12, 2025 16:59

Implement job.spec.activeDeadlineSeconds, default to 6h

b8801b4

Add test for activeDeadlineSeconds override, exceeded

7a13241

petetomasik force-pushed the SUP-3258-job-active-deadline-seconds branch from fd05068 to 7a13241 Compare February 12, 2025 22:00

petetomasik marked this pull request as ready for review February 12, 2025 22:21

petetomasik requested a review from DrJosh9000 February 12, 2025 22:21

DrJosh9000 approved these changes Feb 13, 2025

View reviewed changes

petetomasik merged commit 835cc76 into main Feb 13, 2025
1 check passed

petetomasik deleted the SUP-3258-job-active-deadline-seconds branch February 13, 2025 14:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SUP-3258 - Implement .spec.job.activeDeadlineSeconds #497

SUP-3258 - Implement .spec.job.activeDeadlineSeconds #497

petetomasik commented Feb 6, 2025 •

edited

Loading

DrJosh9000 left a comment

DrJosh9000 Feb 11, 2025

petetomasik Feb 12, 2025

DrJosh9000 left a comment

SUP-3258 - Implement .spec.job.activeDeadlineSeconds #497

SUP-3258 - Implement .spec.job.activeDeadlineSeconds #497

Conversation

petetomasik commented Feb 6, 2025 • edited Loading

DrJosh9000 left a comment

Choose a reason for hiding this comment

DrJosh9000 Feb 11, 2025

Choose a reason for hiding this comment

petetomasik Feb 12, 2025

Choose a reason for hiding this comment

DrJosh9000 left a comment

Choose a reason for hiding this comment

petetomasik commented Feb 6, 2025 •

edited

Loading