diff --git a/content/en/blog/_posts/2022-12-01-scalable-job-tracking-ga/index.md b/content/en/blog/_posts/2022-12-01-scalable-job-tracking-ga/index.md
new file mode 100644
index 0000000000000..ed84353860d8d
--- /dev/null
+++ b/content/en/blog/_posts/2022-12-01-scalable-job-tracking-ga/index.md
@@ -0,0 +1,62 @@
+---
+layout: blog
+title: "Scalable Job tracking goes GA to support massively parallel batch workloads"
+date: 2022-12-01
+slug: "scalable-job-tracking-ga"
+---
+
+**Authors:** Aldo Culquicondor (Google)
+
+The Kubernetes 1.26 release includes a stable implementation of the Job
+controller that can reliably track a large number of Jobs with high levels of
+parallelism. SIG Apps has worked on this foundational improvement, under
+the feature gate `JobTrackingWithFinalizers`, since Kubernetes 1.22. After
+multiple iterations and scale verification, this is now the default
+implementation of the Job controller.
+
+Paired with the Indexed [completion mode](/docs/concepts/workloads/controllers/job/#completion-mode),
+the Job controller can handle massively parallel batch Jobs, supporting up to
+100 thousand concurrent Pods.
+
+The new implementation also enables better control over failures and retries,
+such as the [Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy).
+
+## The problem with the legacy implementation
+
+Generally, Kubernetes workload controllers, such as ReplicaSet or StatefulSet,
+rely on the existence of Pods or other objects in the API to determine the
+status of the workload and whether replacements are needed.
+For example, if a Pod that belongs to a ReplicaSet terminates or ceases to
+exist, the ReplicaSet controller needs to create a replacement Pod to satisfy
+the `.spec.replicas` field.
+
+Since its inception, the Job controller has also relied on the existence of
+Pods to track Job status. However, a Job has [completion](/docs/concepts/workloads/controllers/job/#completion-mode)
+and [failure handling](/docs/concepts/workloads/controllers/job/#handling-pod-and-container-failures)
+policies. The Job controller needs to know under which circumstances a Pod
+finished in order to determine whether to create a replacement Pod or to mark
+the Job as completed or failed. As a result, the Job controller depended on
+Pods, even terminated ones, to remain in the API in order to keep track of the
+status.
+
+This dependency made the tracking of Job status unreliable, because Pods can be
+deleted from the API for a number of reasons, including:
+- The garbage collector removing orphan Pods when a Node goes down.
+- The garbage collector removing terminated Pods once their number exceeds a
+  threshold.
+- The Kubernetes scheduler preempting a Pod to accommodate higher-priority Pods.
+- The taint manager evicting a Pod that doesn't tolerate a `NoExecute` taint.
+- External controllers, not included as part of Kubernetes, or humans deleting
+  Pods.
+
+## How did we solve the problem?
+
+
+
+## How do I use this feature?
+
+
+
+## Deprecation notices
+
+In Kubernetes 1.21, we [introduced Indexed Jobs](/blog/2021/04/19/introducing-indexed-jobs/)
+as a simple way of setting up parallel Jobs.
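+To make the combination of these features concrete, here is a minimal sketch of
+an Indexed Job that relies on the now-default tracking implementation. It is an
+illustrative example rather than a recommended configuration: the name, image,
+parallelism, and exit-code values are assumptions, and the `podFailurePolicy`
+shown requires `restartPolicy: Never` on the Pod template.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: parallel-work            # hypothetical name
+spec:
+  completions: 10000             # total number of completion indexes
+  parallelism: 500               # Pods allowed to run at the same time
+  completionMode: Indexed        # each Pod gets a unique completion index
+  backoffLimit: 6
+  podFailurePolicy:              # optional: finer control over failures and retries
+    rules:
+    - action: FailJob            # stop retrying on a non-retriable exit code
+      onExitCodes:
+        operator: In
+        values: [42]             # hypothetical application-specific exit code
+    - action: Ignore             # don't count disruptions against backoffLimit
+      onPodConditions:
+      - type: DisruptionTarget
+  template:
+    spec:
+      restartPolicy: Never       # required when using a Pod failure policy
+      containers:
+      - name: worker
+        # Hypothetical image; for Indexed Jobs, the completion index is exposed
+        # to the container through the JOB_COMPLETION_INDEX environment variable
+        # and the batch.kubernetes.io/job-completion-index Pod annotation.
+        image: registry.example.com/worker:latest
+```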
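+The stable tracking can also be observed directly on the Pod objects: instead of
+requiring terminated Pods to stay in the API, the Job controller marks each Pod
+it creates with a finalizer and removes it only once the Pod has been accounted
+for in the Job status. The trimmed manifest below is a sketch of what this looks
+like; the Pod name is hypothetical and unrelated fields are omitted.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: parallel-work-0-x7k2p                     # hypothetical Pod name
+  labels:
+    job-name: parallel-work
+  annotations:
+    batch.kubernetes.io/job-completion-index: "0" # set for Indexed Jobs
+  finalizers:
+  # Removed by the Job controller only after this Pod's termination has been
+  # reflected in the Job status, so the Pod can then be safely deleted.
+  - batch.kubernetes.io/job-tracking
+spec:
+  # ... container spec omitted for brevity
+```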