The logic in `buildPodEquivalenceGroups` and `filterOutSchedulable` groups pods by their scheduling requirements, as a scalability optimization. This is done by first grouping by the controller UID, and then comparing pod specs for pods from one controller. If there's something in the pod spec that's unique to a single pod within a controller, every pod ends up in a group of its own, and the optimization breaks.
In extreme cases when there are a lot of such pods (a couple thousand can be enough), CA can spend so long in a single loop iteration that it fails its health checks and is killed by the kubelet. Then everything repeats once it comes back up, and CA is effectively broken until the pods are scheduled or deleted.
One trigger for pod specs being different is the `BoundServiceAccountTokenVolume` feature, which injects uniquely-named projected volumes into each pod's spec. This was taken into account by CA in #4441.
We've just run into another one: Jobs using `completionMode: Indexed`. In this mode, each pod gets a unique, indexed hostname in its spec. This is documented here: https://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode. AFAIU the hostname shouldn't affect scheduling, so sanitizing it in `PodSpecSemanticallyEqual` should be enough to fix this particular issue.
However, this approach of "fixing" single fields as issues pop up doesn't scale very well. We should come up with a more generic solution to these kinds of problems. One idea could be a cutoff for the number of groups within one controller, as proposed in #4441 (comment).