Cluster Autoscaler 1.29.0
Deprecations
- The `--ignore-taint` flag and the `ignore-taint.cluster-autoscaler.kubernetes.io/` taint prefix are now deprecated. Instead use the following (a migration sketch follows at the end of this section):
  - the `--status-taint` flag or the `status-taint.cluster-autoscaler.kubernetes.io/` taint prefix for taints that denote node status.
  - the `--startup-taint` flag or the `startup-taint.cluster-autoscaler.kubernetes.io/` taint prefix for taints that are used to prevent pods from scheduling before a node is fully initialized (e.g. when using a daemonset to install a device plugin).
- For backward compatibility, the `--ignore-taint` flag and the `ignore-taint.cluster-autoscaler.kubernetes.io/` prefix continue to work, with behavior identical to startup taints (the same behavior they had before).
- Please see the FAQ for more details. - #6132, #6218
- Flags that were unused in the code (i.e. setting them had no effect) are now deprecated and will be removed in a future release. Affected flags: `--node-autoprovisioning-enabled` and `--max-autoprovisioned-node-group-count`.
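Following up on the taint deprecation above, a minimal migration sketch. The taint keys are placeholders, and this assumes both new flags accept one taint key per occurrence, as `--ignore-taint` did:

```bash
# Before (deprecated): a single bucket for all ignored taints.
cluster-autoscaler \
  --ignore-taint=example.com/gpu-init

# After: declare intent explicitly. Startup taints guard nodes that are
# still initializing; status taints merely describe node state.
# Both taint keys below are placeholders.
cluster-autoscaler \
  --startup-taint=example.com/gpu-init \
  --status-taint=example.com/under-maintenance
```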
General
- Added a new flag, `--bypassed-scheduler-names`, with a default empty value to maintain the original behaviour. If the flag is set to a non-empty list, CA will not wait for the schedulers listed in the flag value to mark pods as unschedulable and will evaluate not-yet-processed pods. Furthermore, if the list of bypassed schedulers is non-empty, CA will not wait for pods to reach a certain age before scaling up, effectively ignoring `unschedulablePodTimeBuffer`. - #6235
  - Enabling this feature can improve autoscaling latency (CA will react to pods faster), but it can also increase load on CA in case of very large scale-ups (thousands of pending pods). This is because limited scheduler throughput can effectively act as a rate limiter, protecting CA from having to process a scale-up of too many pods at the same time. We believe this change will be beneficial in the vast majority of environments, but given that CA scalability varies greatly between cloud providers, we recommend testing this feature before enabling it in large clusters.
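A sketch of enabling the bypass, assuming the flag accepts a comma-separated list of scheduler names (the second name is a placeholder):

```bash
# CA evaluates pods assigned to these schedulers without waiting for an
# "unschedulable" mark or for the usual pod-age buffer.
# "my-custom-scheduler" is a placeholder name.
cluster-autoscaler \
  --bypassed-scheduler-names=default-scheduler,my-custom-scheduler
```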
- A new flag (`--drain-priority-config`) is introduced which allows users to configure drain behavior during scale-down based on pod priority. The new flag is mutually exclusive with `--max-graceful-termination-sec`. `--max-graceful-termination-sec` can still be used if the new configuration options are not needed. The default behavior is preserved (simple config, default value of `--max-graceful-termination-sec`). - #6139
- Added the `--dynamic-node-delete-delay-after-taint-enabled` flag. Enabling this flag changes the delay between tainting and draining a node from a constant delay to a dynamic one based on Kubernetes api-server latency. This minimizes the risk of race conditions if the api-server connection is slow and improves scale-down throughput when it is fast. - #6019
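A sketch combining the two new scale-down options. The `--drain-priority-config` value format shown (comma-separated `priority:terminationGracePeriodSeconds` pairs) and the numbers below are assumptions based on the flag's description; check the linked PRs before relying on them:

```bash
# Assumed format: pods at or above priority 7000 get 120s to terminate,
# everything else 600s. Cannot be combined with --max-graceful-termination-sec.
cluster-autoscaler \
  --drain-priority-config="7000:120,0:600" \
  --dynamic-node-delete-delay-after-taint-enabled
```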
- Added structured logging support via `--logging-format json`. - #6035
- Introduced a new `node_group_target_count` metric that keeps track of the target size of each NodeGroup. This metric is only available if the `--emit-per-nodegroup-metrics` flag is enabled. - #6361
- Introduced a new `node_taints_count` metric tracking the different types of taints in the cluster. - #6201
- Added a new command line option, `--kube-api-content-type`, to specify the content type used to communicate with the apiserver. This change also switches the default content type from "application/json" to "application/vnd.kubernetes.protobuf". - #6114
- Fixed a bug where resource requests of restartable init containers were not included in utilization calculation. - #6225
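Putting the new logging, metrics, and content-type options above together in one minimal sketch (the content type shown is the new default, so passing it explicitly is redundant and purely illustrative):

```bash
# --logging-format=json: structured logs (#6035)
# --emit-per-nodegroup-metrics: enables the node_group_target_count metric (#6361)
# --kube-api-content-type: now defaults to protobuf (#6114); shown here explicitly
cluster-autoscaler \
  --logging-format=json \
  --emit-per-nodegroup-metrics \
  --kube-api-content-type=application/vnd.kubernetes.protobuf
```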
- Fixed a bug where CA might have created fewer nodes than desired, with a message about "Capping binpacking after exceeding threshold of 4 nodes", even though it then didn't actually add four new nodes. - #6165
- Fixed support for `--feature-gates=ContextualLogging=true`. - #6162
- Fixed a bug where scale-down may have failed with "daemonset.apps not found". - #6122
- Optimized CA memory usage. - #6159, #6110
- Disambiguated wording in the log messages related to node removal ineligibility caused by high resource allocation. - #6223
- Pods with the `"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"` annotation will now always report an annotation-related warning message if they block scale-down (where previously they might've reported e.g. a message about not being replicated). - #6077
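For reference, a sketch of setting that annotation on a running pod (the pod name is a placeholder):

```bash
# Mark a pod as blocking scale-down; CA will now always surface an
# annotation-related warning when such a pod prevents node removal.
kubectl annotate pod my-critical-pod \
  cluster-autoscaler.kubernetes.io/safe-to-evict="false"
```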
AWS
- Added the c7a, r7i, and mac2-m2 families, plus new sizes in i4i as well as c7i.metal, r7a.metal, and r7iz.metal, to the static list of Amazon EC2 instances. - #6347
- Added p5.48xlarge. - #6131
- Updated cloudprovider/aws/aws-sdk-go to 1.48.7 in order to support dynamic auth token. - #6325
- Fixed an issue where the capacityType label inferred from an empty AWS ManagedNodeGroup does not match the same label on the nodes after it scales from 0 -> 1. - #6261
- Introduced caching to reduce volume of DescribeLaunchTemplateVersions API calls made by Cluster Autoscaler. - #6245
- Nodes annotated with `k8s.io/cluster-autoscaler-enabled=false` will be skipped by CA and will no longer produce spammy logs about missing AWS instances. - #6265, #6301
- Added additional log output when updating the ASG information from AWS. - #6282
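A sketch of the annotation described above; the node name is a placeholder, and this assumes a plain Kubernetes node annotation (check the AWS provider docs before relying on it):

```bash
# Tell CA to skip this node entirely (e.g. one not backed by an ASG),
# silencing the "missing AWS instance" log spam.
kubectl annotate node ip-10-0-0-42.ec2.internal \
  k8s.io/cluster-autoscaler-enabled=false
```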
- Fixed a bug where CA may have tried to remove an instance that was already in the Terminated state. - #6166
- Scale-up from 0 now works with an existing AWS EBS CSI PersistentVolume without having to add a tag to the ASG. - #6090
Azure
- Removed AKS vmType. - #6186
Civo
- Introduced support for scaling NodeGroup from 0. - #6322
Cluster API
- Users of Cluster API can override the default architecture considered in the templates for autoscaling from zero, so that pods requesting non-amd64 nodes in their node selector terms can trigger a scale-up in non-amd64 single-arch clusters. - #6066
Equinix Metal
- The packet provider and its configuration parameters are now deprecated in favor of equinixmetal. - #6085
- The cluster-autoscaler `--cloud-provider` flag should now be set to equinixmetal. For backward compatibility, "--cloud-provider=packet" continues to work.
- "METAL_AUTH_TOKEN" replaces "PACKET_AUTH_TOKEN". For backward compatibility, the latter still works.
- "EQUINIX_METAL_MANAGER" replaces "PACKET_MANAGER". For backward compatibility, the latter still works.
- Each node managed by cloud-provider "equinixmetal" will be labeled with the label defined by "METAL_CONTROLLER_NODE_IDENTIFIER_LABEL". For backward compatibility, "PACKET_CONTROLLER_NODE_IDENTIFIER_LABEL" still works.
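A migration sketch for the renames above; the values are placeholders, and the deprecated PACKET_* names remain valid:

```bash
# New-style environment variables; the PACKET_* equivalents still work.
export METAL_AUTH_TOKEN="<your-api-token>"     # replaces PACKET_AUTH_TOKEN
export EQUINIX_METAL_MANAGER="<manager-name>"  # replaces PACKET_MANAGER

# New provider name; --cloud-provider=packet also still works.
cluster-autoscaler --cloud-provider=equinixmetal
```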
- We now use metros in the Equinix Metal (Packet) cloudprovider. Facilities support has been removed. - #6078
GCE
- The `--gce-expander-ephemeral-storage-support` flag is now deprecated. Ephemeral-storage support is always enabled, and the flag itself is ignored.
- Added support for paginated MIG instance listing. - #6376
- Improved reporting of errors related to GCE Reservations. - #6093
gRPC
- The timeout of gRPC calls can now be specified through `cloud-config`. - #6373
- gRPC-based cloud providers can now pass the gRPC error code 12, Unimplemented, to signal that they do not implement optional methods. - #5937
- Fixed: cluster-autoscaler thought a newly scaled-up nodegroup using the externalgrpc provider had `MaxNodeProvisionTime` set to 0 seconds and expected the new node to be registered in 0-10 seconds instead of the default 15m. Check #5935 for more info. - #5936
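Purely as a sketch of the shape of the timeout feature: the cloud-config key names below (`address`, `grpc_timeout`) are assumptions, not confirmed names from #6373; verify them against the externalgrpc provider README before use:

```bash
# Hypothetical cloud-config for the externalgrpc provider. Both keys are
# assumed names, and the endpoint is a placeholder.
cat > /etc/cluster-autoscaler/grpc-cloud-config.yaml <<'EOF'
address: "my-provider.example.com:8086"
grpc_timeout: 30s
EOF

cluster-autoscaler \
  --cloud-provider=externalgrpc \
  --cloud-config=/etc/cluster-autoscaler/grpc-cloud-config.yaml
```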
Hetzner
- Fixed a bug where failed servers were kept for longer than necessary. - #6364
- Fixed a bug where too many requests were sent to the Hetzner Cloud API, causing rate-limit issues. - #6308
- Each node pool can now have different init configs. - #6184
Kwok
- Introduced a new kwok cloud provider (see https://github.com/kubernetes/autoscaler/blob/kwok-poc/cluster-autoscaler/cloudprovider/kwok/README.md for more info).
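A minimal sketch of trying the new provider; it assumes a kwok setup as described in the linked README, and `--kubeconfig` pointing at the target cluster:

```bash
# Run CA against kwok-simulated nodes; no real cloud account needed.
cluster-autoscaler \
  --cloud-provider=kwok \
  --kubeconfig="$HOME/.kube/config"
```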
Images
registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
registry.k8s.io/autoscaling/cluster-autoscaler-arm64:v1.29.0
registry.k8s.io/autoscaling/cluster-autoscaler-amd64:v1.29.0
registry.k8s.io/autoscaling/cluster-autoscaler-s390x:v1.29.0