Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster autoscaler 1.19.2 nr1 #4

Open
wants to merge 72 commits into
base: master
Choose a base branch
from

Conversation

mytunguyen
Copy link

@mytunguyen mytunguyen commented Dec 30, 2021

branched off of https://github.com/kubernetes/autoscaler/tree/cluster-autoscaler-1.19.2

cherry picked ef38ce6

i probably could have used hack/update-vendor.sh from master. to get this one running on your mac

brew install bash
brew install [email protected]
brew install findutils #for xargs
# edit hack/update-vendor.sh because the flags don't work
# K8S_REV="v1.19.2"
# OVERRIDE_GO_VERSION="true"
/usr/local/bin/bash ./hack/update-vendor.sh  # use your new bash
CGO_ENABLED=0 GO111MODULE=off GOOS=linux GOARCH=amd64 go build -o cluster-autoscaler-amd64
docker build -t cluster-autoscaler:v1.19.2-nr1 -f Dockerfile.amd64 .

bpineau and others added 30 commits August 23, 2020 11:02
When `cloudProviderBackoff` is configured, `cloudProviderBackoffRetries`
must also be set to a value > 0, otherwise the cluster-autoscaler
will instanciate a vmssclient with 0 Steps retries, which will
cause `doBackoffRetry()` to return a nil response and nil error on
requests. ARM client can't cope with those and will then segfault.
See kubernetes/kubernetes#94078

The README.md needed a small update, because the documented defaults
are a bit misleading: they don't apply when the cluster-autoscaler
is provided a config file, due to:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_manager.go#L299-L308
... which is also causing all environment variables to be ignored
when a configuration file is provided.
…-release-1.19

Cherry-pick onto 1.19: Backoff needs retries
k8s Azure clients keeps tracks of previous HTTP 429 and Retry-After cool
down periods. On subsequent calls, they will notice the ongoing throttling
window and will return a synthetic errors (without HTTPStatusCode) rather
than submitting a throttled request to the ARM API:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/vmssvmclient/azure_vmssvmclient.go#L154-L158
https://github.com/kubernetes/autoscaler/blob/a5ed2cc3fe0aabd92c7758e39f1a9c9fe3bd6505/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/retry/azure_error.go#L118-L123

Some CA components can cope with a temporarily outdated object view
when throttled. They call in to `isAzureRequestsThrottled()` on clients
errors to return stale objects from cache (if any) and extend the object's
refresh period (if any).

But this only works for the first API call (returning HTTP 429).
Next calls in the same throttling window (per Retry-After header)
won't be identified as throttled by `isAzureRequestsThrottled` due
to their nul `HTTPStatusCode`.

This can makes the CA panic during startup due a failing cache init, when
more than one VMSS call hits throttling. We've seen this causing early
restarts loops, re-scanning every VMSS due to cold cache on start,
keeping the subscription throttled.

Practically this change allows the 3 call sites (`scaleSet.Nodes()`,
`scaleSet.getCurSize()`, and `AgentPool.getVirtualMachinesFromCache()`) to
serve from cache (and extend the object's next refresh deadline) as they
would do on the first HTTP 429 hit, rather than returning an error.
`fetchAutoAsgs()` is called at regular intervals, fetches a list of VMSS,
then call `Register()` to cache each of those. That registration function
will tell the caller wether that vmss' cache is outdated (when the provided
VMSS, supposedly fresh, is different than the one held in cache) and will
replace existing cache entry by the provided VMSS (which in effect will
require a forced refresh since that ScaleSet struct is passed by
fetchAutoAsgs with a nil lastRefresh time and an empty instanceCache).

To detect changes, `Register()` uses an `reflect.DeepEqual()` between the
provided and the cached VMSS. Which does always find them different: cached
VMSS were enriched with instances lists (while the provided one is blank,
fresh from a simple vmss.list call). That DeepEqual is also fragile due to
the compared structs containing mutexes (that may be held or not) and
refresh timestamps, attributes that shoudln't be relevant to the comparison.

As a consequence, all Register() calls causes indirect cache invalidations
and a costly refresh (VMSS VMS List). The number of Register() calls is
directly proportional to the number of VMSS attached to the cluster, and
can easily triggers ARM API throttling.

With a large number of VMSS, that throttling prevents `fetchAutoAsgs` to
ever succeed (and cluster-autoscaler to start). ie.:

```
I0807 16:55:25.875907     153 azure_scale_set.go:344] GetScaleSetVms: starts
I0807 16:55:25.875915     153 azure_scale_set.go:350] GetScaleSetVms: scaleSet.Name: a-testvmss-10, vmList: []
E0807 16:55:25.875919     153 azure_scale_set.go:352] VirtualMachineScaleSetVMsClient.List failed for a-testvmss-10: &{true 0 2020-08-07 17:10:25.875447854 +0000 UTC m=+913.985215807 azure cloud provider throttled for operation VMSSVMList with reason "client throttled"}
E0807 16:55:25.875928     153 azure_manager.go:538] Failed to regenerate ASG cache: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
F0807 16:55:25.875934     153 azure_cloud_provider.go:167] Failed to create Azure Manager: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
goroutine 28 [running]:
```

From [`ScaleSet` struct attributes](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go#L74-L89)
(manager, sizes, mutexes, refreshes timestamps) only sizes are relevant
to that comparison. `curSize` is not strictly necessary, but comparing it
will provide early instance caches refreshs.
Cherry pick kubernetes#3437 onto 1.19 - Avoid unwanted VMSS VMs caches invalidations
Cherry pick kubernetes#3484 onto 1.19: Serve stale on ongoing throttling
On (re)start, cluster-autoscaler will refresh all VMSS instances caches
at once, and set those cache TTL to 5mn. All VMSS VM List calls (for VMSS
discovered at boot) will then continuously hit ARM API at the same time,
potentially causing regular throttling bursts.

Exposing an optional jitter subtracted from the initial first scheduled
refresh delay will splay those calls (except for the first one, at start),
while keeping the predictable (max. 5mn, unless the VMSS changed) refresh
interval after the first refresh.
Cherry pick kubernetes#3440 onto 1.19 - optional jitter on initial VMSS VM cache refresh
[cluster-autoscaler][clusterapi] Remove internal types in favor of unstructured
[cluster-autoscaler][clusterapi] Add support for node autodiscovery to clusterapi provider
[cluster-autoscaler] Support using --cloud-config for clusterapi provider
Update group identifier to use for Cluster API annotations
Replacing the module path in go.mod did not solve
the issue preventing hack/update-vendor.sh from
running properly, so it will have to be deleted.
Cherry pick kubernetes#3558 onto 1.19 - Add missing stable labels in the azure template
[CA-1.19] CAPI backports for autoscaling workload clusters
…r-delete-fix

Remove ScaleDownNodeDeleted status since we no longer delete nodes synchronously
Improve Cluster API tests to work better with constrained resources
Merge pull request kubernetes#3570 from towca/jtuznik/scale-down-after-delete-fix
…32-1.19

Cherry-pick kubernetes#3532 onto 1.19: Azure: support allocatable resources overrides via VMSS tags
Signed-off-by: Marques Johansson <[email protected]>
k8s-ci-robot and others added 24 commits May 17, 2021 07:02
…lease-1.19

[cluster-autoscaler] Backporting kubernetes#3545 to release 1.19
Cherry pick 3308 onto 1.19 - Fix priority expander falling back to random although higher priority matches
…lps load balancer to remove the node from healthy hosts (ALB does have this support).

This won't fix the issue of 502 completely as there is some time node has to live even after cordoning as to serve In-Flight request but load balancer can be configured to remove Cordon nodes from healthy host list.
This feature is enabled by cordon-node-before-terminating flag with default value as false to retain existing behavior.
cherry pick kubernetes#3649 - Adding functionality to cordon the node before destroying it.
While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports
fetching 100 ASG per calls on all regions, matching what's documented:
https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
```
     AutoScalingGroupNames.member.N
       The names of the Auto Scaling groups.
       By default, you can only specify up to 50 names.
       You can optionally increase this limit using the MaxRecords parameter.
     MaxRecords
       The maximum number of items to return with this call.
       The default value is 50 and the maximum value is 100.
```

Doubling this halves API calls on large clusters, which should help to prevent throttling.
Refactor to allow for optimisation
The pricing json for us-east-1 is currently 129MB. Currently fetching
this into memory and parsing results in a large memory footprint on
startup, and can lead to the autoscaler being OOMKilled.

Change the ReadAll/Unmarshal logic to a stream decoder to significantly
reduce the memory use.
…pick-of-#3999-kubernetes#4199-upstream-cluster-autoscaler-release-1.19

Automated cherry pick of kubernetes#3999 kubernetes#4127 kubernetes#4199 upstream cluster autoscaler release 1.19
Backport Merge pull request kubernetes#4274 to upstream/cluster-autoscaler-release-1.19
Cherry-pick kubernetes#4130: dont proactively decrement azure cache for unregistered nodes
* Cluster Autoscaler: backport Github Actions CI to 1.19

* Cluster Autoscaler: fix files not passing gofmt on 1.19

Github Actions CI haven't been running on 1.19, so we merged some
changes that are not passing gofmt.
Fix 1.19 after cherry-picking Azure change
@trutx
Copy link

trutx commented Jan 12, 2022

I think this PR needs to be rebased to cluster-autoscaler-release-1.19. This should fix all the conflicts.

Also to follow the standards and the previous work, could we rename this branch to cluster-autoscaler-release-1.19-nr1?

@trutx
Copy link

trutx commented Jan 12, 2022

Might be interesting to port kubernetes#3940 as well, just like we did in #2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.