Sync with upstream v1.28.0 (#260)
* feat(*): add more metrics

* Added the RBAC Permission to Linode.

* CA - Correct Cloudprovider PR labelling to area/provider/<provider name>

* fix(volcengine): don't build all providers when the volcengine tag exists

* chore: replace `github.com/ghodss/yaml` with `sigs.k8s.io/yaml`

At the time of making this commit, the package `github.com/ghodss/yaml`
is no longer actively maintained.

`sigs.k8s.io/yaml` is a permanent fork of `ghodss/yaml` and is actively
maintained by Kubernetes SIG.

Signed-off-by: Eng Zer Jun <[email protected]>

* Fixed Typo and Trailing-whitespace

* Skip healthiness check for non-existing similar node groups

* BinpackingLimiter interface

* fix comment and list format

* add more logging for balancing similar node groups

this change adds some logging at verbosity levels 2 and 3 to help
diagnose why the cluster-autoscaler does not consider 2 or more node
groups to be similar.

* Update VPA scripts to use v1.

* fix: don't clean `CriticalAddonsOnly` taint from template nodes
- this taint leads to unexpected behavior
- users expect CA to consider the taint when autoscaling

Signed-off-by: vadasambar <[email protected]>

* Updated the owners of civo cloudprovider

Signed-off-by: Vishal Anarse <[email protected]>

* Bump golang from 1.20.4 to 1.20.5 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.4 to 1.20.5.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* cluster-autoscaler: support Brightbox image pattern

cluster-autoscaler/cloudprovider/brightbox

Allow scaled workers to be built from an image name pattern as well as
an image id. This deals with long running clusters where the official
image is updated with security changes over time.

* brightbox: set default docker registry

Only set the registry value in the local Makefile if it is missing

* Update oci-ip-cluster-autoscaler-w-config.yaml

* Update oci-ip-cluster-autoscaler-w-principals.yaml

* address comments

* golint fix

* Remove print condition for vpa-beta2-crd.

* Improvement: Modified the VPA content for the helm chart.

* Bump the Chart version to 9.29.1 and CA image to 1.27.2

* make no-op binpacking limiter as default + move mark nodegroups to its method

* Drop projected volumes for init containers

* fix zonal gce outage breaking CA when only some of the zones are failed

* Bump version to 0.14.0 as a preparation for release.

* Update vendor to Kubernetes 1.28.0-alpha.2

* Interface fixes after Kubernetes 1.28.0-alpha.2 vendor update

* Execute git commands to show the state of local clone of the repo.

* Clarify and simplify the "build and stage images" step.

* Mention logs from kubernetes#5862 in release instructions.

* addressed comments

* chore: remove unused func scaleFromZeroAnnotationsEnabled

Signed-off-by: Dinesh B <[email protected]>

* add cluster-autoscaler name and version to the user agent

This makes it easier to distinguish between various users
of the Go SDK.

* Explicitly create and remove buildx builders

* Apply fixes to in place support VPA AEP

Looks like the first PR merged a bit too early, while there were open comments

* Add voelzmo to VPA reviewers

voelzmo meets
[requirements](https://github.com/kubernetes/community/blob/9504ce87ec14cff9455e794fdcbc5088c52f9dd9/community-membership.md#requirements-1):

- K8s org members since 2023-02: kubernetes/org#4015
- Reviewer of 12 merged VPA PRs: https://github.com/kubernetes/autoscaler/pulls?q=is%3Apr+reviewed-by%3Avoelzmo+label%3Avertical-pod-autoscaler+is%3Amerged+
- Sent 10 merged VPA PRs: https://github.com/kubernetes/autoscaler/pulls?q=is%3Apr+label%3Avertical-pod-autoscaler+author%3Avoelzmo+is%3Amerged

* Bump default VPA version to 0.14.0

* Minor tweaks after preparing VPA 0.14.0 release.

* fix: CA on fargate causing log flood
- happens when CA tries to check whether the unmanaged Fargate node is part of an ASG (it isn't)
- and keeps logging errors
Signed-off-by: vadasambar <[email protected]>

* test: fix node names
Signed-off-by: vadasambar <[email protected]>

* Sort nodegroups in order of their ID

* Move two util functions from actuator to delete_in_batch, where they are more appropriate

* Add support for atomic scale-down in node group options

* Extract cropNodesToBudgets function out of actuator file

* Support atomic scale-down option for node groups

* Respond to readability-related comments from the review

* Don't pass NodeGroup as a parameter to functions running asynchronously

* Add unit test for group_deletion_scheduler

* Use single AtomicScaling option for scale up and scale down

* address comments

* Address next set of comments

* update agnhost image to pull from registry.k8s.io

* Revert "Add subresource status for vpa"

This reverts commit 1384c8b.

* Bugfix for budget cropping

The previous "CropNodes" function of ScaleDownBudgetProcessor assumed that
atomically-scaled node groups should be classified as "empty" or "drain" as
a whole; however, Cluster Autoscaler may classify some of the nodes from a
single group as "empty" and others as "drain".

* Remove unneeded node groups regardless of scale down being in cooldown.

* Update VPA vendor

Generated by running:

```
go mod tidy
go mod vendor
```

* Replace `BuildTestContainer` with use of builder

* Quote temp folder name parameter to avoid errors

* Include short unregistered nodes in calculation of incorrect node group
sizes

* Add BigDarkClown to Cluster Autoscaler approvers

* Add support for scaling up ZeroToMaxNodesScaling node groups

* Use appropriate logging levels

* Remove unused field in expander and add comment about estimator

* Merge tests for ZeroToMaxNodesScaling into one table-driven test.

* Merged multiple tests into one single table driven test.
* Fixed some typos.

* Change handling of scale up options for ZeroToMaxNodeScaling in orchestrator

* Started handling scale up options for ZeroToMaxNodeScaling with the existing estimator
* Skip setting similar node groups for the node groups that use
  ZeroToMaxNodeScaling
* Renamed the autoscaling option from "AtomicScaleUp" to "AtomicScaling"
* Merged multiple tests into one single table driven test.
* Fixed some typos.

* Rename the autoscaling option

* Renamed the "AtomicScaling" autoscaling option to
  "ZeroOrMaxNodeScaling" to be more clear about the behavior.

* Record all vpa api versions in recommender metrics

Change the tracking of APIVersion from a boolean
indicating if the VPA is v1beta1
to the version string
and make sure it gets exported
in metrics.

Add tests for the recommender metrics.

* Add subresource status for vpa

Add a status field as a subresource in the CRD yaml and add a new ClusterRole system:vpa-actor to patch the /status subresource.
The `metadata.generation` only increases on VPA spec updates.
Fix e2e tests for patching and creating VPAs

* Implement threshold interface for use by threshold based limiter
Add EstimationContext to take into account runtime state of the autoscaling for estimations
Implement static threshold
Implement cluster capacity threshold for Estimation Limiter
Implement similar node groups capacity threshold for Estimation Limiter
Set default estimation thresholds

* Fix tests

* Add ClusterStateRegistry to the AutoscalingContext.

Due to the dependency of the MaxNodeProvisionTimeProvider on the context,
the provider was extracted to a dedicated package and injected into the
ClusterStateRegistry after context creation.

* Make signature of GetDurationLimit uniformed with GetNodeLimit
For SNG threshold include capacity of the currently estimated node group (as it is not part of SNG itself)
Replaced direct calls with use of getters in cluster capacity threshold
Renamed getters removing the verb Get
Replace EstimationContext struct with interface
Add support for negative threshold value in estimation limiter

* Add support for negative binpacking duration limit in threshold based estimation limiter

* update RBAC to only use verbs that exist for the resources

Signed-off-by: Maximilian Rink <[email protected]>

* Move powerState to azure_util, change default to powerStateUnknown

* renames all PowerState* consts to vmPowerState*
* moves vmPowerState* consts and helper functions to azure_util.go
* changes default vmPowerState to vmPowerStateUnknown instead of vmPowerStateStopped when a power state is not set.

* test: fix failing tests
- remove non-relevant comment related to rescheduler
Signed-off-by: vadasambar <[email protected]>

* feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <[email protected]>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <[email protected]>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <[email protected]>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <[email protected]>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <[email protected]>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <[email protected]>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <[email protected]>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <[email protected]>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <[email protected]>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <[email protected]>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <[email protected]>

test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <[email protected]>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <[email protected]>

test: fix failing tests for actuator
Signed-off-by: vadasambar <[email protected]>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <[email protected]>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <[email protected]>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <[email protected]>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <[email protected]>

* test: fix merge conflicts in actuator tests
Signed-off-by: vadasambar <[email protected]>

* refactor: use `actuatorNodeGroupConfigGetter` param in `NewActuator`
- instead of passing all the processors (we only need `NodeGroupConfigProcessor`)
Signed-off-by: vadasambar <[email protected]>

* test: refactor eligibility tests
- add suffix to tests with `IgnoreDaemonSetsUtilization` set to `true` and `IgnoreDaemonSetsUtilization` set to `false`
Signed-off-by: vadasambar <[email protected]>

* refactor: remove comment line (not relevant anymore)
Signed-off-by: vadasambar <[email protected]>

* fix: dynamic assignment of the scale down threshold flags; set maxEmptyBulkDelete and maxScaleDownParallelism to the larger of the two flags when both are set

* Refactor autoscaler.go and static_autoscaler.go to move the declaration of the NodeDeletion option to main.go

* Fixed go:build tags for ovhcloud

* Update the go:build tag for missing cloud providers.

* Adapt FAQ for Pods without controller

* Use strings instead of NodeGroups as map keys in budgets.go

* Delete dead code from budgets.go

* Re-introduce asynchronous node deletion and clean node deletion logic.

* feat: support custom scheduler config for in-tree scheduler plugins (without extenders)
Signed-off-by: vadasambar <[email protected]>

refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion
Signed-off-by: vadasambar <[email protected]>

fix: `goto` causing infinite loop
- abstract out running extenders in a separate function
Signed-off-by: vadasambar <[email protected]>

refactor: remove code around extenders
- we decided not to use scheduler extenders for checking if a pod would fit on a node
Signed-off-by: vadasambar <[email protected]>

refactor: move scheduler config to a `utils/scheduler` package`
- use default config as a fallback
Signed-off-by: vadasambar <[email protected]>

test: fix static_autoscaler test
Signed-off-by: vadasambar <[email protected]>

refactor: `GetSchedulerConfiguration` fn
- remove falling back
- add mechanism to detect if the scheduler config file flag was set
-
Signed-off-by: vadasambar <[email protected]>

test: wip add tests for `GetSchedulerConfig`
- tests are failing now
Signed-off-by: vadasambar <[email protected]>

test: add tests for `GetSchedulerConfig`
- abstract error messages so that we can use them in the tests
- set api version explicitly (this is what upstream does as well)
Signed-off-by: vadasambar <[email protected]>

refactor: do a round of cleanup to make PR ready for review
- make import names consistent
Signed-off-by: vadasambar <[email protected]>

fix: use `pflag` to check if the `--scheduler-config-file` flag was set
Signed-off-by: vadasambar <[email protected]>

docs: add comments for exported error constants
Signed-off-by: vadasambar <[email protected]>

refactor: don't export error messages
- exporting is not needed
Signed-off-by: vadasambar <[email protected]>

fix: add underscore in test file name
Signed-off-by: vadasambar <[email protected]>

test: fix test failing because of no comment on exported `SchedulerConfigFileFlag`
Signed-off-by: vadasambar <[email protected]>

refactor: change name of flag variable `schedulerConfig` -> `schedulerConfigFile`
- avoids confusion
Signed-off-by: vadasambar <[email protected]>

test: add extra test cases for predicate checker
- where the predicate checker uses custom scheduler config
Signed-off-by: vadasambar <[email protected]>

refactor: remove `setFlags` variable
- not needed anymore
Signed-off-by: vadasambar <[email protected]>

refactor: abstract custom scheduler configs into `config` package
- make them constants
Signed-off-by: vadasambar <[email protected]>

test: fix linting error
Signed-off-by: vadasambar <[email protected]>

refactor: introduce a new custom test predicate checker
- instead of adding a param to the current one
- this is so that we don't have to pass `nil` to the existing test predicate checker in many places
Signed-off-by: vadasambar <[email protected]>

refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig`
- latter narrows down meaning of the function better than former
Signed-off-by: vadasambar <[email protected]>

refactor: rename `GetSchedulerConfig` -> `ConfigFromPath`
- `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig`
- move test config to a new package `test` under `config` package
Signed-off-by: vadasambar <[email protected]>

docs: add `TODO` for replacing code to parse scheduler config
- with upstream function
Signed-off-by: vadasambar <[email protected]>

* Use fixed version of golang image

* Fix TestBinpackingLimiter flake

* Bump golang from 1.20.5 to 1.20.6 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.5 to 1.20.6.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Fix: Do not inject fakeNode for instance which has errors on create

* chore: add script to update vendored hcloud-go

* chore(deps): update vendored hcloud-go to 2.0.0

Generated by:

```
UPSTREAM_REF=v2.0.0 hack/update-vendor.sh
```

* fix: balancer RBAC permission to update balancer status

* CA - AWS Cloudprovider OWNERS Update

* Enable parallel drain by default.

* Add BigDarkClown to patch releases schedule

* Update Cluster Autoscaler vendor to K8s 1.28.0-beta.0

* Add EstimationAnalyserFunc to be run at the end of the estimation logic

* Remove ChangeRequirements with `OrEqual`

* Add EvictionRequirements to types

* Run `generate-crd-yaml.sh`

* Add metrics for improved observability:
* pending_node_deletions
* failed_gpu_scale_ups_total

* Add requirement for Custom Resources to VPA FAQ

* Clarify Eviction Control for Pods with multiple Containers

* Fix broken hyperlink

Co-authored-by: Shubham <[email protected]>

* Update vertical-pod-autoscaler/FAQ.md

Co-authored-by: Joachim <[email protected]>

* Update vertical-pod-autoscaler/FAQ.md

Co-authored-by: Joachim <[email protected]>

* Reword AND/OR combinations for more clarity

* Fix nil pointer exception for case when node is nil while processing gpuInfo

* feat: add prometheus basic auth

Signed-off-by: AhmedGrati <[email protected]>

* Add error code for invalid reservations to GCE client

* Bump golang from 1.20.6 to 1.20.7 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.6 to 1.20.7.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Support ZeroOrMaxNodeScaling node groups when cleaning up unregistered nodes

* Don't pass nil nodes to GetGpuInfoForMetrics

* Revert "Fix nil pointer exception for case when node is nil while processing …"

* Clean up NodeGroupConfigProcessor interface

* docs: add kep to add fswatcher to nanny for automatic nanny configuration

Signed-off-by: AhmedGrati <[email protected]>

* Allow using an external secret instead of using the one the Helm chart creates

* Remove the MaxNodeProvisioningTimeProvider interface

* Fixed the hyperlink for Node group auto discovery.

* Update ResourcePolicy description and limit control README

* s390x image support

* Bump golang from 1.20.7 to 1.21.0 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.7 to 1.21.0.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* test

* Set batch size to target size for atomically scaled groups

* a little extra validation

* test with 2 atomic groups

* don't block draining other groups when one group has some empty nodes

* fix: Broken links to testgrid dashboard

* fix: scale down broken for providers not implementing NodeGroup.GetOptions()

* feat(hetzner): use less requests while waiting for server create

The default is to send a new request every 500ms; this will instead use
an exponential backoff while waiting for the server to be created.

* Update in-place updates AEP adding details to consider

* Fix Doc with External gRPC

Signed-off-by: ZhengSheng0524 <[email protected]>

* Add fetch reservations in specific project

GCE supports shared reservations where the reservation is in a different
project than the project the cluster is in. Add GCE client method to get
said reservations so autoscaling can support shared reservations.

* kep: add config file format and structure notes

Signed-off-by: AhmedGrati <[email protected]>

* CA - 1.28.0 k/k Vendor Update

* Fix duplicate imports in IT

* re-add changes part of FORK-CHANGE

* Re-added a fork change command and updated sync change notes

* Update cluster-autoscaler/SYNC-CHANGES/SYNC_CHANGES-1.28.md

Co-authored-by: Rishabh Patel <[email protected]>

---------

Signed-off-by: Eng Zer Jun <[email protected]>
Signed-off-by: vadasambar <[email protected]>
Signed-off-by: Vishal Anarse <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Dinesh B <[email protected]>
Signed-off-by: Maximilian Rink <[email protected]>
Signed-off-by: vadasambar <[email protected]>
Signed-off-by: AhmedGrati <[email protected]>
Signed-off-by: ZhengSheng0524 <[email protected]>
Co-authored-by: qianlei.qianl <[email protected]>
Co-authored-by: shubham82 <[email protected]>
Co-authored-by: Guy Templeton <[email protected]>
Co-authored-by: Kubernetes Prow Robot <[email protected]>
Co-authored-by: Eng Zer Jun <[email protected]>
Co-authored-by: Bartłomiej Wróblewski <[email protected]>
Co-authored-by: Kushagra <[email protected]>
Co-authored-by: kei-gnu <[email protected]>
Co-authored-by: michael mccune <[email protected]>
Co-authored-by: vadasambar <[email protected]>
Co-authored-by: Vishal Anarse <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Neil Wilson <[email protected]>
Co-authored-by: Sourabh Gupta <[email protected]>
Co-authored-by: Artur Żyliński <[email protected]>
Co-authored-by: Damika Gamlath <[email protected]>
Co-authored-by: Karol Golab <[email protected]>
Co-authored-by: Dinesh B <[email protected]>
Co-authored-by: Todd Neal <[email protected]>
Co-authored-by: Marco Voelz <[email protected]>
Co-authored-by: Joachim Bartosik <[email protected]>
Co-authored-by: vadasambar <[email protected]>
Co-authored-by: Karol Wychowaniec <[email protected]>
Co-authored-by: Kevin Wiesmueller <[email protected]>
Co-authored-by: Aleksandra Gacek <[email protected]>
Co-authored-by: Hakan Bostan <[email protected]>
Co-authored-by: xiaoqing <[email protected]>
Co-authored-by: Yuriy Stryuchkov <[email protected]>
Co-authored-by: Daniel Gutowski <[email protected]>
Co-authored-by: Maximilian Rink <[email protected]>
Co-authored-by: dom.bozzuto <[email protected]>
Co-authored-by: bsoghigian <[email protected]>
Co-authored-by: Krzysztof Siedlecki <[email protected]>
Co-authored-by: Julian Tölle <[email protected]>
Co-authored-by: Amir Alavi <[email protected]>
Co-authored-by: Daniel Kłobuszewski <[email protected]>
Co-authored-by: droctothorpe <[email protected]>
Co-authored-by: Marco Voelz <[email protected]>
Co-authored-by: Jayant Jain <[email protected]>
Co-authored-by: AhmedGrati <[email protected]>
Co-authored-by: Mike Tougeron <[email protected]>
Co-authored-by: Sachin Tiptur <[email protected]>
Co-authored-by: Saripalli Lavanya <[email protected]>
Co-authored-by: Aleksandra Malinowska <[email protected]>
Co-authored-by: Yash Khare <[email protected]>
Co-authored-by: Piotr Betkier <[email protected]>
Co-authored-by: ZhengSheng0524 <[email protected]>
Co-authored-by: Jessica Chen <[email protected]>
Co-authored-by: Rishabh Patel <[email protected]>
Showing 1,842 changed files with 134,451 additions and 47,687 deletions.
@@ -0,0 +1,123 @@
# KEP-5546: Scaling based on container count

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->

## Summary

Currently Addon Resizer supports scaling based on the number of nodes. Some workloads use resources in proportion to
the number of containers in the cluster. Since the number of containers per node varies widely between clusters,
it's more resource-efficient to scale such workloads directly on the container count.

### Goals

- Allow scaling workloads based on count of containers in a cluster.
- Allow this for Addon Resizer 1.8 ([used by metrics server]).

### Non-Goals

- Using both node and container count to scale workloads.
- Bringing this change to the `master` branch of Addon Resizer.

## Proposal

Add a `--scaling-mode` flag to Addon Resizer on the [`addon-resizer-release-1.8`] branch. The flag will
have two valid values:

- `node-proportional` - default; the current behavior.
- `container-proportional` - Addon Resizer will set resources using the same algorithm it uses now, but using the number
  of containers wherever it currently uses the number of nodes.
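
A minimal sketch of how the proposed flag could be declared and validated in the nanny's `main` package is shown below; only the flag name and its two values come from this proposal, while the variable name, help text, and validation are illustrative assumptions.

```go
package main

import (
	"flag"
	"log"
)

// scalingMode selects what "cluster size" means for the nanny's resource
// formula: node count (current behavior) or container count (proposed).
var scalingMode = flag.String("scaling-mode", "node-proportional",
	"How cluster size is measured: node-proportional (node count, current behavior) "+
		"or container-proportional (container count).")

func main() {
	flag.Parse()
	switch *scalingMode {
	case "node-proportional", "container-proportional":
		// Valid value; the rest of the nanny decides how to count cluster size.
	default:
		log.Fatalf("unsupported --scaling-mode %q", *scalingMode)
	}
}
```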

### Notes

Addon Resizer 1.8 assumes in multiple places that it's scaling based on the number of nodes:

- [Flag descriptions] that directly reference node counts (`--extra-cpu`, `--extra-memory`, `--extra-storage`, and
`--minClusterSize`) will need to be updated to instead refer to cluster size.
- [README] will need to be updated to reference cluster size instead of node count and explain that cluster size refers
to either node count or container count, depending on the value of the `--scaling-mode` flag.
- Many variable names in code which now refer to node count will refer to cluster size and should be renamed accordingly.

In addition to implementing the feature we should also clean up the code and documentation.

### Risks and Mitigations

One potential risk comes from how Addon Resizer obtains cluster size (node count or container count):
- from metrics, or
- by querying the Cluster API Server to list all objects of the appropriate type,

depending on the configuration. There can be many times more containers in a cluster than there are nodes, so listing
all containers could result in higher load on the Cluster API server. Since Addon Resizer requests very few fields,
I don't expect this effect to be noticeable.

I also expect metrics-server to test for this before using the feature, and any other users of Addon Resizer are likely
better off using metrics (which don't have this problem).

## Design Details

- Implement a `kubernetesClient.CountContainers()` function. It will be analogous to the existing
  [`kubernetesClient.CountNodes()`] function (see the sketch after the notes below).
- If using metrics to determine the number of containers in the cluster:
  - Fetch pod metrics (similar to [fetching node metrics] but use the `/pods` URI instead of `/nodes`).
  - For each pod, obtain the number of containers (length of the `containers` field).
  - Sum container counts over all pods.
- If using the API server:
  - Fetch the list of pods (similar to [listing nodes]).
  - Fetch only the [`Spec.InitContainers`], [`Spec.Containers`], and [`Spec.EphemeralContainers`] fields.
  - Exclude pods in terminal states ([selector excluding pods in terminal states in VPA]).
  - Sum container counts over all pods.
- Add the `--scaling-mode` flag, with two valid values:
  - `node-proportional` - default; the current behavior, scaling based on the cluster's node count, and
  - `container-proportional` - new behavior, scaling based on the cluster's container count.
- Pass a value indicating whether to use node count or container count to the [`updateResources()`] function.
- In `updateResources()`, use node count or container count depending on that value.

Check that listing containers directly works.

Consider listing pods, counting containers only for pods that are still running.
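
To make the API-server path above concrete, here is a minimal sketch of a container-counting helper built on client-go; the function name, its parameters, and the omission of response trimming (fetching only the spec container fields) are assumptions rather than the final implementation.

```go
package nanny

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// countContainersViaAPI lists all non-terminal pods and sums their init,
// regular, and ephemeral containers, mirroring the steps described above.
func countContainersViaAPI(ctx context.Context, client kubernetes.Interface) (int, error) {
	// Exclude pods in terminal states, similar to the VPA selector referenced above.
	opts := metav1.ListOptions{
		FieldSelector: "status.phase!=" + string(corev1.PodSucceeded) +
			",status.phase!=" + string(corev1.PodFailed),
	}
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, opts)
	if err != nil {
		return 0, err
	}
	total := 0
	for _, pod := range pods.Items {
		total += len(pod.Spec.InitContainers) +
			len(pod.Spec.Containers) +
			len(pod.Spec.EphemeralContainers)
	}
	return total, nil
}
```

The metrics-based path would instead fetch pod metrics and sum the lengths of their `containers` fields.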

### Test Plan

In addition to unit tests, we will run a manual e2e test:

- Create a config based on [`example.yaml`], but scaling the deployment based on the number of containers in the cluster.
- Create a config starting a deployment with 100 `pause` containers.

Test the feature by:

- Start the deployment scaled by Addon Resizer, based on container count.
- Observe the size of the deployment and check that it's stable.
- Start the deployment with 100 `pause` containers.
- Observe the scaled deployment change its resources appropriately.

Test the node-based scaling:

- Apply [`example.yaml`].
- Observe the amount and stability of the assigned resources.
- Resize the cluster.
- Observe the change in assigned resources.

Both tests should be performed with both metrics-based and API-based scaling.

[used by metrics server]: https://github.com/kubernetes-sigs/metrics-server/blob/0c47555e9b49cfe0719db1a0b7fb6c8dcdff3d38/charts/metrics-server/values.yaml#L121
[`addon-resizer-release-1.8`]: https://github.com/kubernetes/autoscaler/tree/addon-resizer-release-1.8
[Flag descriptions]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/main/pod_nanny.go#L47
[README]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/README.md?plain=1#L1
[`kubernetesClient.CountNodes()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L58
[fetching node metrics]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L150
[listing nodes]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L71
[`Spec.InitContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3143
[`Spec.Containers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3150
[`Spec.EphemeralContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3158
[`Status.Phase`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L4011
[selector excluding pods in terminal states in VPA]: https://github.com/kubernetes/autoscaler/blob/04e5bfc88363b4af9fdeb9dfd06c362ec5831f51/vertical-pod-autoscaler/e2e/v1beta2/common.go#L195
[`updateResources()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/nanny_lib.go#L126
[`example.yaml`]: https://github.com/kubernetes/autoscaler/blob/c8d612725c4f186d5de205ed0114f21540a8ed39/addon-resizer/deploy/example.yaml
@@ -0,0 +1,61 @@
# KEP-5546: Automatic reload of nanny configuration when updated

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->

## Summary
- **Goals:** The goal of this enhancement is to improve the user experience for applying nanny configuration changes in addon-resizer 1.8 when used with the metrics server. The proposed solution automatically reloads the nanny configuration whenever it changes, eliminating the need for manual intervention and sidecar containers.
- **Non-Goals:** This proposal does not aim to change the functional behavior of the addon-resizer.

## Proposal
The proposed solution updates the addon-resizer with the following steps:
- Create a file system watcher using `fsnotify` under `utils/fswatcher` to watch for changes to the nanny configuration. It should run as a goroutine in the background.
- Detect changes to the nanny configuration file using the created `fswatcher` and trigger the reloading process when configuration changes are detected. Events should be sent on a channel.
- Re-execute `loadNannyConfiguration`, the method responsible for building the NannyConfiguration, to apply the updated configuration to the addon-resizer.
- Implement proper error handling to manage scenarios where the configuration file is temporarily inaccessible or contains parsing errors.

### Risks and Mitigations
- There is a potential risk of filesystem-related issues causing the file watcher to malfunction. Proper testing and error handling should be implemented to handle such scenarios gracefully.
- Errors in the configuration file could lead to unexpected behavior or crashes. The addon-resizer should handle parsing errors and fall back to the previous working configuration if necessary.

## Design Details
- Create a new package for the `fswatcher` under `utils/fswatcher`. It would contain the `FsWatcher` struct, its methods, and unit tests.
- The `FsWatcher` struct would look similar to this:
```go
type FsWatcher struct {
	*fsnotify.Watcher

	Events    chan struct{}
	ratelimit time.Duration
	names     []string
	paths     map[string]struct{}
}
```
- Implement the following functions:
  - `CreateFsWatcher`: Instantiates a new `FsWatcher` and starts watching the file system.
  - `initWatcher`: Initializes the `fsnotify` watcher and the `paths` to be watched.
  - `add`: Adds a new file to watch.
  - `reset`: Re-initializes the `FsWatcher`.
  - `watch`: Watches the configured files for changes.
- In the main function, we create a new `FsWatcher` and then wait in an infinite loop for events indicating
  filesystem changes. Based on these events, we re-execute the `loadNannyConfiguration` function (a minimal sketch follows this list).
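
A minimal sketch of how the watcher could be wired up with [fsnotify], assuming the struct above, is shown below; the debouncing, channel buffering, and exact function signatures are assumptions rather than the final design.

```go
package fswatcher

import (
	"time"

	"github.com/fsnotify/fsnotify"
)

// FsWatcher surfaces debounced change notifications for a set of
// configuration files on a single Events channel.
type FsWatcher struct {
	*fsnotify.Watcher

	Events    chan struct{}
	ratelimit time.Duration
	names     []string
	paths     map[string]struct{}
}

// CreateFsWatcher instantiates a new FsWatcher and starts watching the given files.
func CreateFsWatcher(ratelimit time.Duration, names ...string) (*FsWatcher, error) {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return nil, err
	}
	fw := &FsWatcher{
		Watcher:   w,
		Events:    make(chan struct{}, 1),
		ratelimit: ratelimit,
		names:     names,
		paths:     make(map[string]struct{}),
	}
	for _, name := range names {
		if err := fw.Add(name); err != nil { // Add is promoted from *fsnotify.Watcher.
			return nil, err
		}
		fw.paths[name] = struct{}{}
	}
	go fw.watch()
	return fw, nil
}

// watch collapses raw fsnotify events into at most one notification per
// ratelimit window, so callers simply reload on every receive.
func (fw *FsWatcher) watch() {
	var last time.Time
	for {
		select {
		case ev, ok := <-fw.Watcher.Events:
			if !ok {
				return
			}
			// Only writes and creates are interesting for a config file.
			if ev.Op&(fsnotify.Write|fsnotify.Create) == 0 {
				continue
			}
			if time.Since(last) < fw.ratelimit {
				continue
			}
			last = time.Now()
			select {
			case fw.Events <- struct{}{}:
			default: // A reload is already pending; drop the duplicate.
			}
		case _, ok := <-fw.Watcher.Errors:
			if !ok {
				return
			}
		}
	}
}
```

In `main`, the nanny would then range over `fw.Events` and call `loadNannyConfiguration` on every receive.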

> **Note:** The expected configuration file format is YAML. It has the same structure as the NannyConfiguration CRD.

### Test Plan
To ensure the proper functioning of the enhanced addon-resizer, the following test plan should be executed:
1. **Unit Tests:** Write unit tests to validate the file watcher's functionality and ensure it triggers events when the configuration file changes.
2. **Manual e2e Tests:** Deploy the addon-resizer with a `BaseMemory` of `300Mi`, then change `BaseMemory` to `100Mi`. We should observe the behavior of the watched pod change accordingly.
[fsnotify]: https://github.com/fsnotify/fsnotify
6 changes: 6 additions & 0 deletions balancer/deploy/controller.yaml
@@ -20,6 +20,12 @@ rules:
- watch
- patch
- update
- apiGroups:
- balancer.x-k8s.io
resources:
- balancers/status
verbs:
- update
- apiGroups:
- ""
resources:
4 changes: 2 additions & 2 deletions balancer/proposals/balancer.md
@@ -10,7 +10,7 @@ These domains may include:
* Cloud provider zones inside a single region, to ensure that the application is still up and running, even if one of the zones has issues.
* Different types of Kubernetes nodes. These may involve nodes that are spot/preemptible, or of different machine families.

A single Kuberentes deployment may either leave the placement entirely up to the scheduler
A single Kubernetes deployment may either leave the placement entirely up to the scheduler
(most likely leading to something not entirely desired, like all pods going to a single domain) or
focus on a single domain (thus not achieving the goal of being in two or more domains).

@@ -179,4 +179,4 @@ type BalancerStatus struct {
// +patchStrategy=merge
Conditions []metav1.Condition
}
```
```
2 changes: 1 addition & 1 deletion builder/Dockerfile
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

FROM golang:1.20
FROM golang:1.20.4
LABEL maintainer="Marcin Wielgus <[email protected]>"

ENV GOPATH /gopath/
4 changes: 2 additions & 2 deletions charts/cluster-autoscaler/Chart.yaml
@@ -1,5 +1,5 @@
apiVersion: v2
appVersion: 1.26.2
appVersion: 1.27.2
description: Scales Kubernetes worker nodes within autoscaling groups.
engine: gotpl
home: https://github.com/kubernetes/autoscaler
@@ -11,4 +11,4 @@ name: cluster-autoscaler
sources:
- https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
type: application
version: 9.28.0
version: 9.29.2
8 changes: 6 additions & 2 deletions charts/cluster-autoscaler/README.md
@@ -320,7 +320,10 @@ Though enough for the majority of installations, the default PodSecurityPolicy _
### VerticalPodAutoscaler
The chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) for the Deployment if needed. A VPA can help minimize wasted resources when usage spikes periodically or remediate containers that are being OOMKilled.
The CA Helm Chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) object from Chart version `9.27.0`
onwards for the Cluster Autoscaler Deployment to scale the CA as appropriate, but for that, we
need to install the VPA to the cluster separately. A VPA can help minimize wasted resources
when usage spikes periodically or remediate containers that are being OOMKilled.
The following example snippet can be used to install VPA that allows scaling down from the default recommendations of the deployment template:
@@ -383,7 +386,7 @@ vpa:
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy |
| image.pullSecrets | list | `[]` | Image pull secrets |
| image.repository | string | `"registry.k8s.io/autoscaling/cluster-autoscaler"` | Image repository |
| image.tag | string | `"v1.26.2"` | Image tag |
| image.tag | string | `"v1.27.2"` | Image tag |
| kubeTargetVersionOverride | string | `""` | Allow overriding the `.Capabilities.KubeVersion.GitVersion` check. Useful for `helm template` commands. |
| magnumCABundlePath | string | `"/etc/kubernetes/ca-bundle.crt"` | Path to the host's CA bundle, from `ca-file` in the cloud-config file. |
| magnumClusterName | string | `""` | Cluster name or ID in Magnum. Required if `cloudProvider=magnum` and not setting `autoDiscovery.clusterName`. |
@@ -408,6 +411,7 @@ vpa:
| rbac.serviceAccount.name | string | `""` | The name of the ServiceAccount to use. If not set and create is `true`, a name is generated using the fullname template. |
| replicaCount | int | `1` | Desired number of pods |
| resources | object | `{}` | Pod resource requests and limits. |
| secretKeyRefNameOverride | string | `""` | Overrides the name of the Secret to use when loading the secretKeyRef for AWS and Azure env variables |
| securityContext | object | `{}` | [Security context for pod](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) |
| service.annotations | object | `{}` | Annotations to add to service |
| service.create | bool | `true` | If `true`, a Service will be created. |
5 changes: 4 additions & 1 deletion charts/cluster-autoscaler/README.md.gotmpl
@@ -320,7 +320,10 @@ Though enough for the majority of installations, the default PodSecurityPolicy _

### VerticalPodAutoscaler

The chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) for the Deployment if needed. A VPA can help minimize wasted resources when usage spikes periodically or remediate containers that are being OOMKilled.
The CA Helm Chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) object from Chart version `9.27.0`
onwards for the Cluster Autoscaler Deployment to scale the CA as appropriate, but for that, we
need to install the VPA to the cluster separately. A VPA can help minimize wasted resources
when usage spikes periodically or remediate containers that are being OOMKilled.

The following example snippet can be used to install VPA that allows scaling down from the default recommendations of the deployment template:

11 changes: 10 additions & 1 deletion charts/cluster-autoscaler/templates/clusterrole.yaml
@@ -151,13 +151,22 @@ rules:
- cluster.x-k8s.io
resources:
- machinedeployments
- machinedeployments/scale
- machinepools
- machines
- machinesets
verbs:
- get
- list
- update
- watch
- apiGroups:
- cluster.x-k8s.io
resources:
- machinedeployments/scale
- machinepools/scale
verbs:
- get
- patch
- update
{{- end }}
{{- end -}}
20 changes: 10 additions & 10 deletions charts/cluster-autoscaler/templates/deployment.yaml
@@ -132,36 +132,36 @@ spec:
valueFrom:
secretKeyRef:
key: AwsAccessKeyId
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- if .Values.awsSecretAccessKey }}
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: AwsSecretAccessKey
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- else if eq .Values.cloudProvider "azure" }}
- name: ARM_SUBSCRIPTION_ID
valueFrom:
secretKeyRef:
key: SubscriptionID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_RESOURCE_GROUP
valueFrom:
secretKeyRef:
key: ResourceGroup
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_VM_TYPE
valueFrom:
secretKeyRef:
key: VMType
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: AZURE_CLUSTER_NAME
valueFrom:
secretKeyRef:
key: ClusterName
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- if .Values.azureUseWorkloadIdentityExtension }}
- name: ARM_USE_WORKLOAD_IDENTITY_EXTENSION
value: "true"
@@ -173,22 +173,22 @@ spec:
valueFrom:
secretKeyRef:
key: TenantID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_CLIENT_ID
valueFrom:
secretKeyRef:
key: ClientID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_CLIENT_SECRET
valueFrom:
secretKeyRef:
key: ClientSecret
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: AZURE_NODE_RESOURCE_GROUP
valueFrom:
secretKeyRef:
key: NodeResourceGroup
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.extraEnv }}