Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: cherry-pick scale-down-delay-* per nodegroup to 1.29 #6484

Conversation

vadasambar
Copy link
Member

What type of PR is this?

/kind cherry-pick

What this PR does / why we need it:

cherry-picks #5729 to 1.29

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added: `--scale-down-delay-type-local` flag. It specifies if `--scale-down-delay-after-*` flags should be applied locally per nodegroup or globally across all nodegroups)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: vadasambar <[email protected]>

feat: update scale down status after every scale up
- move scaledown delay status to cluster state/registry
- enable scale down if  `ScaleDownDelayTypeLocal` is enabled
- add new funcs on cluster state to get and update scale down delay status
- use timestamp instead of booleans to track scale down delay status
Signed-off-by: vadasambar <[email protected]>

refactor: use existing fields on clusterstate
- uses `scaleUpRequests`, `scaleDownRequests` and `scaleUpFailures` instead of `ScaleUpDelayStatus`
- changed the above existing fields a little to make them more convenient for use
- moved initializing scale down delay processor to static autoscaler (because clusterstate is not available in main.go)
Signed-off-by: vadasambar <[email protected]>

refactor: remove note saying only `scale-down-after-add` is supported
- because we are supporting all the flags
Signed-off-by: vadasambar <[email protected]>

fix: evaluate `scaleDownInCooldown` the old way only if `ScaleDownDelayTypeLocal` is set to `false`
Signed-off-by: vadasambar <[email protected]>

refactor: remove line saying `--scale-down-delay-type-local` is only supported for `--scale-down-delay-after-add`
- because it is not true anymore
- we are supporting all `--scale-down-delay-after-*` flags per nodegroup
Signed-off-by: vadasambar <[email protected]>

test: fix clusterstate tests failing
Signed-off-by: vadasambar <[email protected]>

refactor: move back initializing processors logic to from static autoscaler to main
- we don't want to initialize processors in static autoscaler because anyone implementing an alternative to static_autoscaler has to initialize the processors
- and initializing specific processors is making static autoscaler aware of an implementation detail which might not be the best practice
Signed-off-by: vadasambar <[email protected]>

refactor: revert changes related to `clusterstate`
- since I am going with observer pattern
Signed-off-by: vadasambar <[email protected]>

feat: add observer interface for state of scaling
- to implement observer pattern for tracking state of scale up/downs (as opposed to using clusterstate to do the same)
- refactor `ScaleDownCandidatesDelayProcessor` to use fields from the new observer
Signed-off-by: vadasambar <[email protected]>

refactor: remove params passed to `clearScaleUpFailures`
- not needed anymore
Signed-off-by: vadasambar <[email protected]>

refactor: revert clusterstate tests
- approach has changed
- I am not making any changes in clusterstate now
Signed-off-by: vadasambar <[email protected]>

refactor: add accidentally deleted lines for clusterstate test
Signed-off-by: vadasambar <[email protected]>

feat: implement `Add` fn for scale state observer
- to easily add new observers
- re-word comments
- remove redundant params from `NewDefaultScaleDownCandidatesProcessor`
Signed-off-by: vadasambar <[email protected]>

fix: CI complaining because no comments on fn definitions
Signed-off-by: vadasambar <[email protected]>

feat: initialize parent `ScaleDownCandidatesProcessor`
- instead  of `ScaleDownCandidatesSortingProcessor` and `ScaleDownCandidatesDelayProcessor` separately
Signed-off-by: vadasambar <[email protected]>

refactor: add scale state notifier to list of default processors
- initialize processors for `NewDefaultScaleDownCandidatesProcessor` outside and pass them to the fn
- this allows more flexibility
Signed-off-by: vadasambar <[email protected]>

refactor: add observer interface
- create a separate observer directory
- implement `RegisterScaleUp` function in the clusterstate
- TODO: resolve syntax errors
Signed-off-by: vadasambar <[email protected]>

feat: use `scaleStateNotifier` in place of `clusterstate`
- delete leftover `scale_stateA_observer.go` (new one is already present in `observers` directory)
- register `clustertstate` with `scaleStateNotifier`
- use `Register` instead of `Add` function in `scaleStateNotifier`
- fix `go build`
- wip: fixing tests
Signed-off-by: vadasambar <[email protected]>

test: fix syntax errors
- add utils package `pointers` for converting `time` to pointer (without having to initialize a new variable)
Signed-off-by: vadasambar <[email protected]>

feat: wip track scale down failures along with scale up failures
- I was tracking scale up failures but not scale down failures
- fix copyright year 2017 -> 2023 for the new `pointers` package
Signed-off-by: vadasambar <[email protected]>

feat: register failed scale down with scale state notifier
- wip writing tests for `scale_down_candidates_delay_processor`
- fix CI lint errors
- remove test file for `scale_down_candidates_processor` (there is not much to test as of now)
Signed-off-by: vadasambar <[email protected]>

test: wip tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <[email protected]>

test: add unit tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <[email protected]>

refactor: don't track scale up failures in `ScaleDownCandidatesDelayProcessor`
- not needed
Signed-off-by: vadasambar <[email protected]>

test: better doc comments for `TestGetScaleDownCandidates`
Signed-off-by: vadasambar <[email protected]>

refactor: don't ignore error in `NGChangeObserver`
- return it instead and let the caller decide what to do with it
Signed-off-by: vadasambar <[email protected]>

refactor: change pointers to values in `NGChangeObserver` interface
- easier to work with
- remove `expectedAddTime` param from `RegisterScaleUp` (not needed for now)
- add tests for clusterstate's `RegisterScaleUp`
Signed-off-by: vadasambar <[email protected]>

refactor: conditions in `GetScaleDownCandidates`
- set scale down in cool down if the number of scale down candidates is 0
Signed-off-by: vadasambar <[email protected]>

test: use `ng1` instead of `ng2` in existing test
Signed-off-by: vadasambar <[email protected]>

feat: wip static autoscaler tests
Signed-off-by: vadasambar <[email protected]>

refactor: assign directly instead of using `sdProcessor` variable
- variable is not needed
Signed-off-by: vadasambar <[email protected]>

test: first working test for static autoscaler
Signed-off-by: vadasambar <[email protected]>

test: continue working on static autoscaler tests
Signed-off-by: vadasambar <[email protected]>

test: wip second static autoscaler test
Signed-off-by: vadasambar <[email protected]>

refactor: remove `Println` used for debugging
Signed-off-by: vadasambar <[email protected]>

test: add static_autoscaler tests for scale down delay per nodegroup flags
Signed-off-by: vadasambar <[email protected]>

chore: rebase off the latest `master`
- change scale state observer interface's `RegisterFailedScaleup` to reflect latest changes around clusterstate's `RegisterFailedScaleup` in `master`
Signed-off-by: vadasambar <[email protected]>

test: fix clusterstate test failing
Signed-off-by: vadasambar <[email protected]>

test: fix failing orchestrator test
Signed-off-by: vadasambar <[email protected]>

refactor: rename `defaultScaleDownCandidatesProcessor` -> `combinedScaleDownCandidatesProcessor`
- describes the processor better
Signed-off-by: vadasambar <[email protected]>

refactor: replace `NGChangeObserver` -> `NodeGroupChangeObserver`
- makes it easier to understand for someone not familiar with the codebase
Signed-off-by: vadasambar <[email protected]>

docs: reword code comment `after` -> `for which`
Signed-off-by: vadasambar <[email protected]>

refactor: don't return error from `RegisterScaleDown`
- not needed as of now (no implementer function returns a non-nil error for this function)
Signed-off-by: vadasambar <[email protected]>

refactor: address review comments around ng change observer interface
- change dir structure of nodegroup change observer package
- stop returning errors wherever it is not needed in the nodegroup change observer interface
- rename `NGChangeObserver` -> `NodeGroupChangeObserver` interface (makes it easier to understand)
Signed-off-by: vadasambar <[email protected]>

refactor: make nodegroupchange observer thread-safe
Signed-off-by: vadasambar <[email protected]>

docs: add TODO to consider using multiple mutexes in nodegroupchange observer
Signed-off-by: vadasambar <[email protected]>

refactor: use `time.Now()` directly instead of assigning a variable to it
Signed-off-by: vadasambar <[email protected]>

refactor: share code for checking if there was a recent scale-up/down/failure
Signed-off-by: vadasambar <[email protected]>

test: convert `ScaleDownCandidatesDelayProcessor` into table tests
Signed-off-by: vadasambar <[email protected]>

refactor: change scale state notifier's `Register()` -> `RegisterForNotifications()`
- makes it easier to understand what the function does
Signed-off-by: vadasambar <[email protected]>

test: replace scale state notifier `Register` -> `RegisterForNotifications` in test
- to fix syntax errors since it is already renamed in the actual code
Signed-off-by: vadasambar <[email protected]>

refactor: remove `clusterStateRegistry` from `delete_in_batch` tests
- not needed anymore since we have `scaleStateNotifier`
Signed-off-by: vadasambar <[email protected]>

refactor: address PR review comments
Signed-off-by: vadasambar <[email protected]>

fix: add empty `RegisterFailedScaleDown` for clusterstate
- fix syntax error in static autoscaler test
Signed-off-by: vadasambar <[email protected]>
(cherry picked from commit 5de49a1)
@k8s-ci-robot
Copy link
Contributor

@vadasambar: The label(s) kind/cherry-pick cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?

/kind cherry-pick

What this PR does / why we need it:

cherry-picks #5729 to 1.29

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added: `--scale-down-delay-type-local` flag. It specifies if `--scale-down-delay-after-*` flags should be applied locally per nodegroup or globally across all nodegroups)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2024
Copy link

linux-foundation-easycla bot commented Jan 30, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: vadasambar / name: Suraj Banakar(बानकर) | スラジ (e05d34e)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 30, 2024
@vadasambar vadasambar changed the base branch from master to cluster-autoscaler-release-1.29 January 30, 2024 18:01
@k8s-ci-robot k8s-ci-robot requested review from feiskyer and x13n January 30, 2024 18:01
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 30, 2024
@vadasambar vadasambar marked this pull request as ready for review January 30, 2024 18:10
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2024
@Shubham82
Copy link
Contributor

Thanks @vadasambar
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2024
@vadasambar
Copy link
Member Author

/assign @MaciekPytel

because Maciek is the maintainer doing releases this month

image
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#schedule

Copy link
Contributor

@mwielgus mwielgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mwielgus, vadasambar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit aec9e1e into kubernetes:cluster-autoscaler-release-1.29 Mar 18, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants