Put ScaleDown logic behind an interface #4806
Conversation
limitations under the License.
*/

package actuation
I wonder if it wouldn't be better to move this logic to utils/deletetaint or elsewhere outside the core/? It's pretty much impossible to import core/ from outside of it. I can't really think of a reason for reusing DeletionCandidate soft taints, but a more generic rate-limiting for tainting seems like a good feature to have (especially if we can rate-limit the total tainting QPS, ie. soft taint less if we're doing a lot of hard tainting).
I'm not really sure if any of the above is worth the effort and I don't think it's required from this PR, just throwing around ideas to consider.
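For the sake of discussion, here is a minimal sketch of what a shared tainting budget outside of core/ could look like. All names below are hypothetical and nothing here is part of this PR:

```go
package deletetaint

import "sync"

// taintBudget is a hypothetical shared budget for taint updates in a single
// loop. Hard and soft taints would draw from the same pool, so heavy hard
// tainting automatically leaves less room for DeletionCandidate soft taints.
type taintBudget struct {
	mu        sync.Mutex
	remaining int // taint API calls still allowed in this loop
}

// tryAcquire reserves one taint update, returning false once the budget for
// the current loop is exhausted.
func (b *taintBudget) tryAcquire() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.remaining <= 0 {
		return false
	}
	b.remaining--
	return true
}
```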
It is a separate package from core (even though it is a subdir), so I think it should be easier to reuse or make improvements now. The tests still depend on core test utils though, which depends on deletetaint, so it cannot be merged there - it would introduce a cyclic dependency. I'd leave it here for now.
Fair enough.
}

func (b *budgetTracker) processWithinBudget(f func()) {
	klog.Infof("now = %v, startTime = %v, timeBudget = %v", now(), b.startTime, b.timeBudget)
This seems like a leftover debug log? If you want to keep it, please use V(). Also, consider re-wording it, as it will be very unclear where this log is coming from.
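For illustration, a verbosity-gated drop-in variant of the line shown in the diff above could look like this (assuming the log were kept; the prefix is just one way of making the origin explicit):

```go
// Debug-only log, gated behind a high verbosity level and prefixed so it is
// clear which component emitted it.
klog.V(5).Infof("budgetTracker: startTime = %v, timeBudget = %v", b.startTime, b.timeBudget)
```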
It was indeed a debug log, removed.
func (b *budgetTracker) reportExceededLimits() {
	if b.skippedNodes > 0 {
		klog.V(4).Infof("Skipped adding/removing soft taints on %v nodes - API call limit exceeded", b.skippedNodes)
nit: this may happen as a result of either the API call limit or the time limit being exceeded; not sure if it's worth trying to clarify in the log line though.
That was carried over from existing code, but you're right - it was incorrect. Clarified now.
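A sketch of what the clarified message could look like, reusing the budgetTracker fields from the diff above (the exact wording merged in the PR may differ):

```go
func (b *budgetTracker) reportExceededLimits() {
	if b.skippedNodes > 0 {
		// Name both possible causes: the API call budget and the time budget.
		klog.V(4).Infof("Skipped adding/removing soft taints on %v nodes - API call or time budget exceeded", b.skippedNodes)
	}
}
```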
@@ -74,6 +75,7 @@ type StaticAutoscaler struct {
	processorCallbacks *staticAutoscalerProcessorCallbacks
	initialized bool
	ignoredTaints taints.TaintKeySet
+	utilizationTracker *utilization.Tracker
If we're making utilizationTracker into an interface and exposing it outside of scale-down, maybe a better place to do it would be to keep it in context? This would give access to it across the codebase.
Do we need access across the codebase? I put this here so that scaleDownStatus can be updated without accessing private fields from ScaleDown. An alternative would be to make a public method in ScaleDown to access the map, but then we'd need to make that method a part of the interface. Perhaps that'd be actually cleaner than sharing the object under the hood.
Discussed offline with @x13n. I think we agree that it would be beneficial, for example, in order to produce utilization metrics. It's not in the scope of this PR though and can be done as a separate change later on.
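For reference, the accessor-based alternative discussed above could look roughly like this; the names and the shape of Info are illustrative, not the API from this PR:

```go
package scaledown

// Info is a stand-in for the per-node utilization record.
type Info struct {
	CPU    float64
	Memory float64
}

// UtilizationReporter is an illustrative interface fragment: instead of
// StaticAutoscaler sharing a *utilization.Tracker with ScaleDown, the
// scale-down implementation would expose its internal map read-only.
type UtilizationReporter interface {
	// NodeUtilizationMap returns utilization keyed by node name, as computed
	// during the last unneeded-nodes update.
	NodeUtilizationMap() map[string]Info
}
```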
package utilization

// Tracker allows storing and fetching utilization info for a set of nodes
type Tracker struct {
If we're extracting utilization data from scale-down to the tracker, maybe we should also make the tracker the owner of the calculation? I don't really see any advantage of this implementation over just using type Tracker map[string]Info, with the core scale-down logic still responsible for building the actual map.
The entire logic of checkNodeUtilization is clearly not a fit for this, but having the Tracker own CalculateUtilization would actually encapsulate some parts of the implementation (e.g. looking at context.IgnoreDaemonSetsUtilization). We could achieve this with minimal code change to scale-down if the Tracker calculated the utilization for each node when it's first requested (i.e. lazy calculation + cache). Or we could have it pre-calculate it for all nodes - I know we don't calculate utilization for all nodes today, but it's cheap logic and I see no strong reason against it (but also no particular benefit over lazy calculation).
The reason why I wrapped the map in a struct is that the ownership moved from ScaleDown to StaticAutoscaler, and ScaleDown replaces the whole map at times, which would make the map kept by StaticAutoscaler obsolete. Now that I think about it, I may have overengineered this a bit. Restored the map, just hiding it behind a public function.
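A minimal sketch of the restored shape - a plain map owned by ScaleDown and exposed through a public function - with trimmed, illustrative types:

```go
package legacy

// utilizationInfo stands in for the real per-node utilization record.
type utilizationInfo struct {
	CPU    float64
	Memory float64
}

// ScaleDown is trimmed to the single field relevant here.
type ScaleDown struct {
	// nodeUtilizationMap is replaced wholesale on each update, which is why
	// callers should fetch it via the accessor instead of keeping a copy.
	nodeUtilizationMap map[string]utilizationInfo
}

// NodeUtilizationMap returns the current utilization map, avoiding the
// stale-reference problem described above.
func (sd *ScaleDown) NodeUtilizationMap() map[string]utilizationInfo {
	return sd.nodeUtilizationMap
}
```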
// function are not guaranteed to be deleted, it is possible for the
// Actuator to ignore some of them e.g. if max configured level of
// parallelism is reached.
StartDeletion(nodes []*apiv1.Node, currentTime time.Time) (*status.ScaleDownStatus, errors.AutoscalerError)
I think the current definition goes beyond what I'd expect from Actuator (or maybe just the legacy implementation of it does?). I buy rate-limiting being part of actuation, but there is a lot of what I'd consider decision making in TryToScaleDown(). I don't think checking for scale-down-unneeded-time is really part of actuation, and simulator.FindNodesToRemove is even less so.
I think even in the new scale-down proposal, where we don't plan to redo simulation, there is a place for a function in Planner that would select a subset of UnneededNodes that can actually be deleted in this loop (e.g. because they've been unneeded for long enough). I'd lean towards thinking that the decision whether a particular node is empty or needs drain also belongs in Planner and not Actuator.
Something like adding Planner.SelectNodesToScaleDown() (empty []*Node, needDrain []*Node, error) and Actuator.DeleteEmptyNodes + Actuator.DrainAndDeleteNodes would be a more intuitive split of responsibility to me. The obvious downside is that it would require some refactoring of TryToScaleDown(), but I think that's necessary anyway. We will want to re-use the drain logic in the new scale-down implementation, but we definitely don't want simulator.FindNodesToRemove in there.
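A rough sketch of that split, with illustrative signatures (not the interface merged in this PR):

```go
package scaledown

import (
	"time"

	apiv1 "k8s.io/api/core/v1"
)

// Planner owns the decision making: which unneeded nodes can actually be
// removed in this loop, and whether each of them is empty or needs draining.
type Planner interface {
	SelectNodesToScaleDown(currentTime time.Time) (empty, needDrain []*apiv1.Node, err error)
}

// Actuator only executes those decisions, applying its own rate limits.
type Actuator interface {
	DeleteEmptyNodes(nodes []*apiv1.Node) error
	DrainAndDeleteNodes(nodes []*apiv1.Node) error
}
```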
Good point about the need to differentiate between unneeded and actually deletable nodes on the interface level, thanks! I just put the whole TryToScaleDown logic underneath this call because it contains a second simulation that will no longer be necessary in the new implementation - I wanted to limit changes to the existing implementation to avoid surprising behavior changes while the new logic is still enabled.
I tried to limit the API surface, which is why I created just one function for handling both delete and drain&delete. You may be right that I tried to put too much responsibility in the Actuator, though. Perhaps just adding a parameter here would suffice - updated.
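The single-entry-point variant with an extra parameter would then look roughly like this (illustrative sketch; the merged signature and import paths may differ):

```go
package scaledown

import (
	"time"

	apiv1 "k8s.io/api/core/v1"

	"k8s.io/autoscaler/cluster-autoscaler/processors/status"
	"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
)

// Actuator keeps one entry point, but the caller now distinguishes empty
// nodes from nodes that need draining. The Actuator may still skip some of
// them, e.g. when the configured level of parallelism is reached.
type Actuator interface {
	StartDeletion(empty, needDrain []*apiv1.Node, currentTime time.Time) (*status.ScaleDownStatus, errors.AutoscalerError)
}
```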
// ScaleDownWrapper wraps legacy scaledown logic to satisfy scaledown.Planner &
// scaledown.Actuator interfaces.
type ScaleDownWrapper struct {
I can see how this works for now, but I think we'll need to refactor TryToScaleDown anyway in order to re-use parts of it. Once we do, I'm not sure how much point there is in having a wrapper like this instead of just making the scaleDown object compliant with the interface.
That being said, a wrapper is a nice way to allow us to iterate on the implementation and not do the entire refactor in a single humongous PR. I'm 100% fine with this PR being based on a wrapper and possibly dropping the wrapper after follow-up PRs make the needed refactors.
Yeah, I introduced the wrapper to limit the amount of code changes inside the existing ScaleDown implementation as this PR kept on growing. I think dropping the wrapper after splitting TryToScaleDown makes sense.
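For context, the wrapper boils down to an adapter like the one below (trimmed; it assumes the existing legacy ScaleDown type and the interfaces sketched earlier):

```go
package legacy

// ScaleDownWrapper adapts the legacy ScaleDown object to the new Planner and
// Actuator interfaces without modifying the legacy implementation itself.
// Once TryToScaleDown is split up, these methods can move onto ScaleDown
// directly and the wrapper can be dropped.
type ScaleDownWrapper struct {
	sd *ScaleDown
}

// NewScaleDownWrapper wraps an existing legacy ScaleDown instance.
func NewScaleDownWrapper(sd *ScaleDown) *ScaleDownWrapper {
	return &ScaleDownWrapper{sd: sd}
}
```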
// a fake node name instead.
// TODO: Return real node names
func (p *ScaleDownWrapper) DeletionsInProgress() []string {
	if p.sd.nodeDeletionTracker.IsNonEmptyNodeDeleteInProgress() {
I think that eventually a shared implementation of Actuator should exist that takes over the ownership of nodeDeletionTracker.
Yes. I extended the interface to include a third object type, used for passing information between Planner and Actuator - let me know what you think. I consider separating out the actuation to be shared between implementations a next step, after the initial version of the interface is merged.
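Sketching the idea of that third shared type (the name and methods here are illustrative, not necessarily what the PR adds):

```go
package scaledown

// ActuationStatus is an illustrative shape for the third object type: the
// Actuator reports what it is currently doing, and the Planner reads it to
// decide whether more deletions should be planned in this loop.
type ActuationStatus interface {
	// DeletionsInProgress lists the names of nodes currently being deleted.
	DeletionsInProgress() []string
}
```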
	podDestinations = allNodes
} else {
	var err errors.AutoscalerError
	scaleDownCandidates, err = p.processor.GetScaleDownCandidates(p.sd.context, allNodes)
My understanding is that we want to keep the ScaleDownNodeProcessor? If so, why push calling it into UpdateClusterState and not pass podDestinations and scaleDownCandidates as parameters?
The logic for calling the processor is presumably the same in every implementation. Even if we won't need a pre-filtering processor in the new scale-down, the way to do that would IMO be to just not install the processor if the new scale-down is used.
Yes, I suppose that is right, updated. My main goal here was to reduce the number of calls to ScaleDown from 2 (CleanUp, UpdateUnneededNodes) to just 1, to simplify the interface.
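The resulting shape, with the ScaleDownNodeProcessor invoked by the caller and its output passed in, would be roughly (illustrative sketch of the parameter list):

```go
package scaledown

import (
	"time"

	apiv1 "k8s.io/api/core/v1"

	"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
)

// Planner is reduced to a single call per loop. The caller runs the
// ScaleDownNodeProcessor and passes its output in, so every implementation
// shares the same pre-filtering behavior.
type Planner interface {
	UpdateClusterState(podDestinations, scaleDownCandidates []*apiv1.Node, currentTime time.Time) errors.AutoscalerError
}
```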
	// In dry run only utilization is updated
-	calculateUnneededOnly := scaleDownInCooldown || scaleDown.IsNonEmptyNodeDeleteInProgress()
+	calculateUnneededOnly := scaleDownInCooldown || deletionsInProgress > 0
Presumably the deletionsInProgress > 0 check is something that we'll actually need to push into legacy implementation? I'm fine if the answer is "yes, in a future PR" :)
Yes, in a future PR :)
Ultimately, I think it should be the Planner's decision whether or not to proceed with deleting any nodes, so this will probably end up in some ShouldIStayOrShouldIGo function extending the new interface, or just as an implementation detail of the new NodesToDelete call.
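Illustrating how that could become an implementation detail of the new node-selection call (names and helpers here are hypothetical):

```go
package planning

import (
	"time"

	apiv1 "k8s.io/api/core/v1"
)

// planner is a trimmed stand-in; the helpers it uses are assumed to exist in
// the real implementation (e.g. the shared actuation status).
type planner struct {
	cooldownUntil       time.Time
	deletionsInProgress func() []string
	selectCandidates    func(time.Time) (empty, needDrain []*apiv1.Node)
}

// NodesToDelete shows how the "deletionsInProgress > 0" and cooldown checks
// could live inside the Planner: while either holds, nothing is proposed for
// deletion.
func (p *planner) NodesToDelete(currentTime time.Time) (empty, needDrain []*apiv1.Node) {
	if currentTime.Before(p.cooldownUntil) || len(p.deletionsInProgress()) > 0 {
		return nil, nil
	}
	return p.selectCandidates(currentTime)
}
```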
/lgtm
/hold
I propose that we cut the 1.24 branch soon (ideally discuss it at the next SIG meeting and cut early next week) and merge this immediately after the branch is cut. WDYT?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MaciekPytel, x13n
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Thanks for reviewing! Yeah, that makes sense to me - I will un-hold it after the 1.24 release is cut.
/lgtm
I've created a release branch for CA 1.24 (after discussing it in the SIG meeting), so merging this no longer introduces unnecessary risk to 1.24.
Which component does this PR apply to?
cluster-autoscaler
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This refactoring is a prerequisite for implementing parallel scale down, see #4766
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This is a huge change and can be hard to review. I divided it into 10 commits that are self-contained (i.e. tests were passing on each commit). Looking at the changes may be simpler one commit at a time.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @MaciekPytel
/cc @jayantjain93