Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put ScaleDown logic behind an interface #4806

Merged
merged 9 commits into from
Apr 26, 2022

Conversation

x13n
Copy link
Member

@x13n x13n commented Apr 11, 2022

Which component this PR applies to?

cluster-autoscaler

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This refactoring is a prerequisite for implementing parallel scale down, see #4766

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

This is a huge change and can be hard to review. I divided it into 10 commits that are self-contained (i.e. tests were passing on each commit). Looking at the changes may be simpler one commit at a time.

Does this PR introduce a user-facing change?

ScaleDown logic was significantly refactored. This should have no visible impact on the behavior of Cluster Autoscaler.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- Design doc: https://github.com/kubernetes/autoscaler/blob/2284649f5aa3ba9813779cf390c49100e9c1a9be/cluster-autoscaler/proposals/parallel_drain.md

/assign @MaciekPytel
/cc @jayantjain93

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Apr 11, 2022
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 11, 2022
limitations under the License.
*/

package actuation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it wouldn't be better to move this logic to utils/deletetaint or elsewhere outside the core/? It's pretty much impossible to import core/ from outside of it. I can't really think of a reason for reusing DeletionCandidate soft taints, but a more generic rate-limiting for tainting seems like a good feature to have (especially if we can rate-limit the total tainting QPS, ie. soft taint less if we're doing a lot of hard tainting).

I'm not really sure if any of the above is worth the effort and I don't think it's required from this PR, just throwing around ideas to consider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a separate package from core (even though it is a subdir), so I think it should be easier to reuse or make improvements now. The tests still depend on core test utils though, which depends on deletetaint, so it cannot be merged there - it would introduce a cyclic dependency. I'd leave it here for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

}

func (b *budgetTracker) processWithinBudget(f func()) {
klog.Infof("now = %v, startTime = %v, timeBudget = %v", now(), b.startTime, b.timeBudget)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a leftover debug log? If you want to keep it please use V(). Also, consider re-wording as it is will be very unclear when this log is coming from.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was indeed a debug log, removed.


func (b *budgetTracker) reportExceededLimits() {
if b.skippedNodes > 0 {
klog.V(4).Infof("Skipped adding/removing soft taints on %v nodes - API call limit exceeded", b.skippedNodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this may happen as a result of either API call limit or time limit being exceeded, not sure if it's worth trying to clarify in the log line though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was carried over from existing code, but you're right - it was incorrect. Clarified now.

@@ -74,6 +75,7 @@ type StaticAutoscaler struct {
processorCallbacks *staticAutoscalerProcessorCallbacks
initialized bool
ignoredTaints taints.TaintKeySet
utilizationTracker *utilization.Tracker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're making utilizationTracker into an interface and exposing it outside of scale-down, maybe a better place to do it would be to keep it in context? This would give access to it across the codebase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need access across the codebase? I put this here so that scaleDownStatus can be updated without accessing private fields from ScaleDown. An alternative would be to make a public method in ScaleDown to access the map, but then we'd need to make that method a part of the interface. Perhaps that'd be actually cleaner than sharing the object under the hood.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline with @x13n. I think we agree that it would be beneficial, for example, in order to produce utilization metrics. It's not in the scope of this PR though and can be done as a separate change later on.

package utilization

// Tracker allows storing and fetching utilization info for a set of nodes
type Tracker struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're extracting utilization data from scale-down to the tracker, maybe we should also make tracker the owner of the calculation? I don't really see any advantage of this implementation over just using type Tracker map[string]Info and the core scale-down logic is still responsible for building an actual map like this.

The entire logic of checkNodeUtilization is clearly not a fit for this, but having the Tracker own CalculateUtilization would actually encapsulate some parts of implementation (ex. looking at context.IgnoreDaemonSetsUtilization). We could achieve this with minimal code change to scale-down if the Tracker would calculate the utilization for each node when it's first requested (ie. lazy calculation + cache). Or we could have it pre-calculate it for all nodes - I know we don't calculate utilization for all nodes today, but it's cheap logic and I see no strong reason against it (but also no particular benefit over lazy calculation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I wrapped the map into a struct is that the ownership moved from ScaleDown to StaticAutoscaler and the ScaleDown replaces the whole map at times, which would make the map kept by StaticAutoscaler obsolete. Now that I think about it, I may have overengineered this a bit. Restored the map, just hiding it behind a public function.

// function are not guaranteed to be deleted, it is possible for the
// Actuator to ignore some of them e.g. if max configured level of
// parallelism is reached.
StartDeletion(nodes []*apiv1.Node, currentTime time.Time) (*status.ScaleDownStatus, errors.AutoscalerError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current definition goes beyond what I'd expect from Actuator (or maybe just the legacy implementation of it does?). I buy rate-limiting being part of actuation, but there is a lot of what I'd consider decision making in TryToScaleDown(). I don't think checking for scale-down-unneeded-time is really part of actuation and simulator.FindNodesToRemove is even less so.

I think even in the new scale-down proposal where we don't plan to redo simulation, there is a place for a function in Planner that would select subset of UnneededNodes that can actually be deleted in this loops (ex. because they've been unneeded for long enough). I'd lean towards thinking that a decision if a particular node is empty or needs drain also belongs in Planner and not Actuator.

Something like adding Planner.SelectNodesToScaleDown() (empty []*Node, needDrain []*Node, error) and Actuator.DeleteEmptyNodes + Actuator.DrainAndDeleteNodes would be a more intuitive split of responsibility to me. The obvious downside is that it would require some refactoring of TryToScaleDown(), but I think that's necessary anyway. We will want to re-use drain logic in new scale-down implementation, but we definitely don't want simulator.FindNodesToRemove in there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the need to differentiate between uneeded and actually deletable nodes on the interface level, thanks! I just put the whole TryToScaleDown logic underneath this call because it contains a second simulation that will no longer be necessary in the new implementation - I wanted to limit changes to the existing implementation to avoid surprising behavior changes while the new logic is still enabled.

I tried to limit the API surface which is why I created just one function for handling both delete and drain&delete. You may be right in that I tried to put too much responsibility in the Actuator though. Perhaps just adding a parameter here would suffice though, updated.


// ScaleDownWrapper wraps legacy scaledown logic to satisfy scaledown.Planner &
// scaledown.Actuator interfaces.
type ScaleDownWrapper struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this works for now, but I think we'll need to refactor TryToScaleDown anyway in order to re-use parts of it. Once we do, I'm not sure how much point there is to having a wrapper like this instead of just making scaleDown object compliant with the interface.
That being said a wrapper is a nice way to allow us to iterate on the implementation and not do entire refactor in a single humongous PR. I'm 100% fine with this PR being based on a wrapper and possibly dropping the wrapper after follow-up PRs make needed refactors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I introduced the wrapper to limit amount of code changes inside the existing ScaleDown implementation as this PR kept on growing. I think dropping the wrapper after splitting TryToScaleDown makes sense.

// a fake node name instead.
// TODO: Return real node names
func (p *ScaleDownWrapper) DeletionsInProgress() []string {
if p.sd.nodeDeletionTracker.IsNonEmptyNodeDeleteInProgress() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that eventually a shared implementation of Actuator should exist that takes over the ownership of nodeDeletionTracker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I extended the interface to include a third object type: used for passing information between Planner and Actuator, let me know what you think. I consider separating the actuation to be shared between implementations a next step after initial version of the interface is merged.

podDestinations = allNodes
} else {
var err errors.AutoscalerError
scaleDownCandidates, err = p.processor.GetScaleDownCandidates(p.sd.context, allNodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that we want to keep the ScaleDownNodeProcessor? If so, why push calling it into UpdateClusterState and not pass podDestinations and scaleDownCandidates as parameters?
The logic for calling processor is presumably the same in every implementation. Even if we won't need pre-filtering processor in new scale-down, the way to do that would be IMO to just not install the processor if new scale-up is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose that is right, updated. My main goal here was to reduce # of calls to ScaleDown from 2 (CleanUp, UpdateUnneededNodes) to just 1, to simplify the interface.

// In dry run only utilization is updated
calculateUnneededOnly := scaleDownInCooldown || scaleDown.IsNonEmptyNodeDeleteInProgress()
calculateUnneededOnly := scaleDownInCooldown || deletionsInProgress > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the deletionsInProgress > 0 check is something that we'll actually need to push into legacy implementation? I'm fine if the answer is "yes, in a future PR" :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in a future PR :)

Ultimately, I think it should be Planner's decision whether or not to proceed with deleting any nodes, so this will probably end up in some ShouldIStayOrShouldIGo function extending the new interface, or just implementation detail of the new NodesToDelete call.

@x13n x13n force-pushed the sd-refactor branch 2 times, most recently from 6fe224d to e325267 Compare April 15, 2022 19:31
@MaciekPytel
Copy link
Contributor

/lgtm
/approve

/hold
I think we should not include this in 1.24.
I don't think the parallel scale-down implementation will be ready in time to be included in 1.24. As such it's hard for me to justify including a partial refactor with all the associated risk and no benefit.

I propose that we cut 1.24 branch soon (ideally discuss it on next sig-meeting and cut early next week) and merge this immediately after the branch is cut. WDYT?

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Apr 21, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2022
@x13n
Copy link
Member Author

x13n commented Apr 21, 2022

Thanks for reviewing! Yeah, I think it makes sense to me, I will un-hold it after 1.24 release is cut.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2022
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 26, 2022
@MaciekPytel
Copy link
Contributor

/lgtm
/hold cancel

I've created a release branch for CA 1.24 (after discussing it in sig meeting) and so merging this no longer introduces unnecessary risk to 1.24.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 26, 2022
@k8s-ci-robot k8s-ci-robot merged commit 7fe1e45 into kubernetes:master Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants