Put ScaleDown logic behind an interface #4806
Conversation
limitations under the License.
*/

package actuation
I wonder if it wouldn't be better to move this logic to utils/deletetaint or elsewhere outside the core/? It's pretty much impossible to import core/ from outside of it. I can't really think of a reason for reusing DeletionCandidate soft taints, but a more generic rate-limiting for tainting seems like a good feature to have (especially if we can rate-limit the total tainting QPS, ie. soft taint less if we're doing a lot of hard tainting).
I'm not really sure if any of the above is worth the effort and I don't think it's required from this PR, just throwing around ideas to consider.
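For the sake of discussion, here is a minimal sketch of what a shared tainting budget outside of core/ could look like. All names below are hypothetical and nothing here is part of this PR:

```go
package deletetaint

import "sync"

// taintBudget is a hypothetical shared budget for taint updates in a single
// loop. Hard and soft taints would draw from the same pool, so heavy hard
// tainting automatically leaves less room for DeletionCandidate soft taints.
type taintBudget struct {
	mu        sync.Mutex
	remaining int // taint API calls still allowed in this loop
}

// tryAcquire reserves one taint update, returning false once the budget for
// the current loop is exhausted.
func (b *taintBudget) tryAcquire() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.remaining <= 0 {
		return false
	}
	b.remaining--
	return true
}
```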
It is a separate package from core (even though it is a subdir), so I think it should be easier to reuse or make improvements now. The tests still depend on core test utils though, which depends on deletetaint, so it cannot be merged there - it would introduce a cyclic dependency. I'd leave it here for now.
Fair enough.
}

func (b *budgetTracker) processWithinBudget(f func()) {
	klog.Infof("now = %v, startTime = %v, timeBudget = %v", now(), b.startTime, b.timeBudget)
This seems like a leftover debug log? If you want to keep it, please use V(). Also, consider re-wording it, as it will be very unclear where this log is coming from.
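For illustration, a verbosity-gated drop-in variant of the line shown in the diff above could look like this (assuming the log were kept; the prefix is just one way of making the origin explicit):

```go
// Debug-only log, gated behind a high verbosity level and prefixed so it is
// clear which component emitted it.
klog.V(5).Infof("budgetTracker: startTime = %v, timeBudget = %v", b.startTime, b.timeBudget)
```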
It was indeed a debug log, removed.
func (b *budgetTracker) reportExceededLimits() {
	if b.skippedNodes > 0 {
		klog.V(4).Infof("Skipped adding/removing soft taints on %v nodes - API call limit exceeded", b.skippedNodes)
nit: this may happen as a result of either the API call limit or the time limit being exceeded; not sure if it's worth trying to clarify in the log line though.
That was carried over from existing code, but you're right - it was incorrect. Clarified now.
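A sketch of what the clarified message could look like, reusing the budgetTracker fields from the diff above (the exact wording merged in the PR may differ):

```go
func (b *budgetTracker) reportExceededLimits() {
	if b.skippedNodes > 0 {
		// Name both possible causes: the API call budget and the time budget.
		klog.V(4).Infof("Skipped adding/removing soft taints on %v nodes - API call or time budget exceeded", b.skippedNodes)
	}
}
```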
@@ -74,6 +75,7 @@ type StaticAutoscaler struct {
	processorCallbacks *staticAutoscalerProcessorCallbacks
	initialized bool
	ignoredTaints taints.TaintKeySet
+	utilizationTracker *utilization.Tracker
If we're making utilizationTracker into an interface and exposing it outside of scale-down, maybe a better place to do it would be to keep it in context? This would give access to it across the codebase.
Do we need access across the codebase? I put this here so that scaleDownStatus can be updated without accessing private fields from ScaleDown. An alternative would be to make a public method in ScaleDown to access the map, but then we'd need to make that method a part of the interface. Perhaps that'd be actually cleaner than sharing the object under the hood.
Discussed offline with @x13n. I think we agree that it would be beneficial, for example, in order to produce utilization metrics. It's not in the scope of this PR though and can be done as a separate change later on.
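For reference, the accessor-based alternative discussed above could look roughly like this; the names and the shape of Info are illustrative, not the API from this PR:

```go
package scaledown

// Info is a stand-in for the per-node utilization record.
type Info struct {
	CPU    float64
	Memory float64
}

// UtilizationReporter is an illustrative interface fragment: instead of
// StaticAutoscaler sharing a *utilization.Tracker with ScaleDown, the
// scale-down implementation would expose its internal map read-only.
type UtilizationReporter interface {
	// NodeUtilizationMap returns utilization keyed by node name, as computed
	// during the last unneeded-nodes update.
	NodeUtilizationMap() map[string]Info
}
```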
package utilization

// Tracker allows storing and fetching utilization info for a set of nodes
type Tracker struct {
If we're extracting utilization data from scale-down to the tracker, maybe we should also make the tracker the owner of the calculation? I don't really see any advantage of this implementation over just using type Tracker map[string]Info, with the core scale-down logic still responsible for building the actual map.
The entire logic of checkNodeUtilization is clearly not a fit for this, but having the Tracker own CalculateUtilization would actually encapsulate some parts of the implementation (e.g. looking at context.IgnoreDaemonSetsUtilization). We could achieve this with minimal code change to scale-down if the Tracker calculated the utilization for each node when it's first requested (i.e. lazy calculation + cache). Or we could have it pre-calculate it for all nodes - I know we don't calculate utilization for all nodes today, but it's cheap logic and I see no strong reason against it (but also no particular benefit over lazy calculation).
The reason why I wrapped the map in a struct is that the ownership moved from ScaleDown to StaticAutoscaler, and ScaleDown replaces the whole map at times, which would make the map kept by StaticAutoscaler obsolete. Now that I think about it, I may have overengineered this a bit. Restored the map, just hiding it behind a public function.
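A minimal sketch of the restored shape - a plain map owned by ScaleDown and exposed through a public function - with trimmed, illustrative types:

```go
package legacy

// utilizationInfo stands in for the real per-node utilization record.
type utilizationInfo struct {
	CPU    float64
	Memory float64
}

// ScaleDown is trimmed to the single field relevant here.
type ScaleDown struct {
	// nodeUtilizationMap is replaced wholesale on each update, which is why
	// callers should fetch it via the accessor instead of keeping a copy.
	nodeUtilizationMap map[string]utilizationInfo
}

// NodeUtilizationMap returns the current utilization map, avoiding the
// stale-reference problem described above.
func (sd *ScaleDown) NodeUtilizationMap() map[string]utilizationInfo {
	return sd.nodeUtilizationMap
}
```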
// function are not guaranteed to be deleted, it is possible for the
// Actuator to ignore some of them e.g. if max configured level of
// parallelism is reached.
StartDeletion(nodes []*apiv1.Node, currentTime time.Time) (*status.ScaleDownStatus, errors.AutoscalerError)
I think the current definition goes beyond what I'd expect from Actuator (or maybe just the legacy implementation of it does?). I buy rate-limiting being part of actuation, but there is a lot of what I'd consider decision making in TryToScaleDown(). I don't think checking for scale-down-unneeded-time is really part of actuation, and simulator.FindNodesToRemove is even less so.
I think even in the new scale-down proposal, where we don't plan to redo simulation, there is a place for a function in Planner that would select a subset of UnneededNodes that can actually be deleted in this loop (e.g. because they've been unneeded for long enough). I'd lean towards thinking that the decision whether a particular node is empty or needs drain also belongs in Planner and not Actuator.
Something like adding Planner.SelectNodesToScaleDown() (empty []*Node, needDrain []*Node, error) and Actuator.DeleteEmptyNodes + Actuator.DrainAndDeleteNodes would be a more intuitive split of responsibility to me. The obvious downside is that it would require some refactoring of TryToScaleDown(), but I think that's necessary anyway. We will want to re-use the drain logic in the new scale-down implementation, but we definitely don't want simulator.FindNodesToRemove in there.
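A rough sketch of that split, with illustrative signatures (not the interface merged in this PR):

```go
package scaledown

import (
	"time"

	apiv1 "k8s.io/api/core/v1"
)

// Planner owns the decision making: which unneeded nodes can actually be
// removed in this loop, and whether each of them is empty or needs draining.
type Planner interface {
	SelectNodesToScaleDown(currentTime time.Time) (empty, needDrain []*apiv1.Node, err error)
}

// Actuator only executes those decisions, applying its own rate limits.
type Actuator interface {
	DeleteEmptyNodes(nodes []*apiv1.Node) error
	DrainAndDeleteNodes(nodes []*apiv1.Node) error
}
```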
Good point about the need to differentiate between unneeded and actually deletable nodes on the interface level, thanks! I just put the whole TryToScaleDown logic underneath this call because it contains a second simulation that will no longer be necessary in the new implementation - I wanted to limit changes to the existing implementation to avoid surprising behavior changes while the new logic is still enabled.
I tried to limit the API surface, which is why I created just one function for handling both delete and drain&delete. You may be right that I tried to put too much responsibility in the Actuator, though. Perhaps just adding a parameter here would suffice - updated.
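The single-entry-point variant with an extra parameter would then look roughly like this (illustrative sketch; the merged signature and import paths may differ):

```go
package scaledown

import (
	"time"

	apiv1 "k8s.io/api/core/v1"

	"k8s.io/autoscaler/cluster-autoscaler/processors/status"
	"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
)

// Actuator keeps one entry point, but the caller now distinguishes empty
// nodes from nodes that need draining. The Actuator may still skip some of
// them, e.g. when the configured level of parallelism is reached.
type Actuator interface {
	StartDeletion(empty, needDrain []*apiv1.Node, currentTime time.Time) (*status.ScaleDownStatus, errors.AutoscalerError)
}
```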
// ScaleDownWrapper wraps legacy scaledown logic to satisfy scaledown.Planner &
// scaledown.Actuator interfaces.
type ScaleDownWrapper struct {
I can see how this works for now, but I think we'll need to refactor TryToScaleDown anyway in order to re-use parts of it. Once we do, I'm not sure how much point there is in having a wrapper like this instead of just making the scaleDown object compliant with the interface.
That being said, a wrapper is a nice way to allow us to iterate on the implementation and not do the entire refactor in a single humongous PR. I'm 100% fine with this PR being based on a wrapper and possibly dropping the wrapper after follow-up PRs make the needed refactors.
Yeah, I introduced the wrapper to limit the amount of code changes inside the existing ScaleDown implementation as this PR kept on growing. I think dropping the wrapper after splitting TryToScaleDown makes sense.
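For context, the wrapper boils down to an adapter like the one below (trimmed; it assumes the existing legacy ScaleDown type and the interfaces sketched earlier):

```go
package legacy

// ScaleDownWrapper adapts the legacy ScaleDown object to the new Planner and
// Actuator interfaces without modifying the legacy implementation itself.
// Once TryToScaleDown is split up, these methods can move onto ScaleDown
// directly and the wrapper can be dropped.
type ScaleDownWrapper struct {
	sd *ScaleDown
}

// NewScaleDownWrapper wraps an existing legacy ScaleDown instance.
func NewScaleDownWrapper(sd *ScaleDown) *ScaleDownWrapper {
	return &ScaleDownWrapper{sd: sd}
}
```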
// a fake node name instead.
// TODO: Return real node names
func (p *ScaleDownWrapper) DeletionsInProgress() []string {
	if p.sd.nodeDeletionTracker.IsNonEmptyNodeDeleteInProgress() {
I think that eventually a shared implementation of Actuator should exist that takes over the ownership of nodeDeletionTracker.
Yes. I extended the interface to include a third object type, used for passing information between Planner and Actuator - let me know what you think. I consider separating out the actuation to be shared between implementations a next step, after the initial version of the interface is merged.
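Sketching the idea of that third shared type (the name and methods here are illustrative, not necessarily what the PR adds):

```go
package scaledown

// ActuationStatus is an illustrative shape for the third object type: the
// Actuator reports what it is currently doing, and the Planner reads it to
// decide whether more deletions should be planned in this loop.
type ActuationStatus interface {
	// DeletionsInProgress lists the names of nodes currently being deleted.
	DeletionsInProgress() []string
}
```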
	podDestinations = allNodes
} else {
	var err errors.AutoscalerError
	scaleDownCandidates, err = p.processor.GetScaleDownCandidates(p.sd.context, allNodes)
My understanding is that we want to keep the ScaleDownNodeProcessor? If so, why push calling it into UpdateClusterState and not pass podDestinations and scaleDownCandidates as parameters?
The logic for calling the processor is presumably the same in every implementation. Even if we won't need a pre-filtering processor in the new scale-down, the way to do that would IMO be to just not install the processor if the new scale-down is used.
Yes, I suppose that is right, updated. My main goal here was to reduce the number of calls to ScaleDown from 2 (CleanUp, UpdateUnneededNodes) to just 1, to simplify the interface.
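The resulting shape, with the ScaleDownNodeProcessor invoked by the caller and its output passed in, would be roughly (illustrative sketch of the parameter list):

```go
package scaledown

import (
	"time"

	apiv1 "k8s.io/api/core/v1"

	"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
)

// Planner is reduced to a single call per loop. The caller runs the
// ScaleDownNodeProcessor and passes its output in, so every implementation
// shares the same pre-filtering behavior.
type Planner interface {
	UpdateClusterState(podDestinations, scaleDownCandidates []*apiv1.Node, currentTime time.Time) errors.AutoscalerError
}
```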
	// In dry run only utilization is updated
-	calculateUnneededOnly := scaleDownInCooldown || scaleDown.IsNonEmptyNodeDeleteInProgress()
+	calculateUnneededOnly := scaleDownInCooldown || deletionsInProgress > 0
Presumably the deletionsInProgress > 0 check is something that we'll actually need to push into legacy implementation? I'm fine if the answer is "yes, in a future PR" :)
Yes, in a future PR :)
Ultimately, I think it should be the Planner's decision whether or not to proceed with deleting any nodes, so this will probably end up in some ShouldIStayOrShouldIGo function extending the new interface, or just as an implementation detail of the new NodesToDelete call.
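Illustrating how that could become an implementation detail of the new node-selection call (names and helpers here are hypothetical):

```go
package planning

import (
	"time"

	apiv1 "k8s.io/api/core/v1"
)

// planner is a trimmed stand-in; the helpers it uses are assumed to exist in
// the real implementation (e.g. the shared actuation status).
type planner struct {
	cooldownUntil       time.Time
	deletionsInProgress func() []string
	selectCandidates    func(time.Time) (empty, needDrain []*apiv1.Node)
}

// NodesToDelete shows how the "deletionsInProgress > 0" and cooldown checks
// could live inside the Planner: while either holds, nothing is proposed for
// deletion.
func (p *planner) NodesToDelete(currentTime time.Time) (empty, needDrain []*apiv1.Node) {
	if currentTime.Before(p.cooldownUntil) || len(p.deletionsInProgress()) > 0 {
		return nil, nil
	}
	return p.selectCandidates(currentTime)
}
```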
/lgtm
/hold
I propose that we cut the 1.24 branch soon (ideally discuss it at the next SIG meeting and cut early next week) and merge this immediately after the branch is cut. WDYT?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MaciekPytel, x13n
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Thanks for reviewing! Yeah, that makes sense to me - I will un-hold it after the 1.24 release is cut.
/lgtm
I've created a release branch for CA 1.24 (after discussing it in the SIG meeting), so merging this no longer introduces unnecessary risk to 1.24.
Which component does this PR apply to?
cluster-autoscaler
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This refactoring is a prerequisite for implementing parallel scale down, see #4766
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This is a huge change and can be hard to review. I divided it into 10 commits that are self-contained (i.e. tests were passing on each commit). Looking at the changes may be simpler one commit at a time.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @MaciekPytel
/cc @jayantjain93