
Per nodegroup scale-down config #3789

Merged (4 commits) on Jan 25, 2021

Conversation

MaciekPytel (Contributor):

This PR implements #3583 (comment). It allows NodeGroups to override ScaleDownUnneededTime, ScaleDownUnreadyTime and the utilization thresholds for nodes in that NodeGroup.

A new GetOptions method is added to the cloudprovider interface. Its implementation is optional: returning ErrNotImplemented results in the current behavior of using the global config for all NodeGroups. A stub implementation returning ErrNotImplemented was added to all providers.

Note that this PR is only a framework for the actual feature: it does not implement how the user specifies the options. Each provider is free to implement GetOptions (and the way to configure it) as it sees fit.
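For illustration, a minimal self-contained Go sketch of the optional method described above; the names follow the PR description, but the exact signature and package layout in the PR may differ:

```go
package cloudprovider

import "errors"

// ErrNotImplemented is returned by optional interface methods that a
// provider chooses not to support.
var ErrNotImplemented = errors.New("not implemented")

// NodeGroupAutoscalingOptions holds the per-NodeGroup overrides; its fields
// are elided here (see the reviewed snippet in the thread below).
type NodeGroupAutoscalingOptions struct{}

// NodeGroup is the existing cloud provider interface, extended with the
// optional GetOptions method.
type NodeGroup interface {
	// ... existing methods elided ...

	// GetOptions returns autoscaling options for this NodeGroup, or
	// (nil, ErrNotImplemented) to keep the current behavior of using the
	// global config.
	GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error)
}
```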

k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) on Dec 30, 2020
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label (the PR has been approved by an approver from all required OWNERS files) and the size/XL label (changes 500-999 lines, ignoring generated files) on Dec 30, 2020
MaciekPytel (Contributor, Author):

/hold
I want to add some more unit tests.

k8s-ci-robot added the do-not-merge/hold label (the PR should not merge because someone has issued a /hold command) on Dec 30, 2020
MaciekPytel (Contributor, Author):

/hold cancel

k8s-ci-robot removed the do-not-merge/hold label on Dec 31, 2020
Review thread on the new struct (diff context shown):

    MaxEmptyBulkDelete int
    // NodeGroupAutoscalingOptions contain various options to customize how autoscaling of
    // a given NodeGroup works. Different options can be used for each NodeGroup.
    type NodeGroupAutoscalingOptions struct {
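The reviewed snippet is cut off at the opening brace; a plausible completion as a standalone fragment, assuming the four fields correspond to the values named in the PR description (the exact field names in the PR may differ):

```go
package config

import "time"

// NodeGroupAutoscalingOptions contains per-NodeGroup scale-down settings
// that override the corresponding global flags.
type NodeGroupAutoscalingOptions struct {
	// ScaleDownUtilizationThreshold: a node below this utilization is a
	// scale-down candidate.
	ScaleDownUtilizationThreshold float64
	// ScaleDownGpuUtilizationThreshold: the analogous threshold for GPU nodes.
	ScaleDownGpuUtilizationThreshold float64
	// ScaleDownUnneededTime: how long a node must be unneeded before removal.
	ScaleDownUnneededTime time.Duration
	// ScaleDownUnreadyTime: how long an unready node must be unneeded before removal.
	ScaleDownUnreadyTime time.Duration
}
```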
ricoleabricot (Contributor):
I may have a dumb question, but:

  • Why only these 4 scale-down parameters? Are ScaleDownDelayAfterAdd, ScaleDownDelayAfterDelete and so on not applicable to this new configuration method?

MaciekPytel (Contributor, Author) replied:

Those 4 values were the low-hanging fruit: they required relatively small changes in the code and they were by far the most commonly requested values to control per nodegroup.

I think ultimately it makes sense to have ScaleDownDelayAfter* as part of NodeGroupAutoscalingOptions, but it is much trickier to do (details below) and I didn't want to block this PR on it. I'd rather take an iterative approach of gradually moving more options to NodeGroupAutoscalingOptions in separate PRs (though I don't want to make any promises on when I will actually be able to implement it for the ScaleDownDelay flags; I'm also happy to just review if someone else feels like taking a shot at it).


Details:

ScaleDownDelayAfter* are currently implemented as a global check that results in completely skipping the scale-down logic, not as a per-node check like the 4 values in the new config. Making them per-nodegroup would require completely re-implementing them and answering some non-obvious questions about semantics: how should a per-nodegroup ScaleDownDelayAfterDelete work? It obviously blocks one nodegroup from scaling down, but should it be triggered only when a node from that nodegroup is deleted, or when any node in the cluster is deleted? The same goes for ScaleDownDelayAfterAdd. It seems to me that depending on the use case you may want either behavior: block on any node if your pods can schedule on multiple nodegroups and you just want CA to pick the best one; block only on that nodegroup if you have a dedicated nodegroup per workload and each pod can generally only go to one nodegroup.
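To make the contrast concrete, a hypothetical sketch (not code from this PR) of the current global check versus a per-nodegroup variant:

```go
package sketch

import "time"

// Current shape (simplified): a single global check; while any recent
// scale-up is inside the delay window, the entire scale-down pass is skipped.
func scaleDownBlockedGlobally(now, lastScaleUp time.Time, delayAfterAdd time.Duration) bool {
	return now.Before(lastScaleUp.Add(delayAfterAdd))
}

// Hypothetical per-nodegroup variant: each group tracks its own last
// scale-up. This is where the semantic question arises: should a scale-up
// in any other group also start this group's delay window?
func scaleDownBlockedForGroup(now, lastScaleUpInGroup time.Time, groupDelayAfterAdd time.Duration) bool {
	return now.Before(lastScaleUpInGroup.Add(groupDelayAfterAdd))
}
```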

ricoleabricot (Contributor) commented on Jan 9, 2021:

OK, I see; it's clear to me that CA should be updated with the "simple" modifications first.
As for ScaleDownDelayAfter*, I understand the amount of source-code change involved and the semantics questions. I may have a "dumb" solution, but: what about adding a ScaleDownAfterBehaviour option that selects between a global effect and a local one, or something along those lines?
Anyway, thanks for the hard work. I really appreciate the effort to add per node-group config so quickly; it will really boost the autoscaling configuration for our managed solution 🚀

MaciekPytel (Contributor, Author) replied:

I don't think it's a dumb solution. I was considering a pretty much equivalent one: keep the current scale-down-delay-after-* flags and add scale-down-delay-after-node-group-add/delete/etc (the name could use some work).
This gives maximum flexibility at the cost of adding even more complexity for the user. That may not be a problem if you have a hosted solution, but configuring CA is already pretty hard, partly due to the huge number of flags we have. Still, it may be the best solution.

Whatever solution we decide on I'd rather do it in a separate PR though. I think this one is complex enough as is.
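For concreteness, a hypothetical sketch of that flag pair; scale-down-delay-after-add is an existing CA flag, while the per-nodegroup one is only the idea floated above, not a real flag:

```go
package sketch

import (
	"flag"
	"time"
)

var (
	// Existing global flag: after any scale-up, all scale-down is paused.
	scaleDownDelayAfterAdd = flag.Duration("scale-down-delay-after-add",
		10*time.Minute, "How long after scale up that scale down evaluation resumes")

	// Hypothetical per-nodegroup counterpart from the discussion above
	// ("the name could use some work"): only the group that scaled up is paused.
	scaleDownDelayAfterNodeGroupAdd = flag.Duration("scale-down-delay-after-node-group-add",
		0, "How long after a nodegroup's scale-up that its own scale-down resumes")
)
```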

ricoleabricot (Contributor) replied:

> Whatever solution we decide on I'd rather do it in a separate PR though. I think this one is complex enough as is.

And I agree :)

MaciekPytel (Contributor, Author):

@towca Addressed your comment. I put the changes in a separate commit, since the fixes apply to different commits and amending them all would get tricky.

towca (Collaborator) commented on Jan 25, 2021:

Thanks for addressing the comments! LGTM, but could you take a look at the second part of the remaining comment? If it's OK, feel free to remove the hold.

/lgtm
/hold

k8s-ci-robot added the do-not-merge/hold label on Jan 25, 2021
k8s-ci-robot added the lgtm label ("Looks good to me", the PR is ready to be merged) on Jan 25, 2021
Commit message:

    This is the first step of implementing kubernetes#3583 (comment).
    A new method was added to the cloudprovider interface. All existing
    providers were updated with a no-op stub implementation that will
    result in no behavior change.
    The config values specified per NodeGroup are not yet applied.
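As a concrete illustration of that stub, a minimal sketch of what a provider's no-op implementation could look like, assuming the post-merge package layout (exampleNodeGroup is a placeholder type, not code from this PR):

```go
package exampleprovider

import (
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
	"k8s.io/autoscaler/cluster-autoscaler/config"
)

// exampleNodeGroup is a placeholder for a provider's NodeGroup implementation.
type exampleNodeGroup struct{}

// GetOptions is the no-op stub: returning ErrNotImplemented tells the core
// autoscaler to keep using the global config for this NodeGroup, so the
// provider's behavior is unchanged.
func (ng *exampleNodeGroup) GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error) {
	return nil, cloudprovider.ErrNotImplemented
}
```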
k8s-ci-robot removed the lgtm label on Jan 25, 2021
towca (Collaborator) commented on Jan 25, 2021:

/lgtm
/unhold

k8s-ci-robot added the lgtm label and removed the do-not-merge/hold label on Jan 25, 2021
k8s-ci-robot removed the lgtm label on Jan 25, 2021
towca (Collaborator) commented on Jan 25, 2021:

/lgtm
