
Nodegroup Interface for per-nodegroup configuration #3583

Closed
marwanad opened this issue Oct 6, 2020 · 11 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@marwanad (Member) commented Oct 6, 2020

Forking this off a thread in Slack.

The motivation behind this is to have some of the CA parameters operate at the nodegroup level. For example, having a different scale-down-utilization-threshold for every nodegroup instead of a single cluster-wide setting.

From @marwanad:

I went through the parameters and a lot of them seem well-suited to being per-nodegroup. After all, CA handles heterogeneous nodegroups well, so it shouldn't be a problem. The part up for discussion would probably be how to bubble down those parameters: annotations on the node? Modify the node group spec to have an extra field? The last approach is a bit ugly, so I think ideally we'd have a configmap of sorts for CA-related configs.

From @MaciekPytel:

i've also been thinking about this and i was actually considering going the route of adding a field to the nodegroup interface; the reasons for that are:

  1. the way nodegroups are configured is generally provider-specific (even the --nodes flag has a different format across providers, not to mention the various autodiscovery implementations)
  2. afaik autodiscovery is the most common way of configuring CA (admittedly i'm not familiar with all providers, so that may not be true across the board) - having the nodegroup return the config would allow extending existing autodiscovery setups to set those values (for example via ASG tags); i think that would be most convenient for users, letting them keep all nodegroup-related config in the same place
@MaciekPytel (Contributor)

I've put some more thoughts on why I prefer a cloudprovider-based implementation over a configmap in #2989 (comment).

@MaciekPytel (Contributor)

I'd like to pick this up and implement it for 1.21. Here is my proposal for how to implement this:

  1. We would split AutoscalingOptions into two structs: NodeGroupAutoscalingOptions and AutoscalingOptions. The options that make sense at the NodeGroup level (tentative list below) would be moved to NodeGroupAutoscalingOptions. AutoscalingOptions would contain a NodeGroupAutoscalingOptions field that will be set via the existing flags and used as the default for any NodeGroup that doesn't provide a value for a given option.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUtilizationThreshold    float64
  ScaleDownGpuUtilizationThreshold float64
  ScaleDownUnneededTime            time.Duration
  ...
}

type AutoscalingOptions struct {
  NodeGroupDefaults  NodeGroupAutoscalingOptions
  MaxEmptyBulkDelete int
  ...
}
  2. We would introduce a new "processor" interface that will be used to determine the value of any NodeGroup-level option. Anytime CA code requires the value of one of the options contained in NodeGroupAutoscalingOptions, it will call the provided processor.
type NodeGroupOptionsProcessor interface {
  GetScaleDownUtilizationThreshold(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (float64, error)
  GetScaleDownGpuUtilizationThreshold(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (float64, error)
  GetScaleDownUnneededTime(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (time.Duration, error)
  ...
}
  3. We will add a method to the NodeGroup interface that returns the NodeGroupAutoscalingOptions to be used for a given NodeGroup.
  • A NodeGroup will be provided the default options (in case it only wants to change some values, it can leave the other ones at their defaults).
  • A NodeGroup will be able to return nil NodeGroupAutoscalingOptions to indicate that no special settings are configured for it. In that case the default values will be used for this NodeGroup.
  • Implementing the NodeGroup.GetOptions method will be optional. A trivial implementation returning nil will result in the current behavior of using the default values (specified via flags) for all NodeGroups. Any provider will be free to integrate this feature with their autodiscovery scheme (or other preferred mechanism) whenever they want.
type NodeGroup interface {
  ...
  GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error)
  ...
}
  4. The default implementation of NodeGroupOptionsProcessor would rely on calling NodeGroup.GetOptions and filling in defaults as needed (a rough sketch follows below). If a user prefers a different way of configuring per-nodegroup settings (e.g. via a configmap), they will be able to achieve it by swapping the processor implementation for a different one. This is the same pattern we use for any other advanced customization (e.g. balancing similar NodeGroups).
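For illustration only, here is a minimal sketch of what such a default processor could look like, assuming the types from the proposal above. The names are simplified, the struct is trimmed to two fields, and the AutoscalingContext argument from the interface is omitted; this is not the actual implementation.

package nodegroupconfig

import "time"

// NodeGroupAutoscalingOptions mirrors the per-nodegroup struct from the
// proposal above, trimmed to two fields for brevity.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUtilizationThreshold float64
  ScaleDownUnneededTime         time.Duration
}

// nodeGroup is the minimal slice of cloudprovider.NodeGroup needed here.
type nodeGroup interface {
  GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error)
}

// defaultNodeGroupOptionsProcessor asks the NodeGroup for overrides and
// falls back to the flag-based defaults when the provider returns nil.
type defaultNodeGroupOptionsProcessor struct {
  defaults NodeGroupAutoscalingOptions
}

func (p *defaultNodeGroupOptionsProcessor) GetScaleDownUnneededTime(ng nodeGroup) (time.Duration, error) {
  opts, err := ng.GetOptions(p.defaults)
  if err != nil {
    return 0, err
  }
  if opts == nil {
    // No per-nodegroup overrides: use the cluster-wide default.
    return p.defaults.ScaleDownUnneededTime, nil
  }
  return opts.ScaleDownUnneededTime, nil
}

Scale-down logic would then ask the processor for the effective value for a given NodeGroup instead of reading the global flag directly.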

Note that this design allows customization at the NodeGroup level and not at the level of individual nodes. I think this is a good thing for the reasons mentioned in #2989 (comment).


Appendix: per-nodegroup options

This is a tentative list of options that I think make sense to specify at the NodeGroup level. The list is subject to change during implementation (e.g. if it turns out the logic of a particular option can't easily be changed to apply per-nodegroup).

  • The settings below are checked independently for each node, making them easy to migrate. They're also the most commonly requested ones, so they will be moved to NodeGroupAutoscalingOptions:
    • ScaleDownUtilizationThreshold
    • ScaleDownGpuUtilizationThreshold
    • ScaleDownUnneededTime
    • ScaleDownUnreadyTime
  • The settings below make conceptual sense at the nodegroup level, but the way they're implemented makes migration more difficult. I'll try to include them in the initial implementation, but it depends on the number of changes required:
    • ScaleDownDelayAfterAdd
    • ScaleDownDelayAfterDelete
    • ScaleDownDelayAfterFailure

There are a few other settings that would be fairly easy to change to per-nodegroup (e.g. IgnoreDaemonSetsUtilization), but I'm not yet sure they make conceptual sense at the NodeGroup level. I will look at them and extend this list as needed.

Another set of fields to potentially migrate are the ones related to how often a node is checked for scale-down (UnremovableNodeRecheckTimeout, etc.). These would be quite easy to set per-nodegroup; however, setting the value too low for one NodeGroup can starve scale-down in other groups, so I'm not sure we want to allow that.

@elmiko (Contributor) commented Dec 14, 2020

this generally makes sense to me, i think the processor interface and the split between the autoscaling options and per-nodegroup options seem reasonable.

the main question i am left with is how will users be able to associate a set of options with a specific node group?

@MaciekPytel (Contributor)

the main question i am left with is how will users be able to associate a set of options with a specific node group?

The idea is to leave this part up to each cloudprovider. This proposal is basically just a framework that will allow providers to expose the feature to users in whatever way makes most sense for the given provider.

The rationale is that the way you configure CA is already very provider-specific:

  • Since deploying CA on different providers already requires a completely different way of configuring it, there is less value in standardizing just one part of the config.
  • What a NodeGroup maps to and how NodeGroup ids are determined differ for each provider. So even with a standardized way to configure this (such as a configmap), one would still need to know about the implementation of a particular provider to configure it.

Instead, each provider can choose to add these additional settings to their existing way of configuring NodeGroups. For example, in the case of AWS it could be via ASG tags, similarly to how the various scale-from-0 settings are configured (or not :) the decision on this is up to the AWS owners); with GCE it could be included in the autodiscovery specs, etc. I'm not sufficiently familiar with the other providers to make good suggestions. I tried to make the processor flexible enough to hopefully fit each provider's use case.
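Purely as an illustration of the ASG-tag idea (not a commitment to any particular format), a provider-side helper could translate tags into the options struct. The tag keys and the helper below are hypothetical placeholders, and the struct is trimmed for brevity.

package awsconfig

import (
  "strconv"
  "time"
)

// NodeGroupAutoscalingOptions mirrors the struct from the proposal, trimmed
// to two fields for brevity.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUtilizationThreshold float64
  ScaleDownUnneededTime         time.Duration
}

// optionsFromASGTags overlays any recognized tag values on top of the
// defaults. Returning nil means "no overrides; use the cluster-wide
// defaults". The tag keys are hypothetical placeholders.
func optionsFromASGTags(tags map[string]string, defaults NodeGroupAutoscalingOptions) *NodeGroupAutoscalingOptions {
  opts := defaults
  overridden := false
  if v, ok := tags["example.com/scale-down-utilization-threshold"]; ok {
    if f, err := strconv.ParseFloat(v, 64); err == nil {
      opts.ScaleDownUtilizationThreshold = f
      overridden = true
    }
  }
  if v, ok := tags["example.com/scale-down-unneeded-time"]; ok {
    if d, err := time.ParseDuration(v); err == nil {
      opts.ScaleDownUnneededTime = d
      overridden = true
    }
  }
  if !overridden {
    return nil
  }
  return &opts
}

The nil return follows the contract above: if a NodeGroup carries no overrides, CA falls back to the cluster-wide defaults.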

@elmiko (Contributor) commented Dec 14, 2020

thanks @MaciekPytel, that makes good sense to me. i am wondering how users will interact with these mechanisms too; would each provider be responsible for exposing flags, or were you thinking of storing the configuration in some other way?

@MaciekPytel (Contributor)

Each provider would be responsible for exposing a way for users to configure the feature and for implementing NodeGroup.GetOptions() (or a completely different processor implementation if needed). All the default implementation will do is poll NodeGroup.GetOptions() each loop and apply any provided overrides.

For a given provider, doing nothing will result in no change in behavior: all NodeGroups will use the default settings. At any point, any provider interested in supporting the feature can add their own implementation (this is the same pattern as scale-from-0).

@elmiko (Contributor) commented Dec 14, 2020

i think it would be useful if we could have a standard methodology for providers to reference their configuration data, for example a command line flag pointing to a yaml file or something, just to help contain the possible matrix of options that providers could choose. but aside from that, i don't have any objection to what you are describing. thanks for the extra detail =)

@MaciekPytel (Contributor)

Specifically for cluster api - wouldn't something like an annotation on MD or MS be the most convenient way of configuring those settings? I'm not saying it's the right way to do it, I'm just looking for your opinion.

Anyway, the 'framework' changes above are completely independent from any mechanism for actually setting the values, so I think it's safe to implement the mechanics first and later add a flag if we decide we want it.

MaciekPytel added a commit to MaciekPytel/autoscaler that referenced this issue Dec 30, 2020
This is the first step of implementing
kubernetes#3583 (comment).
New method was added to cloudprovider interface. All existing providers
were updated with a no-op stub implementation that will result in no
behavior change.
The config values specified per NodeGroup are not yet applied.
MaciekPytel added a commit to MaciekPytel/autoscaler that referenced this issue Dec 30, 2020
@MaciekPytel (Contributor)

I've opened #3789 with the implementation.

@elmiko (Contributor) commented Dec 30, 2020

Specifically for cluster api - wouldn't something like an annotation on MD or MS be the most convenient way of configuring those settings? I'm not saying it's the right way to do it, I'm just looking for your opinion.

yeah, adding an annotation to MD or MS would be a very straightforward way to do this from the capi perspective, and it would give users an easy way to interact with it as well.
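for example, something along these lines could work (a hypothetical sketch only; the annotation key, the wrapper types, and the helper are placeholders, not an agreed format):

package capiconfig

import "time"

// NodeGroupAutoscalingOptions mirrors the struct from the proposal, trimmed
// to a single field for brevity.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUnneededTime time.Duration
}

// scalableResource stands in for the MachineDeployment or MachineSet backing
// a node group; only its annotations matter for this sketch.
type scalableResource struct {
  Annotations map[string]string
}

// nodeGroup is a hypothetical capi-backed node group wrapper.
type nodeGroup struct {
  scalable scalableResource
}

// GetOptions reads a hypothetical annotation and returns nil when no
// override is present, which tells CA to fall back to the global defaults.
func (ng *nodeGroup) GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error) {
  v, ok := ng.scalable.Annotations["example.com/scale-down-unneeded-time"]
  if !ok {
    return nil, nil
  }
  d, err := time.ParseDuration(v)
  if err != nil {
    return nil, err
  }
  opts := defaults
  opts.ScaleDownUnneededTime = d
  return &opts, nil
}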

Anyway, the 'framework' changes above are completely independent from any mechanism for actually setting the values, so I think it's safe to implement the mechanics first and later add a flag if we decide we want it.

agreed, makes sense to me!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 30, 2021