
Nodegroup Interface for per-nodegroup configuration #3583

Closed
marwanad opened this issue Oct 6, 2020 · 11 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@marwanad (Member) commented Oct 6, 2020

Forking this off a thread in Slack.

The motivation behind this is to have some of the CA parameters operate at the nodegroup level. For example, having a different scale-down-utilization-threshold for every nodegroup instead of a single cluster-wide setting.

From @marwanad:

I went through the parameters and a lot of them seem well-suited to being per-nodegroup. After all, CA handles heterogeneous nodegroups well, so it shouldn't be a problem. The part up for discussion would probably be how to bubble down those parameters: annotations on the node? Modify the node group spec to have an extra field? The last approach is a bit ugly, so I think ideally we'd have a configmap of sorts for CA-related configs.

From @MaciekPytel:

i've also been thinking about this and i was actually considering going the route of adding a field to the nodegroup interface; the reasons for that are:

  1. the way nodegroups are configured is generally provider-specific (even the --nodes flag has a different format across providers, not to mention the various autodiscovery implementations)
  2. afaik autodiscovery is the most common way of configuring CA (admittedly i'm not familiar with all providers, so that may not be true across the board) - having the nodegroup return the config would allow extending existing autodiscovery setups to set those values (for example via ASG tags); i think that would be most convenient for users, letting them keep all nodegroup-related config in the same place
@MaciekPytel (Contributor)

I've put some more thoughts on why I prefer a cloudprovider-based implementation over a configmap in #2989 (comment).

@MaciekPytel (Contributor)

I'd like to pick this up and implement it for 1.21. Here is my proposal for how to implement this:

  1. We would split AutoscalingOptions into two structs: NodeGroupAutoscalingOptions and AutoscalingOptions. The options that make sense at the NodeGroup level (tentative list below) would be moved to NodeGroupAutoscalingOptions. AutoscalingOptions would contain a NodeGroupAutoscalingOptions field that will be set via the existing flags and used as the default for any NodeGroup that doesn't provide a value for a given option.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUtilizationThreshold    float64
  ScaleDownGpuUtilizationThreshold float64
  ScaleDownUnneededTime            time.Duration
  ...
}

type AutoscalingOptions struct {
  NodeGroupDefaults  NodeGroupAutoscalingOptions
  MaxEmptyBulkDelete int
  ...
}
  2. We would introduce a new "processor" interface that will be used to determine the value of any NodeGroup-level option. Anytime CA code requires the value of one of the options contained in NodeGroupAutoscalingOptions, it will call the provided processor.
type NodeGroupOptionsProcessor interface {
  GetScaleDownUtilizationThreshold(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (float64, error)
  GetScaleDownGpuUtilizationThreshold(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (float64, error)
  GetScaleDownUnneededTime(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (time.Duration, error)
  ...
}
  3. We will add a method to the NodeGroup interface that returns the NodeGroupAutoscalingOptions to be used for a given NodeGroup.
  • A NodeGroup will be provided the default options (in case it only wants to change some values, it can leave the other ones at their defaults).
  • A NodeGroup will be able to return nil NodeGroupAutoscalingOptions to indicate that no special settings are configured for it. In that case the default values will be used for this NodeGroup.
  • Implementing the NodeGroup.GetOptions method will be optional. A trivial implementation returning nil will result in the current behavior of using the default values (specified via flags) for all NodeGroups. Any provider will be free to integrate this feature with their autodiscovery scheme (or other preferred mechanism) whenever they want.
type NodeGroup interface {
  ...
  GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error)
  ...
}
  4. The default implementation of NodeGroupOptionsProcessor would rely on calling NodeGroup.GetOptions and filling in defaults as needed (a rough sketch follows below). If a user prefers a different way of configuring per-nodegroup settings (e.g. via a configmap), they will be able to achieve it by swapping the processor implementation for a different one. This is the same pattern we use for any other advanced customization (e.g. balancing similar NodeGroups).
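For illustration only, here is a minimal sketch of what such a default processor could look like, assuming the types from the proposal above. The names are simplified, the struct is trimmed to two fields, and the AutoscalingContext argument from the interface is omitted; this is not the actual implementation.

package nodegroupconfig

import "time"

// NodeGroupAutoscalingOptions mirrors the per-nodegroup struct from the
// proposal above, trimmed to two fields for brevity.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUtilizationThreshold float64
  ScaleDownUnneededTime         time.Duration
}

// nodeGroup is the minimal slice of cloudprovider.NodeGroup needed here.
type nodeGroup interface {
  GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error)
}

// defaultNodeGroupOptionsProcessor asks the NodeGroup for overrides and
// falls back to the flag-based defaults when the provider returns nil.
type defaultNodeGroupOptionsProcessor struct {
  defaults NodeGroupAutoscalingOptions
}

func (p *defaultNodeGroupOptionsProcessor) GetScaleDownUnneededTime(ng nodeGroup) (time.Duration, error) {
  opts, err := ng.GetOptions(p.defaults)
  if err != nil {
    return 0, err
  }
  if opts == nil {
    // No per-nodegroup overrides: use the cluster-wide default.
    return p.defaults.ScaleDownUnneededTime, nil
  }
  return opts.ScaleDownUnneededTime, nil
}

Scale-down logic would then ask the processor for the effective value for a given NodeGroup instead of reading the global flag directly.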

Note that this design allows customization at the NodeGroup level and not at the level of individual nodes. I think this is a good thing for the reasons mentioned in #2989 (comment).


Appendix: per-nodegroup options

This is a tentative list of options that I think make sense to specify at the NodeGroup level. The list is subject to change during implementation (e.g. if it turns out the logic of a particular option can't easily be changed to apply per-nodegroup).

  • The settings below are checked independently for each node, making them easy to migrate. They're also the most commonly requested ones, so they will be moved to NodeGroupAutoscalingOptions:
    • ScaleDownUtilizationThreshold
    • ScaleDownGpuUtilizationThreshold
    • ScaleDownUnneededTime
    • ScaleDownUnreadyTime
  • The settings below make conceptual sense at the nodegroup level, but the way they're implemented makes migration more difficult. I'll try to include them in the initial implementation, but it depends on the number of changes required:
    • ScaleDownDelayAfterAdd
    • ScaleDownDelayAfterDelete
    • ScaleDownDelayAfterFailure

There are a few other settings that would be fairly easy to change to per-nodegroup (e.g. IgnoreDaemonSetsUtilization), but I'm not yet sure they make conceptual sense at the NodeGroup level. I will look at them and extend this list as needed.

Another set of fields to potentially migrate are the ones related to how often a node is checked for scale-down (UnremovableNodeRecheckTimeout, etc.). These would be quite easy to set per-nodegroup; however, setting the value too low for one NodeGroup can starve scale-down in other groups, so I'm not sure we want to allow that.

@elmiko (Contributor) commented Dec 14, 2020

this generally makes sense to me, i think the processor interface and the split between the autoscaling options and per-nodegroup options seem reasonable.

the main question i am left with is how will users be able to associate a set of options with a specific node group?

@MaciekPytel (Contributor)

the main question i am left with is how will users be able to associate a set of options with a specific node group?

The idea is to leave this part up to each cloudprovider. This proposal is basically just a framework that will allow providers to expose the feature to users in whatever way makes most sense for the given provider.

The rationale is that the way you configure CA is already very provider-specific:

  • Since deploying CA on different providers already requires a completely different way of configuring it, there is less value in standardizing just one part of the config.
  • What a NodeGroup maps to and how NodeGroup ids are determined differ for each provider. So even with a standardized way to configure this (such as a configmap), one would still need to know about the implementation of a particular provider to configure it.

Instead, each provider can choose to add these additional settings to their existing way of configuring NodeGroups. For example, in the case of AWS it could be via ASG tags, similarly to how the various scale-from-0 settings are configured (or not :) the decision on this is up to the AWS owners); with GCE it could be included in the autodiscovery specs, etc. I'm not sufficiently familiar with the other providers to make good suggestions. I tried to make the processor flexible enough to hopefully fit each provider's use case.
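Purely as an illustration of the ASG-tag idea (not a commitment to any particular format), a provider-side helper could translate tags into the options struct. The tag keys and the helper below are hypothetical placeholders, and the struct is trimmed for brevity.

package awsconfig

import (
  "strconv"
  "time"
)

// NodeGroupAutoscalingOptions mirrors the struct from the proposal, trimmed
// to two fields for brevity.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUtilizationThreshold float64
  ScaleDownUnneededTime         time.Duration
}

// optionsFromASGTags overlays any recognized tag values on top of the
// defaults. Returning nil means "no overrides; use the cluster-wide
// defaults". The tag keys are hypothetical placeholders.
func optionsFromASGTags(tags map[string]string, defaults NodeGroupAutoscalingOptions) *NodeGroupAutoscalingOptions {
  opts := defaults
  overridden := false
  if v, ok := tags["example.com/scale-down-utilization-threshold"]; ok {
    if f, err := strconv.ParseFloat(v, 64); err == nil {
      opts.ScaleDownUtilizationThreshold = f
      overridden = true
    }
  }
  if v, ok := tags["example.com/scale-down-unneeded-time"]; ok {
    if d, err := time.ParseDuration(v); err == nil {
      opts.ScaleDownUnneededTime = d
      overridden = true
    }
  }
  if !overridden {
    return nil
  }
  return &opts
}

The nil return follows the contract above: if a NodeGroup carries no overrides, CA falls back to the cluster-wide defaults.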

@elmiko (Contributor) commented Dec 14, 2020

thanks @MaciekPytel, that makes good sense to me. i am wondering how users will interact with these mechanisms too; would each provider be responsible for exposing flags, or were you thinking of storing the configuration in some other way?

@MaciekPytel (Contributor)

Each provider would be responsible for exposing a way for users to configure the feature and for implementing NodeGroup.GetOptions() (or a completely different processor implementation if needed). All the default implementation will do is poll NodeGroup.GetOptions() each loop and apply any provided overrides.

For a given provider, doing nothing will result in no change in behavior: all NodeGroups will use the default settings. At any point, any provider interested in supporting the feature can add their own implementation (this is the same pattern as scale-from-0).

@elmiko (Contributor) commented Dec 14, 2020

i think it would be useful if we could have a standard methodology for providers to reference their configuration data, for example a command line flag pointing to a yaml file or something, just to help contain the possible matrix of options that providers could choose. but aside from that, i don't have any objection to what you are describing. thanks for the extra detail =)

@MaciekPytel (Contributor)

Specifically for cluster api - wouldn't something like an annotation on MD or MS be the most convenient way of configuring those settings? I'm not saying it's the right way to do it, I'm just looking for your opinion.

Anyway, the 'framework' changes above are completely independent from any mechanism for actually setting the values, so I think it's safe to implement the mechanics first and later add a flag if we decide we want it.

MaciekPytel added a commit to MaciekPytel/autoscaler that referenced this issue Dec 30, 2020
This is the first step of implementing
kubernetes#3583 (comment).
New method was added to cloudprovider interface. All existing providers
were updated with a no-op stub implementation that will result in no
behavior change.
The config values specified per NodeGroup are not yet applied.
MaciekPytel added a commit to MaciekPytel/autoscaler that referenced this issue Dec 30, 2020
@MaciekPytel (Contributor)

I've opened #3789 with the implementation.

@elmiko (Contributor) commented Dec 30, 2020

Specifically for cluster api - wouldn't something like an annotation on MD or MS be the most convenient way of configuring those settings? I'm not saying it's the right way to do it, I'm just looking for your opinion.

yeah, adding an annotation to MD or MS would be a very straightforward way to do this from the capi perspective, and it would give users an easy way to interact with it as well.
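for example, something along these lines could work (a hypothetical sketch only; the annotation key, the wrapper types, and the helper are placeholders, not an agreed format):

package capiconfig

import "time"

// NodeGroupAutoscalingOptions mirrors the struct from the proposal, trimmed
// to a single field for brevity.
type NodeGroupAutoscalingOptions struct {
  ScaleDownUnneededTime time.Duration
}

// scalableResource stands in for the MachineDeployment or MachineSet backing
// a node group; only its annotations matter for this sketch.
type scalableResource struct {
  Annotations map[string]string
}

// nodeGroup is a hypothetical capi-backed node group wrapper.
type nodeGroup struct {
  scalable scalableResource
}

// GetOptions reads a hypothetical annotation and returns nil when no
// override is present, which tells CA to fall back to the global defaults.
func (ng *nodeGroup) GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error) {
  v, ok := ng.scalable.Annotations["example.com/scale-down-unneeded-time"]
  if !ok {
    return nil, nil
  }
  d, err := time.ParseDuration(v)
  if err != nil {
    return nil, err
  }
  opts := defaults
  opts.ScaleDownUnneededTime = d
  return &opts, nil
}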

Anyway, the 'framework' changes above are completely independent from any mechanism for actually setting the values, so I think it's safe to implement the mechanics first and later add a flag if we decide we want it.

agreed, makes sense to me!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 30, 2021