# 📖 Add cluster autoscaler scale from zero ux proposal #2530

---
title: Cluster Autoscaler Scale from Zero UX
authors:
  - "@michaelgugino"
reviewers:
  - "@detiber"
creation-date: 2020-03-04
status:
---

# Cluster Autoscaler Scale from Zero UX

## Summary
Cluster Autoscaler supports scale from zero for cloud-native providers. Today
this is done by compiling CPU, GPU, and RAM information for each supported
cloud into the cluster autoscaler at build time.

We are attempting to add cluster-api support to the cluster-autoscaler so it
can scale machinesets, machinedeployments, and/or other groupings of machines
as appropriate. We are trying to keep an abstraction that is isolated from
particular cloud implementations. We need a way to inform the cluster-autoscaler
of the resources a machine will provide so it can make proper scheduling
decisions when there are not currently any machines in a particular scale set
(machineset/machinedeployment, etc.).

> **Review comment:** machinepools?
>
> **Reply:** I think that's covered under 'and/or other groupings of machines', although machinepools don't necessarily group machines?
>
> **Reply:** MachinePools is its own type of "machine"; it doesn't group individual machines. I thought it would be good to reference it to have it in mind in the design, since some of the concepts used for machines below might not apply the same way.

We will need input from the autoscaler team, as well as input from API
reviewers, to ensure we arrive at the best solution.

These ideas require a particular (possibly new) cluster-api cloud provider
controller which can inspect machinesets and annotate them as appropriate.
This could be done dynamically via a billing/pricing API (not recommended) or
with values compiled in statically.

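To make the "values compiled in statically" idea concrete, here is a minimal
hedged sketch. The instance-type catalog and annotation keys are invented for
illustration only and are not a decided API:

```go
// A hedged sketch of the "statically compiled values" idea: a provider-side
// controller could map an instance type to its resources and write them onto
// the machineset as annotations. The catalog and annotation keys are
// illustrative assumptions, not part of any decided API.
package main

import "fmt"

// instanceResources holds the statically known attributes of an instance type.
type instanceResources struct {
	CPU    string // e.g. "4"
	Memory string // e.g. "16Gi"
	GPU    string // e.g. "1"; empty if none
}

// A small, illustrative static catalog a provider could compile in.
var instanceCatalog = map[string]instanceResources{
	"m5.xlarge":  {CPU: "4", Memory: "16Gi"},
	"p3.2xlarge": {CPU: "8", Memory: "61Gi", GPU: "1"},
}

// annotationsFor returns the annotations such a controller could add to a
// machineset using the given instance type (keys are hypothetical).
func annotationsFor(instanceType string) (map[string]string, bool) {
	res, ok := instanceCatalog[instanceType]
	if !ok {
		return nil, false
	}
	annotations := map[string]string{
		"cluster.x-k8s.io/scale-from-zero-cpu":    res.CPU,
		"cluster.x-k8s.io/scale-from-zero-memory": res.Memory,
	}
	if res.GPU != "" {
		annotations["cluster.x-k8s.io/scale-from-zero-gpu"] = res.GPU
	}
	return annotations, true
}

func main() {
	if a, ok := annotationsFor("m5.xlarge"); ok {
		fmt.Println(a)
	}
}
```
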
We could also attempt to build this logic into the machineset/machinedeployment
controller, but this would probably not scale well and would require a
well-defined set of fields on the provider CRD.

## Motivation

### Goals

### Non-Goals

> **Review comment (on lines +32 to +34):** Perhaps this is only because I haven't been following prior conversations about this (sorry!), but I think I would be better able to understand the proposal if the motivation section included some goals and non-goals. For example, is it a goal to minimize (or not have any?) impact on any cluster API provider which does not support autoscale from zero? Is it a goal to minimize the impact/knowledge burden for cluster API users who are not using this feature?
>
> **Reply:** The goal is to support scale from zero in an automated way. We'll be relying on providers to provide the attributes (CPU, RAM, GPU) for their respective instance types. We also want to build a mechanism to support scale from zero on cloud providers that don't have common billing (such as an OpenStack cloud or some other abstraction) by providing a field (either spec or an annotation on some object) that allows an administrator to define the attributes themselves.

## Proposal

### User Stories

As a cluster admin, I want the ability to automatically scale my cluster
using the cluster autoscaler and the cluster-api. In order to save money, I
need the ability to scale from zero for a given machineset or machinedeployment.

## Proposal Ideas

There are several proposals to consider. Once we decide on one, we'll eliminate
the others.

### Annotations
Modify the autoscaler to look at particular annotations on a given object to
determine the appropriate amount of each resource.

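For the consuming side, a minimal hedged sketch of how the autoscaler's
cluster-api provider could read such annotations into a resource list. The
keys match the hypothetical ones used in the earlier sketch and are not a
decided API:

```go
// A hedged sketch of the consumer side: build a ResourceList from whatever
// capacity hints are present as annotations on a machineset/machinedeployment.
// The annotation keys are illustrative assumptions, not a decided API.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// capacityFromAnnotations parses the capacity hints that are present,
// skipping any keys that are not set.
func capacityFromAnnotations(annotations map[string]string) (corev1.ResourceList, error) {
	keys := map[string]corev1.ResourceName{
		"cluster.x-k8s.io/scale-from-zero-cpu":    corev1.ResourceCPU,
		"cluster.x-k8s.io/scale-from-zero-memory": corev1.ResourceMemory,
		"cluster.x-k8s.io/scale-from-zero-gpu":    "nvidia.com/gpu", // illustrative GPU resource name
	}
	capacity := corev1.ResourceList{}
	for key, name := range keys {
		val, ok := annotations[key]
		if !ok {
			continue
		}
		quantity, err := resource.ParseQuantity(val)
		if err != nil {
			return nil, fmt.Errorf("parsing annotation %s: %w", key, err)
		}
		capacity[name] = quantity
	}
	return capacity, nil
}

func main() {
	capacity, err := capacityFromAnnotations(map[string]string{
		"cluster.x-k8s.io/scale-from-zero-cpu":    "4",
		"cluster.x-k8s.io/scale-from-zero-memory": "16Gi",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(capacity)
}
```
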
#### Benefits
Easy and simple. Users can modify the values if needed (we can instruct
controllers to only add annotations if they're missing, etc.), and users can
bring their own clouds and annotate their objects to support scale from zero,
so we don't need to worry about supporting every cloud on day one.

#### Cons
We're essentially creating an API around an annotation, and the API folks don't
like that type of thing.

> **Review comment:** Can we elaborate on the downside of doing that? Not sure "the api folks don't like that type of thing"...
>
> **Reply:** I think they don't like to have specific annotations be part of a defined API. We might be able to do just annotations if this was all for a single controller, but with an external controller looking at our object(s), it's unlikely to pass API review.
>
> **Reply:** The main issue with annotations is that they are an informal API that is detached from the actual API, so enforcing forward/backward compatibility, migrations for support, etc. are quite difficult. For example, the cluster-autoscaler currently uses the deprecated [...]. If the integration was done using an API field, then conversion webhooks would help ease the migration.

### Spec Field
We can add fields to the spec of a particular object to set these items.

> **Review comment:** I note the clever ambiguity in "particular object" - were you thinking of the MachineSet/MachineDeployment? We could also do a "sidecar" object, similar to how we did the [...]. I'm not sure where we're currently putting minReplicas / maxReplicas? Also it's a little awkward because I would imagine I'd want to set these on the MachineDeployment level, but I'd imagine node attributes should technically be on the MachineSet level.
>
> **Reply:** One out-of-the-box idea I had that seems related to this one was removing the integration between the cluster autoscaler and the cluster-api bits. We could come up with a new 'external scale group' object. The autoscaler increments or decrements that object, and something on the cluster-api side watches that object and takes the appropriate actions. This would allow people to build their own automation after the recommendations of the autoscaler, but it's not entirely clear what this should look like.
>
> **Reply:** I would expect this info to be accessible through any scalable resource liable to be autoscaled but decoupled from it. This is purely provider-specific compute info which does not necessarily need a replicas controller on top to be meaningful. In this line I'd see the current [...].

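If this option were chosen, the fields might look something like the following
hedged sketch; the type and field names are invented for illustration, and the
real shape would come out of API review:

```go
// A hedged sketch, not a decided API: a hypothetical spec addition to a
// MachineSet/MachineDeployment describing the resources a machine created
// from this scale group is expected to provide once it becomes a node.
package v1alpha3

import (
	corev1 "k8s.io/api/core/v1"
)

// ScaleFromZeroSpec is an illustrative (hypothetical) field that could be
// embedded in MachineSetSpec/MachineDeploymentSpec.
type ScaleFromZeroSpec struct {
	// Capacity lists the resources (cpu, memory, gpu, ...) a single machine
	// in this group is expected to provide.
	// +optional
	Capacity corev1.ResourceList `json:"capacity,omitempty"`
}
```
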
#### Benefits
Well-defined API.

Users can set these fields, or some controller can populate them.

#### Cons
Automatically populating spec fields is not ideal. For starters, a user might
copy an existing machineset to create a new one with a different instance type,
and might overlook these fields because they didn't set them. This would
definitely cause problems.

What do we do if the user specifies some fields and not others?

Automatically populating the spec seems to violate the 'spec == user intent'
principle.

### Status Field
We can add fields to the status of an object.

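For symmetry with the spec sketch above, a hypothetical status-side shape
(again, the names are invented for illustration and not a decided API):

```go
// A hedged sketch, not a decided API: a hypothetical status field that a
// provider controller would populate and the autoscaler would read.
package v1alpha3

import (
	corev1 "k8s.io/api/core/v1"
)

// ScaleFromZeroStatus is an illustrative (hypothetical) addition to
// MachineSetStatus/MachineDeploymentStatus.
type ScaleFromZeroStatus struct {
	// Capacity is the provider-discovered set of resources a single machine
	// in this group would provide once it becomes a node.
	// +optional
	Capacity corev1.ResourceList `json:"capacity,omitempty"`
}
```
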
#### Benefits
Makes the most sense. If it's information we're determining programmatically
about an object that other things need to consume, it should probably go here.

Well-defined API.

#### Cons
Users can't easily update this field if they want to set values themselves for
unsupported clouds, or for clouds where billing data is not common or
available, such as OpenStack.

> **Review comment:** Is this just that kubectl makes it hard to update status? Personally I'd rather tackle that (e.g. a kubectl plugin) vs working around it with the override, if so!
>
> **Reply:** If it was just a matter of updating status, it'd be one thing. The other issue is that status is meant to be "recreatable", and if the resources are backed up/restored or migrated from one cluster to another, the status will be lost.

### Status Field + Override Mechanism
Use the status field method as mentioned above, but add an override field to
the spec, or support an annotation on the machineset/machinedeployment object,
to copy information into the status field. The autoscaler always reads from
the status field; we're just making it easy to get data there.

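A hedged sketch of the copy/override logic this option implies, assuming a
discovered capacity (from the provider) and a user override (from a
hypothetical spec field or annotation); the merged result is what would be
written to status for the autoscaler to read:

```go
// A hedged sketch of the override flow: user-provided overrides take
// precedence over provider-discovered values, and the merged result is what
// gets written to the status field that the autoscaler reads.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// mergeCapacity overlays user overrides onto provider-discovered values.
func mergeCapacity(discovered, overrides corev1.ResourceList) corev1.ResourceList {
	merged := corev1.ResourceList{}
	for name, quantity := range discovered {
		merged[name] = quantity
	}
	// Overrides win; they also cover clouds where nothing was discovered at
	// all (e.g. OpenStack deployments without a common billing/catalog API).
	for name, quantity := range overrides {
		merged[name] = quantity
	}
	return merged
}

func main() {
	discovered := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("4"),
		corev1.ResourceMemory: resource.MustParse("16Gi"),
	}
	overrides := corev1.ResourceList{
		corev1.ResourceMemory: resource.MustParse("15Gi"), // user corrects usable memory
	}
	// This merged value is what the copying controller would write to status.
	fmt.Println(mergeCapacity(discovered, overrides))
}
```
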
#### Benefits
All the benefits of the Status Field option above. Users can set their own
values easily without having to create a special status client to do it (e.g.,
they can use kubectl or specify the values at object creation time).

Since we're calling these new override fields (whether in the spec or an
annotation), the intent of the values should be clear. It also solves the
problem of a user specifying only some fields and not others (e.g., they only
override CPU).

Clean separation of discovered values and user intent.

#### Cons
We have to build something that copies these values into the status. This seems
like it could be an easily shared piece of library code.

Multiple controllers would be updating the same field on the status object; we
need to ensure the logic is sound on both ends so there's not a reconcile war.

> **Review comment:** I'd like to avoid the potential for conflicting controllers attempting to manage the same field in a resource's Status. We also need to keep in mind that the controller for the resource "owns" the status of that resource. Ideally we'd be able to copy these values from a resource that is owned by the controller that is generating them.
>
> **Reply:** There's not really any place to copy the information from. Unless we create some sort of new 'scale record' or some crazy thing like that, and then that's yet another CRD.

> **Review comment:** I'd say this is actually a provider implementation choice in how they decide to implement the `TemplateNodeInfo` interface; e.g. GCE does it dynamically:
> https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/gce/gce_cloud_provider.go#L332
> https://github.com/kubernetes/autoscaler/blob/589cd2af9c25088ce3408eeb88ba9239201d005d/cluster-autoscaler/cloudprovider/gce/gce_manager.go#L491-L504

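Following the reviewer's pointer, a hedged sketch of what the cluster-api
provider side might produce from the recorded capacity: a template node that a
`TemplateNodeInfo` implementation (whose exact signature depends on the
autoscaler version) could wrap for the scheduler simulation. The group name and
label below are illustrative assumptions:

```go
// A hedged sketch: construct a template node from the capacity information
// recorded on a machineset/machinedeployment, so the autoscaler's scheduler
// simulation can decide whether pending pods would fit on a node created by
// scaling the (currently empty) group up from zero.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildTemplateNode returns a Node describing what a new node in the group
// would look like. The label key is illustrative, not a decided API.
func buildTemplateNode(groupName string, capacity corev1.ResourceList) *corev1.Node {
	return &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   fmt.Sprintf("%s-template", groupName),
			Labels: map[string]string{"cluster.x-k8s.io/owner": groupName},
		},
		Status: corev1.NodeStatus{
			Capacity:    capacity,
			Allocatable: capacity, // simplification: ignore system reservations
			Conditions: []corev1.NodeCondition{
				{Type: corev1.NodeReady, Status: corev1.ConditionTrue},
			},
		},
	}
}

func main() {
	capacity := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("4"),
		corev1.ResourceMemory: resource.MustParse("16Gi"),
	}
	node := buildTemplateNode("workers-us-east-1a", capacity)
	fmt.Println(node.Status.Capacity)
}
```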