# 📖 Add cluster autoscaler scale from zero ux proposal #2530

Status: Closed
docs/proposals/20200304-autoscaler-scale-ux.md (117 additions, 0 deletions)
---
title: Cluster Autoscaler Scale from Zero UX
authors:
- "@michaelgugino"
reviewers:
- "@detiber"
creation-date: 2020-03-04
status:
---

# Cluster Autoscaler Scale from Zero UX

## Summary
Cluster Autoscaler supports scale from zero for cloud-native providers. This is done via compiling CPU, GPU, and RAM info into the cluster autoscaler at build time for each supported cloud.
**Member:**
> This is done via compiling CPU, GPU, and RAM info into the cluster autoscaler at build time for each supported cloud.

I'd say this is actually a provider implementation choice on how they decide to implement the TemplateNodeInfo interface, e.g. GCE does it dynamically:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/gce/gce_cloud_provider.go#L332
https://github.com/kubernetes/autoscaler/blob/589cd2af9c25088ce3408eeb88ba9239201d005d/cluster-autoscaler/cloudprovider/gce/gce_manager.go#L491-L504


We are attempting to add cluster-api support to the cluster-autoscaler to scale machinesets, machinedeployments, and/or other groupings of machines as appropriate. We are trying to keep an abstraction which is isolated from particular cloud implementations. We need a way to inform the cluster-autoscaler of the value of machine resources so it can make proper scheduling decisions when there are not currently any machines in a particular scale set (machineset/machinedeployment, etc).
**Contributor:**
machinepools?

**Contributor Author:**
I think that's covered under "and/or other groupings of machines," although machinepools don't necessarily group machines?

**Contributor:**
MachinePool is its own type of "machine"; it doesn't group individual machines. I thought it would be good to reference it so it's kept in mind in the design, since some of the concepts used for machines below might not apply the same way.

We will need input from the autoscaler team, as well as from API reviewers, to ensure we arrive at the best solution.

These ideas require a particular (possibly new) cluster-api cloud provider controller
**Contributor:**
Suggested change:
"These ideas require a particular (possibly new) cluster-api cloud provider controller"
→ "These ideas require a particular (possibly new) cluster-api infrastructure provider controller"

which can inspect machinesets and annotate them as appropriate. This could be
done dynamically via a billing API (not recommended) or with values compiled in
statically.

We could also attempt to build this logic into the machineset/machinedeployment
controller, but this would probably not be very scalable and would require
a well-defined set of fields on the provider CRD.

## Motivation

### Goals

### Non-Goals
**Comment on lines +32 to +34:**
Perhaps this is only because I haven't been following prior conversations about this (sorry!), but I think I would be better able to understand the proposal if the motivation section included some goals and non-goals.

For example, is it a goal to minimize (or not have any?) impact on any cluster API provider which does not support autoscale from zero? Is it a goal to minimize impact/knowledge burden for cluster API users who are not using this feature?

**Contributor Author:**
The goal is to support scale from zero in an automated way. We'll be relying on providers to provide the attributes (CPU, RAM, GPU) for their respective instance types. We also want to build a mechanism to support scale from zero on cloud providers that don't have common billing (such as an OpenStack cloud or some other abstraction) by providing a field (either spec or an annotation on some object) that allows an administrator to define the attributes themselves.


## Proposal

### User Stories

As a cluster admin, I want to add the ability to automatically scale my cluster
using cluster autoscaler and the cluster-api. In order to save money, I need
the ability to scale from zero for a given machineset or machinedeployment.

## Proposal Ideas

There are several proposals to consider. Once we decide on one, we'll eliminate
the others.

### Annotations
Modify the autoscaler to look at particular annotations on a given object to determine the appropriate amounts of each resource.
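For illustration, a minimal sketch of what this might look like on a machineset; the annotation keys and values here are hypothetical, not a defined API:

```yaml
# Hypothetical sketch only: these annotation keys are illustrative and are
# not defined by cluster-api or the cluster autoscaler.
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineSet
metadata:
  name: workers-us-east-1a
  annotations:
    # Resources the autoscaler would assume for a node in this group when
    # the group currently has zero machines.
    capacity.cluster.x-k8s.io/cpu: "4"
    capacity.cluster.x-k8s.io/memory: "16Gi"
    capacity.cluster.x-k8s.io/gpu: "1"
spec:
  replicas: 0
```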

#### Benefits
Easy and simple. Users can modify the values if needed (we can instruct controllers
to only add annotations if they're missing, etc.), users can bring their own clouds
and annotate their objects to support scale from zero, and we don't need to worry
about supporting every cloud on day one.

#### Cons
We're essentially creating an API around an annotation, and the api folks don't
like that type of thing.
**Contributor:**
Can we elaborate on the downside of doing that? Not sure "the api folks don't like that type of thing" counts :)

**Contributor Author:**
I think they don't like to have specific annotations be part of a defined API. We might be able to do just annotations if this was all for a single controller, but with an external controller looking at our object(s), it's unlikely to pass API review.

**Member:**
The main issue with annotations is that they are an informal API that is detached from the actual API, so enforcing forward/backward compatibility, migrations for support, etc. are quite difficult.

For example, the cluster-autoscaler currently uses the deprecated cluster.k8s.io/delete-machine annotation rather than cluster.x-k8s.io/delete-machine, and in order to not break the integration we need to handle the backwards/forwards compatibility both in Cluster API and also in cluster-autoscaler: kubernetes/autoscaler#3161.

If the integration was done using an API field, then conversion webhooks would help ease the migration.


### Spec Field
We can add fields to the spec of a particular object to set these items.
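As a rough sketch (the field names here are hypothetical, purely to illustrate the shape such an API could take):

```yaml
# Hypothetical sketch only: instanceCapacity is not an existing field; it
# illustrates what spec-level fields on a MachineSet might look like.
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineSet
metadata:
  name: workers-us-east-1a
spec:
  replicas: 0
  # Capacity the autoscaler should assume when scaling this group from zero.
  instanceCapacity:
    cpu: "4"
    memory: 16Gi
    gpu: "1"
```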
**Contributor (@justinsb), Mar 4, 2020:**
I note the clever ambiguity in "particular object" - were you thinking of the MachineSet/MachineDeployment? We could also do a "sidecar" object, similar to how we did the infrastructureRef. We could then also put additional autoscaling controls here, e.g.

```yaml
kind: MachineDeploymentAutoscaling
spec:
  maxReplicas: 10
  minReplicas: 0
  scaleDownDelay: 10m
  scaleUpDelay: 0s
status:
  nodeShape:
    memory: 1Gi
    gpus: 1
    labels:
      foo: bar
```

I'm not sure where we're currently putting minReplicas / maxReplicas? Also it's a little awkward because I would imagine I'd want to set these on the MachineDeployment level, but I'd imagine node-attributes should technically be on the MachineSet level.

**Contributor Author:**
One out-of-the-box idea I had that seems related to this one was removing the integration between the cluster autoscaler and the cluster-api bits.

We could come up with a new 'external scale group' object. The autoscaler increments or decrements that object, and something on the cluster-api side watches that object and takes the appropriate actions. This would allow people to build their own automation based on the recommendations of the autoscaler, but it's not entirely clear what this should look like.

**Member (@enxebre), Mar 4, 2020:**
minReplicas / maxReplicas are known annotations which apply to any scalable resource, i.e. machineSet/machineDeployment, which makes them discoverable for the cluster autoscaler. They could be owned by e.g. a machineAutoscaler resource: #2369

cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size
cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size

```yaml
apiVersion: "cluster.x-k8s.io"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-1a"
  namespace: "cluster-api"
spec:
  minReplicas: 1
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: cluster.x-k8s.io
    kind: MachineSet
    name: worker-us-east-1a
```

> I would imagine I'd want to set these on the MachineDeployment level

I would expect this info to be accessible through any scalable resource liable to be autoscaled, but decoupled from it. This is purely provider-specific compute info which does not necessarily need a replicas controller on top to be meaningful. In this line I'd see the current providerMachineTemplates, or something like a machineClass, as good candidates to infer this information as they see fit for each specific provider and expose it in a common way.


#### Benefits
Well defined API.

Users can set these fields, or some controller can
populate them.

#### Cons
Automatically populating Spec fields is not ideal. For starters, a user might
copy an existing machineset to create a new one with a different instance type,
and might overlook these fields because they didn't set them. This would
definitely cause problems.

What do we do if the user specifies some fields and not others?

Automatically populating spec seems to violate the 'spec == user intent' principle.

### Status Field
We can add fields to the status of an object.
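As a sketch (field names are hypothetical), a controller that knows the instance type could publish something like:

```yaml
# Hypothetical sketch only: shows discovered capacity surfaced in status for
# the autoscaler to consume; the capacity field is illustrative, not real.
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineSet
metadata:
  name: workers-us-east-1a
spec:
  replicas: 0
status:
  # Populated programmatically by a provider-aware controller.
  capacity:
    cpu: "4"
    memory: 16Gi
    gpu: "1"
```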

#### Benefits
Makes the most sense. If it's information we're determining programmatically
about an object that other things need to consume, it should probably go here.

Well defined API.

#### Cons
Users can't easily update this field if they want to set values themselves for
unsupported clouds or clouds where billing data is not common or available,
such as OpenStack.

**Contributor:**
This is just that kubectl makes it hard to update status? Personally I'd rather tackle that (e.g. a kubectl plugin) vs working around it with the override, if so!

**Member:**
If it was just a matter of updating status, it'd be one thing. The other issue is that status is meant to be "recreatable", and if the resources are backed up/restored or migrated from one cluster to another the status will be lost.

### Status Field + Override Mechanism
Use the status field method as mentioned above, but add an override field to
the Spec or support an annotation on the machineset/machinedeployment object
to copy information into the status field. The autoscaler always reads from
the status field; we're just making it easy to get data there.
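Roughly, the flow might look like the following sketch (the field names are hypothetical): the user sets overrides in the spec (or an annotation), a controller copies them into the status, and the autoscaler only ever reads the status.

```yaml
# Hypothetical sketch of the override flow; capacityOverrides and capacity
# are illustrative field names, not an existing API.
spec:
  replicas: 0
  # Set by the user, e.g. for an OpenStack cloud with no common billing data.
  capacityOverrides:
    cpu: "8"
    memory: 32Gi
status:
  # Populated by a controller: discovered values, with any user overrides
  # copied over them. The autoscaler reads only this section.
  capacity:
    cpu: "8"
    memory: 32Gi
    gpu: "0"
```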

#### Benefits
All the benefits of the Status Field option above. Users can set their own
values easily without having to create a special status client to do it (e.g.,
they can use kubectl or specify the values at object creation time).

Since we're calling these new fields overrides (either in the spec or an annotation),
it should be clear what the intent of the values is. It solves the problem
of the user specifying only some fields and not others (e.g., they only override CPU).

Clean separation of discovered values and user intent.

#### Cons
We have to build something that copies these values into the staus. This seems
like it could be an easily shared piece of library code.

**Contributor:**
Suggested change: "staus" → "status".

Multiple controllers updating the same field on the status object: we need to
ensure the logic is sound on both ends so there's not a reconcile war.

**Member:**
I'd like to avoid the potential for conflicting controllers attempting to manage the same field in a resource's Status field.

We also need to keep in mind that the controller for the resource "owns" the status of that resource. Ideally we'd be able to copy these values from a resource that is owned by the controller that is generating them.

**Contributor Author:**
There's not really any place to copy the information from. Unless we create some sort of new 'scale record' or some crazy thing like that, and then that's yet-another-CRD.