Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA expander plugin proposal #4134

Merged
merged 2 commits into from
Jul 10, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 80 additions & 0 deletions cluster-autoscaler/proposals/expander-plugin-grpc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# Expander Plugin over gRPC

Authors:
Evan Sheng ([email protected]) @evansheng

## Context and Rationale

Cluster Autoscaler makes node scaling decisions based on Expander strategies, to select which node group to add nodes to.
Currently, there are 5 expanders: random, most-pods, least-waste, price, and priority. These provide a variety of behaviors in which CA behaves to expand a cluster.

However, whenever users would like to implement a customer expander they must either submit a change upstream, or fork CA and maintain their own implementation.
Iteration time would be slow, and oftentimes, users may have a case where they want to implement custom business logic into the expander that may not be relevant to the wider community, but do not want to maintain a fork.

Namely, we (Airbnb) would like to iterate on custom expander strategies (such as a weighted random expander, custom spot market strategy, etc) out of band with CA.
To do this, we propose a solution that adds a new expander to CA, but does not break backwards compatibility.


## Proposal

We will extend CA to utilize a pluggable external expander. The design for this expander plugin is heavily based off of this [proposal](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/plugable-provider-grpc.md) to CA, for a pluggable cloud provider interface.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would depersonalize this a little, perhaps something like this:

Suggested change
We will extend CA to utilize a pluggable external expander. The design for this expander plugin is heavily based off of this [proposal](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/plugable-provider-grpc.md) to CA, for a pluggable cloud provider interface.
Extend the cluster autoscaler to utilize a pluggable external expander. The design for this expander plugin is heavily based off of this [proposal](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/plugable-provider-grpc.md) to CA, for a pluggable cloud provider interface.


The solution will include a server acting as an external expander alongside CA, and communicate via gRPC with TLS. This expander will run in another pod, as a separate service, deployed independently of CA.
This is depicted below.

![Figure](./images/external-expander-plugin-grpc.jpg)

The gRPC server must implement the API of the [expander strategy interface](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/expander.go#L50) in CA, which only has one method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be really cool to see a small example of the main function for an expander server


```go
// Strategy describes an interface for selecting the best option when scaling up
type Strategy interface {
BestOption(options []Option, nodeInfo map[string]*schedulerframework.NodeInfo) *Option
}
```

The gRPC service, running as an external pod, is defined below:

```protobuf
syntax = "proto3";

package clusterautoscaler.expander.v1;

import "google/protobuf";
import "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/generated.proto";
import "k8s.io/api/core/v1/generated.proto";

option go_package = "v1";

service Expander {
rpc BestOption (BestOptionRequest)
returns (BestOptionResponse) {}
}
```
The messages used in these calls are also defined:
```protobuf
message BestOptionRequest {
repeated Option options = 1;
map<string, schedulerframework.NodeInfo> nodeInfo = 2;
}

message BestOptionResponse {
Option option = 1;
}

message Option {
k8s.io.autoscaler.cluster-autoscaler.cloudprovider.NodeGroup nodeGroup = 1;
int32 nodeCount = 2;
string debug = 3;
repeated k8s.io.api.core.v1.Pod pod = 4;
}
```

If errors arise with the gRPC server, the client will handle these by falling back to the random expander strategy, similarly to the fallbacks of the current expanders.

To communicate with the external gRPC server, CA needs new flags to expose details about the server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we planning on handling errors? Just fallback to the random expander on failures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, was planning on handling similarly to the other current expanders, and falling back to the random strategy
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/priority.go#L59


We’ll add a new option to the expander flag: `--expander=externalgrpc`, and inntroduce a new flag `--expander-plugin-url=https://external-grpc-url/server` to reach the gRPC server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would depersonalize here as well, and also a small spelling nit

Suggested change
We’ll add a new option to the expander flag: `--expander=externalgrpc`, and inntroduce a new flag `--expander-plugin-url=https://external-grpc-url/server` to reach the gRPC server.
Add a new option to the expander flag: `--expander=externalgrpc`, and introduce a new flag `--expander-plugin-url=https://external-grpc-url/server` to reach the gRPC server.


Additionally, we’ll need to use TLS for secure communication. An additional flag `--external-expander-cert=~/path/to/cert` will be exposed to specify the path to the certificate authority bundle used to validate the TLS cert used by the server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would clean this up a little too

Suggested change
Additionally, we’ll need to use TLS for secure communication. An additional flag `--external-expander-cert=~/path/to/cert` will be exposed to specify the path to the certificate authority bundle used to validate the TLS cert used by the server.
When needing to use TLS for secure communications, an additional flag `--external-expander-cert=~/path/to/cert` will be exposed to specify the path to the certificate authority bundle used to validate the TLS cert used by the server.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.