Cheap and cheerful autoscaler #229
Conversation
Some drive-by comments while I set up my new cluster.
BUILD
Outdated
@@ -13,6 +13,8 @@ k8s_object(
    name = "controller",
    images = {
        "ela-controller:latest": "//cmd/ela-controller:image",
        "ela-queue:latest": "//pkg/sidecars/queue:image",
        "ela-autoscaler:latest": "//pkg/autoscaler:image",
binaries (and by proxy images) should go under cmd/
Done.
Scaling *ScalingSpec `json:"scaling,omitempty"`
}

type ScalingSpec struct {
I think the API Draft had something different here. I believe we had a ConcurrencyModel enum, and no knobs (yet).
/cc @evankanderson
We really want Elafros to just figure out its own autoscaling parameters, without input from the user. I think concurrency would be a good internal knob for fast autoscaling. And we can have a slower process evaluating the workload and cpu/memory/io metrics to tune the knob.
Right now I just want a way to set the concurrency knob directly so we can play around with autoscaling. We'll do the more complex model building and tuning later. That's all this parameter was meant to be.
But since this is also an API spec, maybe I had better not put it here. I'll just hard code target concurrency in the controller which is trivial to replace when testing autoscaling. Will remove this parameter.
Ripped this out.
pkg/autoscaler/autoscaler.go
Outdated
var upgrader = websocket.Upgrader{
    ReadBufferSize:  1024,
    WriteBufferSize: 1024,
How were these values arrived at? Are they bits, bytes, gigabytes?
These values were carefully arrived at via the copy-paste method: https://github.com/gorilla/websocket/blob/58729a2165ebb9f1d023226076f660139c2e2e0c/examples/chat/client.go#L35
But it looks like I can just provide defaults. I'll remove the buffer sizes since I don't think it matters in this case.
pkg/autoscaler/autoscaler.go
Outdated
var statChan = make(chan types.Stat, 100)

func autoscaler() {
    targetConcurrency := float64(10)
Can you hoist the 10 here, the 100 above, and the 1024 constants into a const ( ... ) block with detailed comments describing their purpose and how they were arrived at?
Done.
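The hoisted block might look something like this (a sketch only; the comments reflect the discussion above, and the names are illustrative rather than the PR's actual identifiers):

```go
package main

import "fmt"

const (
	// targetConcurrency is the desired number of concurrent requests per
	// pod. Hard-coded for experimentation until the autoscaler learns to
	// tune it automatically.
	targetConcurrency = 10

	// statBufferSize is the capacity of the buffered Stat channel; it
	// absorbs bursts from many pods reporting at once without blocking
	// the websocket handler.
	statBufferSize = 100

	// bufferSize is the websocket read/write buffer size in bytes
	// (copied from the gorilla/websocket chat example; the library
	// supplies defaults if these are omitted).
	bufferSize = 1024
)

func main() {
	fmt.Println(targetConcurrency, statBufferSize, bufferSize)
}
```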
pkg/autoscaler/autoscaler.go
Outdated
if targetConcurrencyParam != "" {
    concurrency, err := strconv.Atoi(targetConcurrencyParam)
    if err != nil {
        panic(err)
Prefer: glog.Fatalf, I think.
Done.
pkg/autoscaler/types/types.go
Outdated
type Stat struct {
    PodName string
    RequestCount int32
    ConcurrentRequests int32
Comments?
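A sketch of what the commented struct could look like (field descriptions are inferred from the surrounding discussion, not taken from the PR):

```go
package main

import "fmt"

// Stat is a single metrics report from one pod to the autoscaler.
type Stat struct {
	// PodName identifies the pod that reported the stat.
	PodName string
	// RequestCount is the number of requests received since the last report.
	RequestCount int32
	// ConcurrentRequests is the number of requests in flight at report time.
	ConcurrentRequests int32
}

func main() {
	s := Stat{PodName: "ela-pod-1", RequestCount: 42, ConcurrentRequests: 3}
	fmt.Printf("%+v\n", s)
}
```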
pkg/autoscaler/lib/lib.go
Outdated
// concurrency reaches 2x the target stable concurrency. Panic mode will
// last at least 60 seconds--longer if the 2x threshold is repeatedly
// breached. During panic mode the number of pods is never decreased in
// order to prevent flapping.
Perhaps we should put together pkg/autoscaler/README.md outlining this in more detail?
This comment block seems like a good candidate for a package declaration comment (https://blog.golang.org/godoc-documenting-go-code). This is often placed in a separate doc.go file.
pkg/autoscaler/lib/lib.go
Outdated
package lib

import (
    "log"
we use glog most places.
Done.
pkg/autoscaler/autoscaler.go
Outdated
"strconv"
"time"

"github.com/google/elafros/pkg/autoscaler/lib"
I think that once you split main into cmd/, this being under pkg/ will convey that this is a library, and we should just collapse this. WDYT?
Done.
namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin # TODO(josephburnett): reduce this role to read-only
How deeply do we understand the capabilities the autoscaler needs right now? Can we just do this TODO?
In the medium term, we want to collect metrics from Prometheus, in which case we can do away with this queue->autoscaler websocket pipeline and associated permissions. In the short term, we should turn the client-server relationship around and have the autoscaler scrape the pods (@evankanderson and @vaikas-google's suggestion), which would also do away with the pod permission requirement. This is just to play around with and I plan to get rid of it. Will update the comment accordingly.
In the medium term, we want to collect metrics from Prometheus
If the autoscaler collects metrics from Prometheus instead of pods directly, its reaction time is coupled to Prometheus' sampling interval. It would be more flexible to have the autoscaler scrape pods directly (using their Prometheus endpoints). Then it can decide its own sampling interval. Here are some examples of situations when the autoscaler might want to vary sampling frequency:
- Watch pod creation events and sample new revisions more frequently
- Sample highly scaled revisions slower on the assumption that they're less likely to need fast reactions and are more expensive to sample
- Increase sampling frequency of revisions that were recently scaled
If the autoscaler does its own scraping, it still needs a ClusterRoleBinding with read permissions so it can enumerate the list of pods to target.
It's possible that Prometheus has an API the autoscaler can use to increase or decrease the sampling frequency of a particular tagged metric. That might be sufficient and we could avoid writing a bunch of scraping code.
its reaction time is coupled to Prometheus' sampling interval.
Agree. Maybe we will stick with scraping the pods if we can't get the Envoy->Mixer->Prometheus pipeline latency low enough.
Yes, the autoscaler will still need a role to find the pods. And to modify the deployment. The queue is also using this role binding and that should go away.
Detailed review coming, but in general I wonder if the queue can be replaced by envoy stats collection. Since we're likely to run envoy in every pod anyway, that eliminates a sidecar from each pod and gives us the benefit of existing and future investment in envoy's stats infrastructure.

Envoy can export metrics in statsd format. If we set it up to push to a Kubernetes Service in front of the autoscaler deployment, then the autoscaler could make scaling decisions based on any Envoy metric, or in fact any statsd metric source (if we later replace Envoy or allow scaling on custom metrics).

Envoy can also export in Prometheus format for pull metrics. Given that we're currently shipping Prometheus as the default metrics provider (see #189), it seems reasonable to architect the autoscaler as a scraper, using the Kubernetes API to discover Elafros pods and collecting metrics from their Envoy instances. I like the pull architecture because it gives the autoscaler control of sampling frequency and load shedding. Like statsd, the Prometheus format is also a standard, so a future Envoy replacement is likely to support it. Custom metrics could be supported by a field in the Configuration specifying a URL path.

If we decide to keep the queue and custom stats format, let's at least use Protobuf for the wire format so we have better versioning semantics. I'd also recommend gRPC over websockets.
@@ -26,6 +28,14 @@ import (
    meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const (
    // Each Elafros pod gets 1 cpu.
I know this isn't necessarily your decision, but it seems odd to hardcode the resource requests (and limits) and disallow users specifying their own. The way it's implemented here, users can't specify memory requests or memory limits, making elafros pods inherently unsafe. The previous limit of 0.025 cpus and the current limit of 0.85 cpus both seem arbitrary.
@@ -50,11 +50,7 @@ http {
    # to avoid a race condition between the two timeouts.
    keepalive_timeout 650;
    keepalive_requests 10000;
{{if .EnableQueue}}
If you do end up removing this knob, also remove the corresponding field at
https://github.com/google/elafros/blob/b5009b04dd329a7e5ef723ca4f8e7de01ea5d525/pkg/controller/revision/ela_nginx_constants.go#L24
LGTM with a few remaining comment/copyright header nits, but I'm curious what @mattmoor thinks.
@@ -0,0 +1,153 @@
package autoscaler
Needs a copyright header
Done. I don't know how I missed that one!
@@ -0,0 +1,255 @@
package autoscaler
Needs copyright header
Done.
pkg/autoscaler/doc.go
Outdated
// concurrency reaches 2x the target stable concurrency. Panic mode will
// last at least 60 seconds--longer if the 2x threshold is repeatedly
// breached. During panic mode the number of pods is never decreased in
// order to prevent flapping.
This comment should go above the package declaration, which should be lowercase. See an example at https://golang.org/src/encoding/gob/doc.go.
/*
Autoscaler calculates ...
[snip]
order to prevent flapping.
*/
package autoscaler
Whoops! Stupid error. Fixed.
var (
-    elaPodReplicaCount = int32(2)
+    elaPodReplicaCount = int32(1)
I presume this changed because we now have autoscaling 😁
Yes. :)
The autoscaler immediately recognized that we don't need 2 pods to serve 0 concurrent requests. So creating two is pointless.
pkg/autoscaler/doc.go
Outdated
@@ -1,18 +1,19 @@
-package Autoscaler
+/*
Oops, I think this one also needs a copyright header
(When we have automated tests, we should run a linter that makes sure every go source file has a copyright header)
Gah! Done.
Adding a simple autoscaler to collect metrics directly from the pods and scale based on concurrent requests. More in //pkg/autoscaler/doc.go.

* Bring back the queue.
* Wire queue between nginx and app.
* Autoscaler and queue that share a stat type.
* Initialize queue with autoscaler service before starting stat reporter.
* Connect stat sink.
* Add gorilla websocket to deps.
* Build the queue with bazel and pass digest into controller through commandline parameter.
* Setup env variables and service account for queue to find the autoscaler.
* Create autoscaler service and deployment and connect queue.
* Reconnect to autoscaler and send pod name.
* Calculate 6 and 60 second QPS and scaling action.
* Replace 6 and 60 with parameters.
* Do actual scaling. Tune parameters.
* Scale deployment in the background.
* Request a full CPU for each ela pod.
* Calculate QPS with floats.
* Scale on concurrent requests instead of QPS.
* Provide desired concurrency per process in revision spec.
* Add test for queue-proxy. Fails because of extra autoscaler deployment.
* Fix unit tests by checking ela deployment separately from autoscaler deployment.
* Add autoscaler deployment env variable test.
* Move core autoscaler logic into lib for unit testing.
* Refactoring autoscaler for unit testing.
* Autoscaler unit tests.
* Autoscaler comments.
* Only accept target concurrency of 1+.
* Limit scale up ratio to 10x.
* Fix git rebase mistakes.
* Move autoscaler main to cmd.
* Move queue sidecar to cmd.
* Remove TargetConcurrencyPerProcess revision parameter.
* Replace log with glog.
* Use defaults for websocket upgrader.
* Add service account and binding for autoscaler.
* Update deps.
* Fix incorrect usage of glog.
* Const parameters.
* Fix targetConcurrency typo.
* Fix typo.
* Inject autoscaler name to remove hardcoded value.
* Pull out queue parameters into constants.
* Add license headers to queue and autoscaler.
* Cpu requests in constants.
* Plumb autoscaler port through env from single constant.
* Comment for autoscaler types.
* Fix log statement formatting.
* Add back ela-revision service account.
* Report time from pod with concurrency stat.
* Send only one scale request at a time with a 5 second timeout.
* Move autoscaler docs to package documentation.
* Include pod name in stat key.
* Comment about waiting for autoscaler IP.
* Add queue->autoscaler connect sleep comment.
* Parse localhost url once.
* Use singleton proxy in queue.
* Fold autoscaler/types package into autoscaler.
* Add Record and Scale function comments.
* Move environment variable access into init.
* Remove enableQueue nginx template parameter.
* Comments and copyright headers.
* Copyright header.
Add a Revision parameter for desired concurrency per process. New Autoscaler Deployment responsible for adjusting Ela Deployment pod count to achieve desired concurrency.
Each Ela Deployment pod connects to the Autoscaler and reports its concurrency level every second. Every two seconds, the Autoscaler looks at the observed concurrency and pod count and adjusts the Deployment pod count accordingly.
Stable mode operates on a 60 second window. Panic mode operates on a 6 second window. Panic mode is engaged when 2x+ desired concurrency is observed. Panic mode disengages after 60 seconds of less than 2x concurrency.
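The scaling rule described above could be sketched as follows (a hypothetical simplification: the 10x scale-up cap and 1-pod floor come from the commit list, and the names are illustrative rather than the PR's):

```go
package main

import "fmt"

// desiredPods computes the target pod count as observed average
// concurrency divided by the per-pod target. In the real design the
// average comes from a 60s window in stable mode and a 6s window in
// panic mode; here the caller supplies it directly.
func desiredPods(avgConcurrency, targetPerPod float64, currentPods int32) int32 {
	desired := int32(avgConcurrency/targetPerPod + 0.5) // round to nearest
	if desired > currentPods*10 {
		desired = currentPods * 10 // limit scale up ratio to 10x
	}
	if desired < 1 {
		desired = 1 // never scale to zero in this sketch
	}
	return desired
}

func main() {
	fmt.Println(desiredPods(55, 10, 2))  // round(5.5) -> 6 pods
	fmt.Println(desiredPods(500, 10, 2)) // capped at 10x current -> 20 pods
	fmt.Println(desiredPods(0, 10, 2))   // floor -> 1 pod
}
```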