Cheap and cheerful autoscaler #229

Merged: 62 commits into knative:master, Feb 28, 2018

Conversation

josephburnett (Contributor):

Add a Revision parameter for desired concurrency per process. New Autoscaler Deployment responsible for adjusting Ela Deployment pod count to achieve desired concurrency.

Each Ela Deployment pod connects to the Autoscaler and reports its concurrency level every second. Every two seconds, the Autoscaler looks at the observed concurrency and pod count and adjusts the Deployment pod count accordingly.

Stable mode operates on a 60 second window. Panic mode operates on a 6 second window. Panic mode is engaged when 2x+ desired concurrency is observed. Panic mode disengages after 60 seconds of less than 2x concurrency.
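
For illustration, a minimal sketch of the decision logic described above (constant names, rounding, and function signatures are illustrative, not the PR's exact code):

package autoscaler

import "time"

const (
	// Desired concurrent requests per pod; the knob this PR exposes.
	targetConcurrency = 10.0
	// Panic mode engages when observed concurrency reaches 2x the target.
	panicThreshold = 2.0
	// Stable decisions average over 60s; panic decisions average over 6s.
	stableWindow = 60 * time.Second
	panicWindow  = 6 * time.Second
)

// shouldPanic reports whether the 6-second average has crossed the 2x threshold.
func shouldPanic(avg6s float64) bool {
	return avg6s >= panicThreshold*targetConcurrency
}

// desiredPods turns the windowed concurrency averages into a pod count.
// avg60s is the 60-second average, avg6s the 6-second average.
func desiredPods(avg60s, avg6s float64, currentPods int32, panicking bool) int32 {
	if panicking {
		// React to the short window and never scale down, to avoid flapping.
		want := int32(avg6s/targetConcurrency + 0.5)
		if want < currentPods {
			return currentPods
		}
		return want
	}
	// Stable mode: size the deployment from the long window.
	return int32(avg60s/targetConcurrency + 0.5)
}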

@josephburnett requested a review from vaikas on February 25, 2018
@mattmoor (Member) left a comment:

Some drive-by comments while I set up my new cluster.

BUILD (outdated)
@@ -13,6 +13,8 @@ k8s_object(
    name = "controller",
    images = {
        "ela-controller:latest": "//cmd/ela-controller:image",
        "ela-queue:latest": "//pkg/sidecars/queue:image",
        "ela-autoscaler:latest": "//pkg/autoscaler:image",
Member:

binaries (and by proxy images) should go under cmd/

josephburnett (Author):

Done.

	Scaling *ScalingSpec `json:"scaling,omitempty"`
}

type ScalingSpec struct {
Member:

I think the API Draft had something different here. I believe we had a ConcurrencyModel enum, and no knobs (yet).

/cc @evankanderson

josephburnett (Author):

We really want Elafros to just figure out its own autoscaling parameters, without input from the user. I think concurrency would be a good internal knob for fast autoscaling. And we can have a slower process evaluating the workload and cpu/memory/io metrics to tune the knob.

Right now I just want a way to set the concurrency knob directly so we can play around with autoscaling. We'll do the more complex model building and tuning later. That's all this parameter was meant to be.

But since this is also an API spec, maybe I had better not put it here. I'll just hard-code target concurrency in the controller, which is trivial to replace when testing autoscaling. Will remove this parameter.

josephburnett (Author):

Ripped this out.


var upgrader = websocket.Upgrader{
	ReadBufferSize:  1024,
	WriteBufferSize: 1024,
Member:

How were these values arrived at? Are they bits, bytes, gigabytes?

josephburnett (Author):

These values were carefully arrived at via the copy-paste method: https://github.com/gorilla/websocket/blob/58729a2165ebb9f1d023226076f660139c2e2e0c/examples/chat/client.go#L35

But it looks like I can just provide defaults. I'll remove the buffer sizes since I don't think it matters in this case.
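
For what it's worth, the buffer sizes are in bytes, and with gorilla/websocket a zero value falls back to the library's defaults, so the simplified version presumably reduces to something like this (handler name is illustrative):

import (
	"net/http"

	"github.com/gorilla/websocket"
)

// Zero buffer sizes tell gorilla/websocket to fall back to its defaults
// (buffers provided by the HTTP server are reused).
var upgrader = websocket.Upgrader{}

func serveStats(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()
	// Read Stat reports from the queue sidecar here.
}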

var statChan = make(chan types.Stat, 100)

func autoscaler() {
	targetConcurrency := float64(10)
Member:

Can you hoist 10 here, 100 above, and the 1024 constants into a const ( ... ) block with detailed comments describing their purpose and how they were arrived at?

josephburnett (Author):

Done.
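
The hoisted constants presumably ended up along these lines (values taken from the snippets in this thread; the comments are illustrative, not the actual code):

const (
	// targetConcurrency is the desired number of in-flight requests per pod.
	// 10 is a starting point for experimentation, not a tuned value.
	targetConcurrency = 10.0

	// statBufferSize bounds the stat channel so a slow scaling loop does not
	// block the websocket readers feeding it.
	statBufferSize = 100

	// Websocket read/write buffer sizes in bytes (later dropped in favor of
	// the gorilla/websocket defaults).
	readBufferSize  = 1024
	writeBufferSize = 1024
)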

if targetConcurrencyParam != "" {
	concurrency, err := strconv.Atoi(targetConcurrencyParam)
	if err != nil {
		panic(err)
Member:

Prefer: glog.Fatalf I think
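
i.e., roughly this (a sketch of the suggested change; assumes the github.com/golang/glog import and the surrounding parsing code quoted above, and the final assignment is illustrative):

if targetConcurrencyParam != "" {
	concurrency, err := strconv.Atoi(targetConcurrencyParam)
	if err != nil {
		// Log the bad flag value and exit, rather than panicking with a stack trace.
		glog.Fatalf("Invalid concurrency target %q: %v", targetConcurrencyParam, err)
	}
	targetConcurrency = float64(concurrency)
}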

josephburnett (Author):

Done.

type Stat struct {
	PodName            string
	RequestCount       int32
	ConcurrentRequests int32
Member:

Comments?
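
Presumably something like the following (field descriptions inferred from how the stats are described elsewhere in this PR, not the merged comments):

// Stat is the per-pod measurement the queue sidecar reports to the
// autoscaler every second.
type Stat struct {
	// PodName identifies the reporting pod so the autoscaler can count
	// distinct pods.
	PodName string
	// RequestCount is the number of requests handled since the last report.
	RequestCount int32
	// ConcurrentRequests is the number of requests in flight when the
	// report was taken.
	ConcurrentRequests int32
}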

// concurrency reaches 2x the target stable concurrency. Panic mode will
// last at least 60 seconds--longer if the 2x threshold is repeatedly
// breached. During panic mode the number of pods is never decreased in
// order to prevent flapping.
Member:

Perhaps we should put together pkg/autoscaler/README.md outlining this in more detail?

Contributor:

This comment block seems like a good candidate for a package declaration comment (https://blog.golang.org/godoc-documenting-go-code). This is often placed in a separate doc.go file.

package lib

import (
"log"
Member:

we use glog most places.

josephburnett (Author):

Done.

"strconv"
"time"

"github.com/google/elafros/pkg/autoscaler/lib"
Member:

I think that once you split main into cmd/, this being under pkg/ will convey that it's a library, and we should just collapse this. WDYT?

josephburnett (Author):

Done.

  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin # TODO(josephburnett): reduce this role to read-only
Member:

How deeply do we understand the capabilities the autoscaler needs right now? Can we just do this TODO?

josephburnett (Author):

In the medium term, we want to collect metrics from Prometheus, in which case we can do away with this queue->autoscaler websocket pipeline and its associated permissions. In the short term, we should turn the client-server relationship around and have the autoscaler scrape the pods (@evankanderson and @vaikas-google's suggestion), which would also do away with the pod permission requirement. This is just to play around with and I plan to get rid of it. Will update the comment accordingly.

Contributor:

In the medium term, we want to collect metrics from Prometheus

If the autoscaler collects metrics from Prometheus instead of pods directly, its reaction time is coupled to Prometheus' sampling interval. It would be more flexible to have the autoscaler scrape pods directly (using their Prometheus endpoints). Then it can decide its own sampling interval. Here are some examples of situations when the autoscaler might want to vary sampling frequency:

  • Watch pod creation events and sample new revisions more frequently
  • Sample highly scaled revisions slower on the assumption that they're less likely to need fast reactions and are more expensive to sample
  • Increase sampling frequency of revisions that were recently scaled

If the autoscaler does its own scraping, it still needs a ClusterRoleBinding with read permissions so it can enumerate the list of pods to target.

It's possible that Prometheus has an API the autoscaler can use to increase or decrease the sampling frequency of a particular tagged metric. That might be sufficient and we could avoid writing a bunch of scraping code.
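
For illustration, a rough sketch of the pull model being discussed, assuming client-go and a Prometheus-style /metrics endpoint on each pod (the port, label selector, and function names here are hypothetical):

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scrapeOnce lists a revision's pods and pulls their metrics endpoints.
// The autoscaler, not Prometheus, decides how often this runs.
func scrapeOnce(ctx context.Context, kc kubernetes.Interface, namespace, selector string) error {
	pods, err := kc.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 2 * time.Second}
	for _, p := range pods.Items {
		if p.Status.PodIP == "" {
			continue // pod not scheduled or not ready yet
		}
		resp, err := client.Get(fmt.Sprintf("http://%s:9090/metrics", p.Status.PodIP))
		if err != nil {
			continue // skip pods that can't be reached this round
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		_ = body // parse concurrency metrics here
	}
	return nil
}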

josephburnett (Author):

its reaction time is coupled to Prometheus' sampling interval.

Agree. Maybe we will stick with scraping the pods if we can't get the Envoy->Mixer->Prometheus pipeline latency low enough.

Yes, the autoscaler will still need a role to find the pods. And to modify the deployment. The queue is also using this role binding and that should go away.

@josephburnett requested a review from grantr on February 26, 2018
@grantr (Contributor) commented on Feb 26, 2018:

Detailed review coming, but in general I wonder if the queue can be replaced by envoy stats collection. Since we're likely to run envoy in every pod anyway, that eliminates a sidecar from each pod and gives us the benefit of existing and future investment in envoy's stats infrastructure.

Envoy can export metrics in statsd format. If we set it up to push to a Kubernetes Service in front of the autoscaler deployment, then the autoscaler could make scaling decisions based on any Envoy metric or in fact any statsd metric source (if we later replace Envoy or allow scaling on custom metrics).

Envoy can also export in Prometheus format for pull metrics. Given that we're currently shipping Prometheus as the default metrics provider (see #189) it seems reasonable to architect the autoscaler as a scraper, using the Kubernetes API to discover Elafros pods and collecting metrics from their Envoy instances. I like the pull architecture because it gives the autoscaler control of sampling frequency and load shedding. Like statsd, the Prometheus format is also a standard so a future Envoy replacement is likely to support it. Custom metrics could be supported by a field in the Configuration specifying a URL path.

If we decide to keep the queue and custom stats format, let's at least use Protobuf for the wire format so we have better versioning semantics. I'd also recommend gRPC over websockets.

@@ -26,6 +28,14 @@ import (
	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const (
	// Each Elafros pod gets 1 cpu.
Contributor:

I know this isn't necessarily your decision, but it seems odd to hardcode the resource requests (and limits) and disallow users from specifying their own. The way it's implemented here, users can't specify memory requests or memory limits, making Elafros pods inherently unsafe. The previous limit of 0.025 cpus and the current limit of 0.85 cpus both seem arbitrary.
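
For context, hardcoded requests in a controller look roughly like this (an illustrative sketch, not the actual elafros code; note that only CPU is set, which is the point above about missing memory requests/limits):

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Every user container gets the same fixed CPU request; the Revision author
// cannot override it, and no memory request or limit is set at all.
var elaContainerResources = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU: resource.MustParse("1"), // "Each Elafros pod gets 1 cpu."
	},
}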

@@ -50,11 +50,7 @@ http {
# to avoid a race condition between the two timeouts.
keepalive_timeout 650;
keepalive_requests 10000;
{{if .EnableQueue}}

@grantr (Contributor) left a comment:

LGTM with a few remaining comment/copyright header nits, but I'm curious what @mattmoor thinks.

@@ -0,0 +1,153 @@
package autoscaler
Contributor:

Needs a copyright header

josephburnett (Author):

Done. I don't know how I missed that one!

@@ -0,0 +1,255 @@
package autoscaler
Contributor:

Needs copyright header

josephburnett (Author):

Done.

// concurrency reaches 2x the target stable concurrency. Panic mode will
// last at least 60 seconds--longer if the 2x threshold is repeatedly
// breached. During panic mode the number of pods is never decreased in
// order to prevent flapping.
Contributor:

This comment should go above the package declaration, which should be lowercase. See an example at https://golang.org/src/encoding/gob/doc.go.

/*
Autoscaler calculates ...
[snip]
order to prevent flapping.
*/
package autoscaler

josephburnett (Author):

Whoops! Stupid error. Fixed.


 var (
-	elaPodReplicaCount = int32(2)
+	elaPodReplicaCount = int32(1)
Contributor:

I presume this changed because we now have autoscaling 😁

josephburnett (Author):

Yes. :)

The autoscaler immediately recognized that we don't need 2 pods to serve 0 concurrent requests. So creating two is pointless.

@@ -1,18 +1,19 @@
package Autoscaler
/*
Contributor:

Oops, I think this one also needs a copyright header

Contributor:

(When we have automated tests, we should run a linter that makes sure every go source file has a copyright header)

josephburnett (Author):

Gah! Done.

@josephburnett merged commit bf78448 into knative:master on Feb 28, 2018
markusthoemmes referenced this pull request in openshift/knative-serving on Apr 3, 2019:
Adding a simple autoscaler to collect metrics directly from the pods and scale based on concurrent requests.  More in //pkg/autoscaler/doc.go.

* Bring back the queue.
* Wire queue between nginx and app.
* Autoscaler and queue that share a stat type.
* Initialize queue with autoscaler service before starting stat reporter.
* Connect stat sink.
* Add gorilla websocket to deps.
* Build the queue with bazel and pass digest into controller through command-line parameter.
* Setup env variables and service account for queue to find the autoscaler.
* Create autoscaler service and deployment and connect queue.
* Reconnect to autoscaler and send pod name.
* Calculate 6 and 60 second QPS and scaling action.
* Replace 6 and 60 with parameters.
* Do actual scaling. Tune parameters.
* Scale deployment in the background.
* Request a full CPU for each ela pod.
* Calculate QPS with floats.
* Scale on concurrent requests instead of QPS.
* Provide desired concurrency per process in revision spec.
* Add test for queue-proxy. Fails because of extra autoscaler deployment.
* Fix unit tests by checking ela deployment separately from autoscaler deployment.
* Add autoscaler deployment env variable test.
* Move core autoscaler logic into lib for unit testing.
* Refactoring autoscaler for unit testing.
* Autoscaler unit tests.
* Autoscaler comments.
* Only accept target concurrency of 1+.
* Limit scale up ratio to 10x.
* Fix git rebase mistakes.
* Move autoscaler main to cmd.
* Move queue sidecar to cmd.
* Remove TargetConcurrencyPerProcess revision parameter.
* Replace log with glog.
* Use defaults for websocket upgrader.
* Add service account and binding for autoscaler.
* Update deps.
* Fix incorrect usage of glog.
* Const parameters.
* Fix targetConcurrency typo.
* Fix typo.
* Inject autoscaler name to remove hardcoded value.
* Pull out queue parameters into constants.
* Add license headers to queue and autoscaler.
* CPU requests in constants.
* Plumb autoscaler port through env from single constant.
* Comment for autoscaler types.
* Fix log statement formatting.
* Add back ela-revision service account.
* Report time from pod with concurrency stat.
* Send only one scale request at a time with a 5 second timeout.
* Move autoscaler docs to package documentation.
* Include pod name in stat key.
* Comment about waiting for autoscaler IP.
* Add queue->autoscaler connect sleep comment.
* Parse localhost URL once.
* Use singleton proxy in queue.
* Fold autoscaler/types package into autoscaler.
* Add Record and Scale function comments.
* Move environment variable access into init.
* Remove enableQueue nginx template parameter.
* Comments and copyright headers.
* Copyright header.
nak3 pushed a commit to nak3/serving that referenced this pull request on Sep 6, 2019:
Test operator TP1-02 via the 0.7.1 CI.