Cheap and cheerful autoscaler #229

Merged: 62 commits into knative:master, Feb 28, 2018

Conversation

josephburnett (Contributor):

Add a Revision parameter for desired concurrency per process. New Autoscaler Deployment responsible for adjusting Ela Deployment pod count to achieve desired concurrency.

Each Ela Deployment pod connects to the Autoscaler and reports its concurrency level every second. Every two seconds, the Autoscaler looks at the observed concurrency and pod count and adjusts the Deployment pod count accordingly.

Stable mode operates on a 60 second window. Panic mode operates on a 6 second window. Panic mode is engaged when 2x+ desired concurrency is observed. Panic mode disengages after 60 seconds of less than 2x concurrency.
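
For illustration, a minimal sketch of the decision logic described above (constant names, rounding, and function signatures are illustrative, not the PR's exact code):

package autoscaler

import "time"

const (
	// Desired concurrent requests per pod; the knob this PR exposes.
	targetConcurrency = 10.0
	// Panic mode engages when observed concurrency reaches 2x the target.
	panicThreshold = 2.0
	// Stable decisions average over 60s; panic decisions average over 6s.
	stableWindow = 60 * time.Second
	panicWindow  = 6 * time.Second
)

// shouldPanic reports whether the 6-second average has crossed the 2x threshold.
func shouldPanic(avg6s float64) bool {
	return avg6s >= panicThreshold*targetConcurrency
}

// desiredPods turns the windowed concurrency averages into a pod count.
// avg60s is the 60-second average, avg6s the 6-second average.
func desiredPods(avg60s, avg6s float64, currentPods int32, panicking bool) int32 {
	if panicking {
		// React to the short window and never scale down, to avoid flapping.
		want := int32(avg6s/targetConcurrency + 0.5)
		if want < currentPods {
			return currentPods
		}
		return want
	}
	// Stable mode: size the deployment from the long window.
	return int32(avg60s/targetConcurrency + 0.5)
}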

@josephburnett requested a review from vaikas on February 25, 2018
@mattmoor (Member) left a comment:

Some drive-by comments while I set up my new cluster.

BUILD (outdated)
@@ -13,6 +13,8 @@ k8s_object(
    name = "controller",
    images = {
        "ela-controller:latest": "//cmd/ela-controller:image",
        "ela-queue:latest": "//pkg/sidecars/queue:image",
        "ela-autoscaler:latest": "//pkg/autoscaler:image",
Member:

binaries (and by proxy images) should go under cmd/

josephburnett (Author):

Done.

	Scaling *ScalingSpec `json:"scaling,omitempty"`
}

type ScalingSpec struct {
Member:

I think the API Draft had something different here. I believe we had a ConcurrencyModel enum, and no knobs (yet).

/cc @evankanderson

josephburnett (Author):

We really want Elafros to just figure out its own autoscaling parameters, without input from the user. I think concurrency would be a good internal knob for fast autoscaling. And we can have a slower process evaluating the workload and cpu/memory/io metrics to tune the knob.

Right now I just want a way to set the concurrency knob directly so we can play around with autoscaling. We'll do the more complex model building and tuning later. That's all this parameter was meant to be.

But since this is also an API spec, maybe I had better not put it here. I'll just hard-code target concurrency in the controller, which is trivial to replace when testing autoscaling. Will remove this parameter.

josephburnett (Author):

Ripped this out.


var upgrader = websocket.Upgrader{
	ReadBufferSize:  1024,
	WriteBufferSize: 1024,
Member:

How were these values arrived at? Are they bits, bytes, gigabytes?

josephburnett (Author):

These values were carefully arrived at via the copy-paste method: https://github.com/gorilla/websocket/blob/58729a2165ebb9f1d023226076f660139c2e2e0c/examples/chat/client.go#L35

But it looks like I can just provide defaults. I'll remove the buffer sizes since I don't think it matters in this case.
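
For what it's worth, the buffer sizes are in bytes, and with gorilla/websocket a zero value falls back to the library's defaults, so the simplified version presumably reduces to something like this (handler name is illustrative):

import (
	"net/http"

	"github.com/gorilla/websocket"
)

// Zero buffer sizes tell gorilla/websocket to fall back to its defaults
// (buffers provided by the HTTP server are reused).
var upgrader = websocket.Upgrader{}

func serveStats(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()
	// Read Stat reports from the queue sidecar here.
}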

var statChan = make(chan types.Stat, 100)

func autoscaler() {
	targetConcurrency := float64(10)
Member:

Can you hoist 10 here, 100 above, and the 1024 constants into a const ( ... ) block with detailed comments describing their purpose and how they were arrived at?

josephburnett (Author):

Done.
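
The hoisted constants presumably ended up along these lines (values taken from the snippets in this thread; the comments are illustrative, not the actual code):

const (
	// targetConcurrency is the desired number of in-flight requests per pod.
	// 10 is a starting point for experimentation, not a tuned value.
	targetConcurrency = 10.0

	// statBufferSize bounds the stat channel so a slow scaling loop does not
	// block the websocket readers feeding it.
	statBufferSize = 100

	// Websocket read/write buffer sizes in bytes (later dropped in favor of
	// the gorilla/websocket defaults).
	readBufferSize  = 1024
	writeBufferSize = 1024
)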

if targetConcurrencyParam != "" {
	concurrency, err := strconv.Atoi(targetConcurrencyParam)
	if err != nil {
		panic(err)
Member:

Prefer: glog.Fatalf I think
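
i.e., roughly this (a sketch of the suggested change; assumes the github.com/golang/glog import and the surrounding parsing code quoted above, and the final assignment is illustrative):

if targetConcurrencyParam != "" {
	concurrency, err := strconv.Atoi(targetConcurrencyParam)
	if err != nil {
		// Log the bad flag value and exit, rather than panicking with a stack trace.
		glog.Fatalf("Invalid concurrency target %q: %v", targetConcurrencyParam, err)
	}
	targetConcurrency = float64(concurrency)
}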

josephburnett (Author):

Done.

type Stat struct {
	PodName            string
	RequestCount       int32
	ConcurrentRequests int32
Member:

Comments?
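
Presumably something like the following (field descriptions inferred from how the stats are described elsewhere in this PR, not the merged comments):

// Stat is the per-pod measurement the queue sidecar reports to the
// autoscaler every second.
type Stat struct {
	// PodName identifies the reporting pod so the autoscaler can count
	// distinct pods.
	PodName string
	// RequestCount is the number of requests handled since the last report.
	RequestCount int32
	// ConcurrentRequests is the number of requests in flight when the
	// report was taken.
	ConcurrentRequests int32
}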

// concurrency reaches 2x the target stable concurrency. Panic mode will
// last at least 60 seconds--longer if the 2x threshold is repeatedly
// breached. During panic mode the number of pods is never decreased in
// order to prevent flapping.
Member:

Perhaps we should put together pkg/autoscaler/README.md outlining this in more detail?

Contributor:

This comment block seems like a good candidate for a package declaration comment (https://blog.golang.org/godoc-documenting-go-code). This is often placed in a separate doc.go file.

package lib

import (
"log"
Member:

we use glog most places.

josephburnett (Author):

Done.

"strconv"
"time"

"github.com/google/elafros/pkg/autoscaler/lib"
Member:

I think that once you split main into cmd/, this being under pkg/ will convey that it's a library, and we should just collapse this. WDYT?

josephburnett (Author):

Done.

  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin # TODO(josephburnett): reduce this role to read-only
Member:

How deeply do we understand the capabilities the autoscaler needs right now? Can we just do this TODO?

josephburnett (Author):

In the medium term, we want to collect metrics from Prometheus, in which case we can do away with this queue->autoscaler websocket pipeline and its associated permissions. In the short term, we should turn the client-server relationship around and have the autoscaler scrape the pods (@evankanderson and @vaikas-google's suggestion), which would also do away with the pod permission requirement. This is just to play around with and I plan to get rid of it. Will update the comment accordingly.

Contributor:

In the medium term, we want to collect metrics from Prometheus

If the autoscaler collects metrics from Prometheus instead of pods directly, its reaction time is coupled to Prometheus' sampling interval. It would be more flexible to have the autoscaler scrape pods directly (using their Prometheus endpoints). Then it can decide its own sampling interval. Here are some examples of situations when the autoscaler might want to vary sampling frequency:

  • Watch pod creation events and sample new revisions more frequently
  • Sample highly scaled revisions slower on the assumption that they're less likely to need fast reactions and are more expensive to sample
  • Increase sampling frequency of revisions that were recently scaled

If the autoscaler does its own scraping, it still needs a ClusterRoleBinding with read permissions so it can enumerate the list of pods to target.

It's possible that Prometheus has an API the autoscaler can use to increase or decrease the sampling frequency of a particular tagged metric. That might be sufficient and we could avoid writing a bunch of scraping code.
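
For illustration, a rough sketch of the pull model being discussed, assuming client-go and a Prometheus-style /metrics endpoint on each pod (the port, label selector, and function names here are hypothetical):

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scrapeOnce lists a revision's pods and pulls their metrics endpoints.
// The autoscaler, not Prometheus, decides how often this runs.
func scrapeOnce(ctx context.Context, kc kubernetes.Interface, namespace, selector string) error {
	pods, err := kc.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 2 * time.Second}
	for _, p := range pods.Items {
		if p.Status.PodIP == "" {
			continue // pod not scheduled or not ready yet
		}
		resp, err := client.Get(fmt.Sprintf("http://%s:9090/metrics", p.Status.PodIP))
		if err != nil {
			continue // skip pods that can't be reached this round
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		_ = body // parse concurrency metrics here
	}
	return nil
}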

josephburnett (Author):

its reaction time is coupled to Prometheus' sampling interval.

Agree. Maybe we will stick with scraping the pods if we can't get the Envoy->Mixer->Prometheus pipeline latency low enough.

Yes, the autoscaler will still need a role to find the pods. And to modify the deployment. The queue is also using this role binding and that should go away.

@josephburnett requested a review from grantr on February 26, 2018
@grantr (Contributor) commented on Feb 26, 2018:

Detailed review coming, but in general I wonder if the queue can be replaced by envoy stats collection. Since we're likely to run envoy in every pod anyway, that eliminates a sidecar from each pod and gives us the benefit of existing and future investment in envoy's stats infrastructure.

Envoy can export metrics in statsd format. If we set it up to push to a Kubernetes Service in front of the autoscaler deployment, then the autoscaler could make scaling decisions based on any Envoy metric or in fact any statsd metric source (if we later replace Envoy or allow scaling on custom metrics).

Envoy can also export in Prometheus format for pull metrics. Given that we're currently shipping Prometheus as the default metrics provider (see #189) it seems reasonable to architect the autoscaler as a scraper, using the Kubernetes API to discover Elafros pods and collecting metrics from their Envoy instances. I like the pull architecture because it gives the autoscaler control of sampling frequency and load shedding. Like statsd, the Prometheus format is also a standard so a future Envoy replacement is likely to support it. Custom metrics could be supported by a field in the Configuration specifying a URL path.

If we decide to keep the queue and custom stats format, let's at least use Protobuf for the wire format so we have better versioning semantics. I'd also recommend gRPC over websockets.

@@ -26,6 +28,14 @@ import (
	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const (
	// Each Elafros pod gets 1 cpu.
Contributor:

I know this isn't necessarily your decision, but it seems odd to hardcode the resource requests (and limits) and disallow users from specifying their own. The way it's implemented here, users can't specify memory requests or memory limits, making Elafros pods inherently unsafe. The previous limit of 0.025 cpus and the current limit of 0.85 cpus both seem arbitrary.
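
For context, hardcoded requests in a controller look roughly like this (an illustrative sketch, not the actual elafros code; note that only CPU is set, which is the point above about missing memory requests/limits):

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Every user container gets the same fixed CPU request; the Revision author
// cannot override it, and no memory request or limit is set at all.
var elaContainerResources = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU: resource.MustParse("1"), // "Each Elafros pod gets 1 cpu."
	},
}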

@@ -50,11 +50,7 @@ http {
# to avoid a race condition between the two timeouts.
keepalive_timeout 650;
keepalive_requests 10000;
{{if .EnableQueue}}

@grantr (Contributor) left a comment:

LGTM with a few remaining comment/copyright header nits, but I'm curious what @mattmoor thinks.

@@ -0,0 +1,153 @@
package autoscaler
Contributor:

Needs a copyright header

josephburnett (Author):

Done. I don't know how I missed that one!

@@ -0,0 +1,255 @@
package autoscaler
Contributor:

Needs copyright header

josephburnett (Author):

Done.

// concurrency reaches 2x the target stable concurrency. Panic mode will
// last at least 60 seconds--longer if the 2x threshold is repeatedly
// breached. During panic mode the number of pods is never decreased in
// order to prevent flapping.
Contributor:

This comment should go above the package declaration, which should be lowercase. See an example at https://golang.org/src/encoding/gob/doc.go.

/*
Autoscaler calculates ...
[snip]
order to prevent flapping.
*/
package autoscaler

josephburnett (Author):

Whoops! Stupid error. Fixed.


 var (
-	elaPodReplicaCount = int32(2)
+	elaPodReplicaCount = int32(1)
Contributor:

I presume this changed because we now have autoscaling 😁

josephburnett (Author):

Yes. :)

The autoscaler immediately recognized that we don't need 2 pods to serve 0 concurrent requests. So creating two is pointless.

@@ -1,18 +1,19 @@
package Autoscaler
/*
Contributor:

Oops, I think this one also needs a copyright header

Contributor:

(When we have automated tests, we should run a linter that makes sure every go source file has a copyright header)

josephburnett (Author):

Gah! Done.

@josephburnett merged commit bf78448 into knative:master on Feb 28, 2018
markusthoemmes referenced this pull request in openshift/knative-serving on Apr 3, 2019:
Adding a simple autoscaler to collect metrics directly from the pods and scale based on concurrent requests.  More in //pkg/autoscaler/doc.go.

* Bring back the queue.
* Wire queue between nginx and app.
* Autoscaler and queue that share a stat type.
* Initialize queue with autoscaler service before starting stat reporter.
* Connect stat sink.
* Add gorilla websocket to deps.
* Build the queue with bazel and pass digest into controller through command-line parameter.
* Setup env variables and service account for queue to find the autoscaler.
* Create autoscaler service and deployment and connect queue.
* Reconnect to autoscaler and send pod name.
* Calculate 6 and 60 second QPS and scaling action.
* Replace 6 and 60 with parameters.
* Do actual scaling. Tune parameters.
* Scale deployment in the background.
* Request a full CPU for each ela pod.
* Calculate QPS with floats.
* Scale on concurrent requests instead of QPS.
* Provide desired concurrency per process in revision spec.
* Add test for queue-proxy. Fails because of extra autoscaler deployment.
* Fix unit tests by checking ela deployment separately from autoscaler deployment.
* Add autoscaler deployment env variable test.
* Move core autoscaler logic into lib for unit testing.
* Refactoring autoscaler for unit testing.
* Autoscaler unit tests.
* Autoscaler comments.
* Only accept target concurrency of 1+.
* Limit scale up ratio to 10x.
* Fix git rebase mistakes.
* Move autoscaler main to cmd.
* Move queue sidecar to cmd.
* Remove TargetConcurrencyPerProcess revision parameter.
* Replace log with glog.
* Use defaults for websocket upgrader.
* Add service account and binding for autoscaler.
* Update deps.
* Fix incorrect usage of glog.
* Const parameters.
* Fix targetConcurrency typo.
* Fix typo.
* Inject autoscaler name to remove hardcoded value.
* Pull out queue parameters into constants.
* Add license headers to queue and autoscaler.
* CPU requests in constants.
* Plumb autoscaler port through env from single constant.
* Comment for autoscaler types.
* Fix log statement formatting.
* Add back ela-revision service account.
* Report time from pod with concurrency stat.
* Send only one scale request at a time with a 5 second timeout.
* Move autoscaler docs to package documentation.
* Include pod name in stat key.
* Comment about waiting for autoscaler IP.
* Add queue->autoscaler connect sleep comment.
* Parse localhost URL once.
* Use singleton proxy in queue.
* Fold autoscaler/types package into autoscaler.
* Add Record and Scale function comments.
* Move environment variable access into init.
* Remove enableQueue nginx template parameter.
* Comments and copyright headers.
* Copyright header.
nak3 pushed a commit to nak3/serving that referenced this pull request on Sep 6, 2019:
Test operator TP1-02 via the 0.7.1 CI.