
Autoscaling Application Servers

Chris Bunch edited this page Aug 26, 2013 · 1 revision

Introduction

AppScale deploys your Google App Engine applications via the standard three-tier web stack. One or more load balancers route traffic to (typically) many more application servers, which then save and retrieve data via one or more database servers. This article details how AppScale automatically adds and removes application servers within a machine, and how it adds new machines to run application servers.

Scaling within a machine

AppScale 1.10.0 vastly simplifies the rules concerning autoscaling for application servers. Now, we scale solely based on the number of enqueued requests at the load balancer. If that figure exceeds 5 requests per second (a value you can change if you like) on a given machine, then the AppController on that machine adds an application server. Conversely, if no requests are enqueued, then the AppController on that machine tries to remove an application server. As a fail-safe, we never remove the last application server for a given application from the machine.
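The rule above can be sketched in a few lines. This is only an illustration, not the AppController's actual code: the names `Action`, `decide`, and `SCALE_UP_THRESHOLD` are hypothetical, and the sketch assumes the load balancer's queue measurement arrives as a single number that is compared against the threshold.

```python
from enum import Enum


class Action(Enum):
    ADD = "add"
    REMOVE = "remove"
    NONE = "none"


# Default taken from the article; it is described as configurable.
SCALE_UP_THRESHOLD = 5


def decide(enqueued, running_servers, threshold=SCALE_UP_THRESHOLD):
    """Return the scaling action for one app on one machine (hypothetical helper)."""
    if enqueued > threshold:
        return Action.ADD
    # Remove only when nothing is waiting, and never drop the last server.
    if enqueued == 0 and running_servers > 1:
        return Action.REMOVE
    return Action.NONE
```

Note that a queue that is non-empty but at or below the threshold leaves the machine unchanged, which keeps the number of application servers from flapping under light load.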

Scaling to new machines

Scaling to new machines can be very difficult, primarily because it's hard to know how many application servers should be running on each machine. This depends on the app itself, what it's doing, and how it's constrained. For example, if the app is database bound, then adding application servers has less of an impact than adding database servers. Previous iterations of AppScale attempted to add application servers until a certain amount of CPU or memory was reached, then scale to new machines, but we ran into many applications that exhibited greatly varying CPU and memory usage, making it hard to come up with a general rule.

So we cut this Gordian knot by simply setting a cap on the maximum number of application servers that run per machine. If we want to add another application server on our machine, but are at our cap, then we instead cast a vote to add another machine running application servers. Implementation-wise, this vote is a node in ZooKeeper. The lead AppController periodically checks these votes, and if at least two machines vote to scale up for a given app, we do so (with the special case where only one vote is needed if there's only one application server node).
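As a rough sketch of the cap-then-vote logic described above: here an in-memory dict stands in for the ZooKeeper vote nodes, and every name (`MAX_SERVERS_PER_MACHINE`, `add_or_vote`, `should_add_machine`) is hypothetical. The cap value of 10 is an assumption for illustration; the article does not state the real one.

```python
MAX_SERVERS_PER_MACHINE = 10  # assumed value, not taken from AppScale
VOTES_NEEDED = 2


def add_or_vote(machine, app, servers_on_machine, votes):
    """Start a local app server if under the cap, else record a scale-up vote.

    `votes` maps app name -> set of machines that voted; in AppScale the
    votes are ZooKeeper nodes, which this dict only imitates.
    """
    if servers_on_machine < MAX_SERVERS_PER_MACHINE:
        return "add_local_server"
    votes.setdefault(app, set()).add(machine)
    return "voted"


def should_add_machine(app, votes, app_server_nodes):
    """Leader-side check: have enough distinct machines voted for this app?"""
    # With a single application server node, one vote is enough.
    needed = 1 if app_server_nodes == 1 else VOTES_NEEDED
    return len(votes.get(app, set())) >= needed
```

Storing voters in a set means a machine that votes repeatedly still counts once, so a single overloaded machine cannot trigger a scale-up on its own unless it is the only application server node.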

Scaling down works in a similar fashion. If a machine wants to remove application servers, but is down to the last application server on that machine, it instead casts a vote in ZooKeeper to scale down. If we get two votes to scale down an app within a certain timeframe, then we terminate the machine.
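The time-windowed tally for scale-down votes might look like the following. This is a sketch under stated assumptions: the 300-second window is made up (the article only says "a certain timeframe"), and `should_remove_machine` is a hypothetical helper working on plain timestamps rather than ZooKeeper nodes.

```python
VOTE_WINDOW_SECONDS = 300  # assumed; the article does not give the timeframe


def should_remove_machine(votes, now, window=VOTE_WINDOW_SECONDS, needed=2):
    """Decide whether enough recent scale-down votes exist for an app.

    `votes` maps machine name -> timestamp (in seconds) of that machine's
    latest scale-down vote. Votes older than `window` are ignored.
    """
    recent = [m for m, cast_at in votes.items() if now - cast_at <= window]
    return len(recent) >= needed
```

Discarding stale votes keeps a brief lull on one machine from combining with a much earlier lull on another to terminate a machine that is still needed.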

Conclusion

So that's a whirlwind explanation of how we implement autoscaling at the application server level in AppScale. There's lots more work to do here, though. For starters, we may not want to terminate machines as soon as two machines vote for it, since in AWS we pay for machines on a per-hour basis. It would be wiser there to leave the machine around until the end of the hour in case we need to autoscale onto it. Also, the cap on the maximum number of application servers per machine could be set based on the instance type we're using, instead of being a constant. In practice, it takes about 5 minutes or so to add a new node (double that if running with Spot Instances in AWS), but the process is completely automated - no user interaction required! We can always benefit from having an extra set of eyes looking over this, so join us in #appscale on freenode.net and let us know what you think!
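To make the per-hour billing idea concrete, here is one way the deferred termination could be computed. This is not something AppScale does as of this writing; `seconds_until_safe_termination` and the 120-second safety margin are hypothetical.

```python
BILLING_PERIOD = 3600  # seconds; per-hour billing as described above


def seconds_until_safe_termination(launched_at, now, margin=120):
    """Seconds to keep a machine alive so most of the paid hour is used.

    `launched_at` and `now` are timestamps in seconds. `margin` leaves time
    to shut down before the next billing period starts.
    """
    elapsed = (now - launched_at) % BILLING_PERIOD
    remaining = BILLING_PERIOD - elapsed
    return max(0, remaining - margin)
```

For example, a machine launched 10 minutes ago would be kept for another 2880 seconds, during which it remains available if the app needs to scale back up.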
