Skip to content

Latest commit

 

History

History
118 lines (75 loc) · 6.83 KB

README.md

File metadata and controls

118 lines (75 loc) · 6.83 KB

go-hammer

Build Docker Image CI

A load generator written in Go.

Usage

Docker

Using the docker image:

docker pull fdingler/go-hammer
docker run -it go-hammer --endpoint https://google.com --tps 10

Command Line Tool

Using the CLI directly.

cli --endpoint https://www.google.com --duration 60 --tps 1

To install the CLI, run go install within the cli folder.

What is a Hammer?

The name go-hammer comes from the analogy of comparing load testing a service with the activity of hitting a nail hard with a hammer many times. The nail would be your endpoint and the hammer is the tool used to hit it.

This concept provides extensibility to the library by decoupling the load generation from the logic that creates each request (the hammer). In other words, you can swap the hammer that you use to hit your endpoint with any hammer that conforms to the Hammer interface.

type Hammer interface {
	Hit() HammerResponse
}

type HammerResponse struct {
	Latency   int // milliseconds
	Status    int // status code
	Timestamp time.Time
	Failed    bool
	Body      []byte
}

The hammers package contains built-in hammers that you can leverage to get started quickly, like the HTTPHammer. But you can write your own custom hammers that have any logic you want to run on every request.

Internals

I started this project with the goal of learning Go during the COVID-19 quarantine. I decided to write a load generator because Load Testing is a space that I find really interesting and spend a lot of time thinking about. We also recently built a Distributed Load Testing Engine on AWS, so this topic was fresh in my mind.

I will use this readme to document implementation details, design decisions, lessons learned and ideas I want to work on to improve the project.

TPS Generator

The loadgen goroutine creates as many goroutines as TPS specified every second. These goroutines are the hammers that generate a request targeting the endpoint under test. They capture the response of the request, measure the roundtrip latency and communicate the results back to a shared responses channel–– they die right after that. There is an aggregator goroutine reading the responses from the shared channel, writing them to standard output and keeping them in-memory to compute a final summary with latency statistics. The main goroutine serves as an orchestrator and tells the aggregator to stop and summarize results when it receives a signal of completion from loadgen.

Goroutines

Decoupling the hammers implementation from the TPS generation and orchestration gives a lot of opportunity for extensibility. There can be hammers of different types, for example, a hammer of type HTTPHammer knows how to trigger HTTP requests, but a hammer of type IoTHammer knows how to generate MQTT requests. Moreover, the Hammer interface allows any user to write their own hammer implementation as long as it complies with the interface.

Will it be a problem to have so many goroutines running at once?
Goroutines are very lightweight and efficient as they do not map directly to OS threads. In the tests that I have done, only a handful of threads are used for ~100 goroutines in-flight, so having thousands or goroutines should not be a problem at all.

Can the aggregator become a bottleneck?
Possibly above 10K TPS. When the hammers write to the responses channel they block until the aggregator reads; This is the nature of how channels work because they are also a synchronization mechanism. In my benchmark, the aggregator takes on average 0.05ms to process a response from the channel, which means that it can flush 10,000 messages in 500ms. So, as long as the aggregator can drain the entire channel in less than 1 second, then it won't become a bottleneck.

Alternatives considered

Long living goroutines
Instead of creating short-lived single goroutines (hammers) per request, I considered creating as many goroutines as TPS specified and have each goroutine be long-living for the duration of the load test (i.e. 30 minutes). Each of these workers would be in charge of generating a request every second. The challenge is that if a given request takes longer than 1 second to complete, the system will not meet the desired TPS. This is not a good approach.

Run tests

Functional testing is done by spinning up a local web server and then pointing the load generator against it. The local web server verifies that the expected number of requests generated are actually received.

go test -v

Profiling

go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
go tool pprof cpu.prof

Limits

The max TPS is usually constrained by the host OS max open files limit. The package net/http opens a socket for every HTTP request in-flight. To find this limit in Unix use the following command:

ulimit -n

Increase this limit by passing the desired number:

ulimit -n 1000

Ideas to work on

Distribute requests evenly within the timeframe of a second
The current implementation triggers all requests in the beginning of a second, obviously each consecutive request is triggered slightly (a few microseconds) after the previous one, but this means that all the remaining time before the second is over is empty and quiet with no requests. The idea would be to distribute all of these requests evenly within the timeframe of a second to mimic real-world traffic patterns more realistically and therefore avoid spikes of pressure on the system under test.

Replay production load
Given a set of access logs captured from real production traffic, add a capability to this library to create a test scenario that replays and reproduces the exact same traffic taken from the logs. There needs to be a process that reads the logs first and builds a timeline to be executed during the load generation. A challenge here is that access logs do not contain body payloads, only headers.

The following diagram illustrates the architecture for replaying a timeline of logs. Each goroutine involved is represented as a circle. The timeline goroutine holds an in-memory representation of the timeline of how requests happened, it should expose an API to get the next batch of requests for the following second, it could return zero if there are no requests to be made in that given second.

ReplayLogs

Building the timeline itself from a set of logs is not represented in the diagram. That's probably the job of another tool that given a set of logs in a particular format, it builds a timeline that go-hammer can understand.