[salvo] Initial commit for benchmark abstraction framework #73

Merged: 35 commits, Nov 9, 2020
Changes from 1 commit
Commits (35):
dadcf0c [salvo] Initial commit for benchmark abstraction framework (abaptiste, Oct 12, 2020)
6bc1487 [salvo] Fix formatting and remove vim format metadata (abaptiste, Oct 13, 2020)
f5d5f8d [salvo] Add JobControl message comment (abaptiste, Oct 13, 2020)
75e98e3 [salvo] Update do_ci.sh to build and test salvo (abaptiste, Oct 13, 2020)
d0c7a5b [salvo] Update README.md (abaptiste, Oct 13, 2020)
d5558a8 [salvo] Fix formatting and remove vim format metadata (abaptiste, Oct 13, 2020)
8dfffa1 [salvo] Skip running tests in CI (abaptiste, Oct 13, 2020)
e42a89c [salvo] Run tests and collect log output on script exit (abaptiste, Oct 13, 2020)
8a62d02 Revert "[salvo] Run tests and collect log output on script exit" (abaptiste, Oct 13, 2020)
3d5b455 [salvo] fix docstring comments (abaptiste, Oct 14, 2020)
2820352 [salvo] Support referencing images using a tag (abaptiste, Oct 15, 2020)
8ea0e61 [salvo] install dependecies and use a newer container for CI (abaptiste, Oct 19, 2020)
00e4202 [salvo] Trim PR (abaptiste, Oct 19, 2020)
5a4dce3 [salvo] Trim PR (abaptiste, Oct 19, 2020)
3f34d06 Kick CI (abaptiste, Oct 19, 2020)
822c0b1 [salvo] Specify a tag for the build container image (abaptiste, Oct 19, 2020)
05bea79 [salvo] fix the CI config formatting (abaptiste, Oct 19, 2020)
ba882b6 [salvo] Skip docker tests if the socket is missing (abaptiste, Oct 19, 2020)
5e6da81 Address PR Feedback (abaptiste, Nov 3, 2020)
1788e20 More PR Changes (abaptiste, Nov 3, 2020)
5129766 Cleanup and fomatting fixes (abaptiste, Nov 4, 2020)
35a26a0 More formatting fixes (abaptiste, Nov 4, 2020)
22baac5 Fix build failure (abaptiste, Nov 4, 2020)
81cd018 Address PR Feedback (abaptiste, Nov 4, 2020)
79b2cbe Move api to top level directory (abaptiste, Nov 4, 2020)
4291ade Remove patch field from source.proto (abaptiste, Nov 4, 2020)
c5f8f2c Make the lib target private (abaptiste, Nov 4, 2020)
2392ae3 Remove glob in api/BUILD (abaptiste, Nov 4, 2020)
5c26812 BUILD file swizzling (abaptiste, Nov 4, 2020)
73d0dc0 Use ellipses for tests (abaptiste, Nov 5, 2020)
30d2d5d Address PR Feedback (abaptiste, Nov 6, 2020)
4f072a9 Address PR Feedback (abaptiste, Nov 6, 2020)
84bd1af Address PR Feedback (abaptiste, Nov 6, 2020)
975cf35 Address PR Feedback (abaptiste, Nov 6, 2020)
1f207ab Update protos to use enums (abaptiste, Nov 6, 2020)
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
**/*.pyc
**/*.swp
**/.vscode/*
**/__pycache__/*
bazel-*
2 changes: 2 additions & 0 deletions README.md
@@ -21,3 +21,5 @@ Performance benchmarking can take multiple forms:
2. [siege/](siege/README.md) contains an initial attempt at a simple test to run
iteratively during development to get a view of the time/space impact of the
changes under configuration.
3. [salvo/](salvo/README.md) contains a framework that abstracts nighthawk
benchmark execution. This is still under active development.
27 changes: 27 additions & 0 deletions salvo/BUILD
@@ -0,0 +1,27 @@
licenses(["notice"])

py_binary(
    name = "salvo",
    srcs = ["salvo.py"],
    srcs_version = "PY3",
    deps = [
        ":api",
        ":lib",
    ],
)

py_library(
    name = "api",
    visibility = ["//visibility:public"],
    deps = [
        "//src/lib/api:schema_proto",
    ],
)

py_library(
    name = "lib",
    visibility = ["//visibility:public"],
    deps = [
        "//src/lib:helper_library",
    ],
)
73 changes: 73 additions & 0 deletions salvo/README.md
@@ -0,0 +1,73 @@
# salvo

This is a framework that abstracts executing multiple benchmarks of the Envoy Proxy using [NightHawk](https://github.com/envoyproxy/nighthawk).

## Example Control Documents

The control document defines the data needed to execute a benchmark. At the moment, the dockerized scavenging benchmark is the only one supported. To run the benchmark, create a file with the following example contents:

JSON Example:

```json
{
  "remote": false,
  "dockerizedBenchmark": true,
  "images": {
    "reuseNhImages": true,
    "nighthawkBenchmarkImage": "envoyproxy/nighthawk-benchmark-dev:latest",
    "nighthawkBinaryImage": "envoyproxy/nighthawk-dev:latest",
    "envoyImage": "envoyproxy/envoy-dev:f61b096f6a2dd3a9c74b9a9369a6ea398dbe1f0f"
  },
  "environment": {
    "v4only": true,
    "envoyPath": "envoy",
    "outputDir": "/home/ubuntu/nighthawk_output",
    "testDir": "/home/ubuntu/nighthawk_tests"
  }
}
```

Review comment (Contributor, on the `"reuseNhImages": true` line):
Note: we may need to update these examples after we perform some of the suggested proto field renames.

Reply (Contributor Author):
Right. Certainly doable.

YAML Example:

```yaml
remote: false
dockerizedBenchmark: true
environment:
  envoyPath: 'envoy'
  outputDir: '/home/ubuntu/nighthawk_output'
  testDir: '/home/ubuntu/nighthawk_tests'
  v4only: true
images:
  reuseNhImages: true
  nighthawkBenchmarkImage: 'envoyproxy/nighthawk-benchmark-dev:latest'
  nighthawkBinaryImage: 'envoyproxy/nighthawk-dev:latest'
  envoyImage: "envoyproxy/envoy-dev:f61b096f6a2dd3a9c74b9a9369a6ea398dbe1f0f"
```
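
Editor's note: the keys in the JSON and YAML examples appear to follow the proto3 JSON mapping, under which a snake_case proto field name is spelled in lowerCamelCase (for example, `reuse_nh_images` in image.proto shows up as `reuseNhImages`). The sketch below illustrates only that naming rule. `to_lower_camel` is a hypothetical helper written for this note; it is not part of salvo or of the protobuf library.

```python
def to_lower_camel(field_name: str) -> str:
    """Convert a snake_case proto field name to a lowerCamelCase JSON key."""
    head, *rest = field_name.split("_")
    # Capitalize the first character of every segment after the first one.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


# Field names taken from the proto files in this PR.
for name in ("reuse_nh_images", "dockerized_benchmark", "envoy_path", "v4only"):
    print(f"{name} -> {to_lower_camel(name)}")
```

If the proto fields are renamed as the review suggests, the keys in both control documents would presumably change in the same way.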

In both examples, the Envoy image under test is pinned to a specific commit hash. Replacing the hash with "latest" tests the most recently created image against the image built from the prior commit on Envoy's master branch.
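
Editor's note: swapping the pinned hash for "latest" is a change to the tag portion of the image reference. The sketch below shows one way to do that substitution on a plain `name:tag` reference. `with_tag` is a hypothetical helper, not a salvo function, and it deliberately ignores digest references of the form `name@sha256:...`.

```python
def with_tag(image: str, tag: str) -> str:
    """Return the image reference with its tag replaced by `tag`."""
    last_slash = image.rfind("/")
    last_colon = image.rfind(":")
    # A colon that appears before the last slash belongs to a registry
    # host:port prefix, so the reference has no tag to strip in that case.
    if last_colon > last_slash:
        image = image[:last_colon]
    return f"{image}:{tag}"


pinned = "envoyproxy/envoy-dev:f61b096f6a2dd3a9c74b9a9369a6ea398dbe1f0f"
print(with_tag(pinned, "latest"))
```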


## Building Salvo

```bash
bazel build //:salvo
```

## Running Salvo

```bash
bazel-bin/salvo --job ~/test_data/demo_jobcontrol.yaml
```

## Testing Salvo

```bash
bazel test //test:*
```

## Dependencies

* python 3.6+
* git
* docker
* tuned/tunedadm (eventually)
16 changes: 16 additions & 0 deletions salvo/WORKSPACE
@@ -0,0 +1,16 @@
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "com_google_protobuf",
    remote = "https://github.com/protocolbuffers/protobuf",
    tag = "v3.10.0",
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()

local_repository(
    name = "salvo_build_config",
    path = ".",
)
69 changes: 69 additions & 0 deletions salvo/salvo.py
@@ -0,0 +1,69 @@
#!/usr/bin/env python3

import argparse
import logging
import os
import site
import sys

# Run in the actual bazel directory so that the sys.path
# is setup correctly
if os.path.islink(sys.argv[0]):
    real_exec_dir = os.path.dirname(sys.argv[0])
    os.chdir(real_exec_dir)

site.addsitedir("src")

from lib.message_helper import load_control_doc
from lib.run_benchmark import Benchmark

LOGFORMAT = "%(asctime)s: %(process)d [ %(levelname)-5s] [%(module)-5s] %(message)s"

log = logging.getLogger()

def setup_logging(loglevel=logging.DEBUG):
    """Basic logging configuration """

    logging.basicConfig(format=LOGFORMAT, level=loglevel)

def setup_options():
    """Parse command line arguments required for operation"""

    parser = argparse.ArgumentParser(description="Salvo Benchmark Runner")
    parser.add_argument('--job', dest='jobcontrol',
                        help='specify the location for the job control json document')
    # FIXME: Add an option to generate a default job Control JSON/YAML

    return parser.parse_args()

def main():
    """Driver module for benchmark """

    args = setup_options()
    setup_logging()

    if not args.jobcontrol:
        print("No job control document specified. Use \"--help\" for usage")
        return 1

    job_control = load_control_doc(args.jobcontrol)

    log.debug("Job definition:\n%s\n%s\n%s\n", '='*20, job_control, '='*20)

    benchmark = Benchmark(job_control)
    try:
        benchmark.validate()
    # TODO: Create a different class for these exceptions
    except Exception as validation_exception:
        log.error("Unable to validate data needed for benchmark run: %s", validation_exception)
        return 1

    benchmark.execute()

    return 0


if __name__ == '__main__':
    sys.exit(main())

# vim: set ts=4 sw=4 tw=0 et :
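
Editor's note: the TODO in `main()` asks for a dedicated exception class so that validation failures are not caught through a bare `Exception`. A minimal sketch of that idea follows. `BenchmarkValidationError`, `FakeBenchmark`, and `run` are hypothetical names introduced for illustration, and the validation rule (an `images` entry must be present) is an assumption rather than salvo's actual check.

```python
import logging

log = logging.getLogger(__name__)


class BenchmarkValidationError(Exception):
    """Hypothetical error raised when a job control document is unusable."""


class FakeBenchmark:
    """Stand-in for lib.run_benchmark.Benchmark, used only in this sketch."""

    def __init__(self, job_control: dict):
        self._job_control = job_control

    def validate(self) -> None:
        # Assumed rule for illustration: at least one image must be named.
        if not self._job_control.get("images"):
            raise BenchmarkValidationError("no docker images specified")


def run(job_control: dict) -> int:
    """Return a process exit code: 1 on validation failure, 0 otherwise."""
    benchmark = FakeBenchmark(job_control)
    try:
        benchmark.validate()
    except BenchmarkValidationError as validation_error:
        log.error("Unable to validate data needed for benchmark run: %s",
                  validation_error)
        return 1
    return 0
```

Catching only the dedicated class lets unrelated failures, such as a programming error inside `validate()`, surface with a traceback instead of being reported as bad input.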
12 changes: 12 additions & 0 deletions salvo/src/lib/BUILD
@@ -0,0 +1,12 @@
py_library(
    name = "helper_library",
    data = glob([
        '*.py',
        'benchmark/*.py',
        'common/*.py',
    ], allow_empty=False) +
    [
        "//:api",
    ],
    visibility = ["//visibility:public"],
)
7 changes: 7 additions & 0 deletions salvo/src/lib/api/BUILD
@@ -0,0 +1,7 @@
load("@com_google_protobuf//:protobuf.bzl", "py_proto_library")

py_proto_library(
    name = "schema_proto",
    srcs = glob(['*.proto'], allow_empty=False),
    visibility = ["//visibility:public"],
)
30 changes: 30 additions & 0 deletions salvo/src/lib/api/control.proto
@@ -0,0 +1,30 @@
syntax = "proto3";

Review comment (Contributor):
Can we add a top level file comment to each of these proto files? A single sentence summarizing what the file should contain would be enough and would help future contributors.

We can skip this if the file only contains a single message, since the message comment achieves the same in that case.

package salvo;

import "src/lib/api/image.proto";
import "src/lib/api/source.proto";
import "src/lib/api/env.proto";


message JobControl {

Review comment (Contributor):
(optional) For consistency, should we rename this file to a matching name (job_control.proto instead of just control.proto)? This can help code readers find the message.

  // Specify whether the benchmark runs locally or in a service
  bool remote = 1;

Review comment (Contributor):
We can think forward and try to improve readability of the code that will be using these messages. How about we rename this to is_remote or even a bit more verbose is_remote_execution?

  // Specify the benchmark to execute
  oneof benchmark {
    bool scavenging_benchmark = 2;

Review comment (Contributor):
Can we add comments explaining these types of benchmarks and the implications of setting them? As it stands I don't really understand what we mean by them.

(optional) Since we are only planning to support a smaller subset of execution modes to start with, we can also consider dropping the unsupported one from this initial commit.

    bool dockerized_benchmark = 3;
    bool binary_benchmark = 4;
  }

  // Define where we find all required sources
  repeated SourceRepository source = 6;

Review comment (Contributor):
Since fields in the code are accessed through the field name, we can make the resulting code more readable if we rename this to source_repository. Generally we have no reason for brevity in the API; more readable names make the code self-documenting.

  // Define the names of all required docker images
  DockerImages images = 7;

Review comment (Contributor):
Same comment here, can we rename the field to docker_images?

  // Define the environment variables needed for the test
  EnvironmentVars environment = 8;

Review comment (Contributor):
And here to environment_vars.

}

23 changes: 23 additions & 0 deletions salvo/src/lib/api/docker_volume.proto
@@ -0,0 +1,23 @@
syntax = "proto3";

Review comment (Contributor):
(fyi) This file is a good case for us not having a single BUILD target for all these proto files. None of the other proto files seem to be importing this one, so having a separate compilation unit for it makes sense from the code perspective.

package salvo;

// This message defines the properties for a given mount. It is used
// to generate the dictionary specifying volumes for the Python Docker
// SDK
message VolumeProperties {
  // Defines a list of properties governing the mount in the container
  string bind = 1;

Review comment (Contributor):
This is probably a lack of knowledge on my end, but the names "bind" and "mode" don't help me much to understand the fields. Can we include examples of what they may contain? Once I see that I may be able to suggest better names.

  // Define whether the mount point is read-write or read-only
  string mode = 2;
}

// This message defines the volume structure consumed by the command
// to run a docker image.
message Volume {
  // Specify a map of volumes and their mount points for use in a container
  map<string, VolumeProperties> volumes = 1;

Review comment (Contributor):
Do we have a good reason to use a map? Or could we just go with a repeated field of a message that includes both the string (I am guessing it is a name) and the volume properties?

Proto maps are a cool feature that helps to optimize code. That is, we would use it if there are hundreds of volumes and we need to quickly look one up by name. If we are dealing with a much smaller set, we should not try to optimize prematurely since it comes with a cost. The serialization order of proto maps is undefined, so this will make our life harder in any sort of unit tests and integration tests we will end up writing.

Review comment (Contributor):
The field name suggests that the volume contains a set of volumes? Is that the architecture?

Or is it that the volume contains a repeated set of volume properties? In which case we may want to call the field volume_properties.

}
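
Editor's note: on the question of what `bind` and `mode` may contain, the Docker SDK for Python accepts a `volumes` dictionary keyed by host path (or volume name), where each value holds `bind` (the path inside the container) and `mode` (`rw` or `ro`). The sketch below builds a dictionary of that shape. `Mount` and `to_volumes_dict` are hypothetical names, the host paths are taken from the README example, and the container-side paths are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Mount(NamedTuple):
    """Hypothetical flat record mirroring the map key plus VolumeProperties."""
    host_path: str
    bind: str          # path inside the container
    mode: str = "rw"   # "rw" for read-write, "ro" for read-only


def to_volumes_dict(mounts: List[Mount]) -> Dict[str, Dict[str, str]]:
    """Build the volumes dictionary shape used by the Python Docker SDK."""
    volumes: Dict[str, Dict[str, str]] = {}
    for mount in mounts:
        if mount.mode not in ("rw", "ro"):
            raise ValueError(f"unsupported mode: {mount.mode!r}")
        volumes[mount.host_path] = {"bind": mount.bind, "mode": mount.mode}
    return volumes


# Container-side paths below are assumptions, not values from the PR.
volumes = to_volumes_dict([
    Mount("/home/ubuntu/nighthawk_output", "/mnt/nighthawk_output", "rw"),
    Mount("/home/ubuntu/nighthawk_tests", "/mnt/nighthawk_tests", "ro"),
])
print(volumes)
```

A dictionary like this could then be passed as the `volumes` argument of `containers.run(...)` on a Docker SDK client. Keeping the input as an ordered list also sidesteps the map ordering concern raised in the review.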


26 changes: 26 additions & 0 deletions salvo/src/lib/api/env.proto
@@ -0,0 +1,26 @@
syntax = "proto3";

package salvo;

// Capture all Environment variables required for the benchmark
message EnvironmentVars {
  // Specify the IP version for tests
  oneof envoy_ip_test_versions {
    bool v4only = 1;

Review comment (Contributor):
For code readability consider longer names, since only the field name will appear in the code:

Example:
ipv4_only, ipv6_only, dual_stack

Or similar. Currently, seeing the name `all` in code won't give much context.

    bool v6only = 2;
    bool all = 3;
  }

  // Controls whether envoy is placed between the nighthawk client and server

Review comment (Contributor):
Do we plan to have an execution mode where Envoy won't be started? Can you help me understand our plan around that?

(optional) Secondly, it feels we are abusing the field by giving its empty value a special meaning. It may be more explicit and readable if we have a separate explicit field, say a boolean called include_envoy.

  string envoy_path = 4;

  // Specify the output directory for nighthawk artifacts
  string output_dir = 5;

  // Specify the directory where external tests are located
  string test_dir = 6;

  // Additional environment variables that may be needed for operation
  map<string, string> variables = 7;

Review comment (Contributor):
Same comment about proto maps here, unless we are optimizing, let's prefer repeated fields.

}
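
Editor's note: to illustrate the naming suggestion for the IP version selection, the sketch below maps the three boolean keys from a control document's `environment` section onto a single enum. `IpVersion` and `ip_version_from_environment` are hypothetical names. Treating `all` as dual stack follows the reviewer's suggested names, and requiring exactly one key to be set is an assumption of this sketch.

```python
import enum


class IpVersion(enum.Enum):
    """Hypothetical enum; each value is the control document key it replaces."""
    IPV4_ONLY = "v4only"
    IPV6_ONLY = "v6only"
    DUAL_STACK = "all"


def ip_version_from_environment(environment: dict) -> IpVersion:
    """Pick the single IP version selected in an `environment` mapping."""
    selected = [version for version in IpVersion if environment.get(version.value)]
    if len(selected) != 1:
        raise ValueError("exactly one of v4only, v6only, all must be set")
    return selected[0]


print(ip_version_from_environment({"v4only": True, "envoyPath": "envoy"}))
```

Code that branches on `IpVersion.DUAL_STACK` reads more clearly than code that checks a field named `all`, which is the readability point made in the review.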

26 changes: 26 additions & 0 deletions salvo/src/lib/api/image.proto
@@ -0,0 +1,26 @@
syntax = "proto3";

package salvo;

// Capture all docker images required for the benchmark. This object is examined

Review comment (Contributor):
Possible typo: Captures ...

// first before evaluating the state of specified source locations
message DockerImages {
  // Determines whether required docker images are used if they already exist. The
  // benchmark image and binary image must be specified. If this is set to false
  // there must be a specified source location from which we build the image

Review comment (Contributor):
What do we mean by "specified source location"? How does one specify it?

  // This should be implicit. Images are preferred over building from source. We

Review comment (Contributor):
I don't really understand the comment as it stands. "This should be implicit." sounds like a wish and I am not sure in what context we are making the wish and why it isn't under our control.

Maybe we can improve the comment by instead focusing on describing the meaning of the proto field?

  // build only if the image pull is not successful and sources are present
  bool reuse_nh_images = 1;

Review comment (Contributor):
We seem to spell out nighthawk in most names except here. Can we change this name to reuse_nighthawk_images instead? It is a bit longer, but longer, more explicit names help onboarding and improve code readability.

  // Specifies the name of the docker image containing the benchmark framework
  // If populated we will attempt to pull this image
  string nighthawk_benchmark_image = 2;

  // Specifies the name of the docker image containing nighthawk binaries
  string nighthawk_binary_image = 3;

  // Specifies the envoy image from which Envoy is injected
  string envoy_image = 4;
}
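
Editor's note: the field comments on `reuse_nh_images` describe a preference order: use an image when it can be pulled, and build from source only if the pull fails and sources are present. The sketch below states that rule as a small decision function. `ImageAction` and `choose_image_action` are hypothetical names, and the `FAIL` outcome for the case with neither an image nor sources is an assumption, since the PR does not say what happens there.

```python
import enum


class ImageAction(enum.Enum):
    USE_PULLED_IMAGE = "use_pulled_image"
    BUILD_FROM_SOURCE = "build_from_source"
    FAIL = "fail"  # assumed outcome when nothing usable is available


def choose_image_action(pull_succeeded: bool, sources_present: bool) -> ImageAction:
    """Prefer a pulled image; fall back to a source build only if one is possible."""
    if pull_succeeded:
        return ImageAction.USE_PULLED_IMAGE
    if sources_present:
        return ImageAction.BUILD_FROM_SOURCE
    return ImageAction.FAIL
```

Writing the rule out this way may also help with rewording the field comment so that it describes behavior instead of intent.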

35 changes: 35 additions & 0 deletions salvo/src/lib/api/source.proto
@@ -0,0 +1,35 @@
syntax = "proto3";

package salvo;

// Capture the location of sources needed for the benchmark
message SourceRepository {
  // Specify whether this source location is Envoy or NightHawk
  oneof identity {
    bool envoy = 1;
    bool nighthawk = 2;
  }

  // Specify the location of the source repository on disk. If specified
  // this location is used to determine the origin url, branch, and commit
  // hash. If not specified, the remaining fields must be populated
  string location = 3;

  // Specify the remote location of the repository. This is ignored if
  // the source location is specified.
  string url = 4;

  // Specify the local working branch. This is ignored if the source
  // location is specified.
  string branch = 5;

  // Specify a commit hash if applicable. If not identified we will
  // determine this from the source tree. We will also use this field
  // to identify the corresponding NightHawk or Envoy image used for
  // the benchmark
  string hash = 6;

  // Internal use only. This field is used to specify the location of
  // the patch generated from changes in the local environment
  string patch = 7;
}
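
Editor's note: the `location` comment says an on-disk checkout is used to determine the origin url, branch, and commit hash. One way to derive those three values is with standard git commands, sketched below. `git_query_commands` and `describe_source` are hypothetical helpers; how salvo actually queries the repository is not shown in this diff.

```python
import subprocess
from typing import Dict, List


def git_query_commands(location: str) -> Dict[str, List[str]]:
    """Return the git command used to look up each SourceRepository field."""
    base = ["git", "-C", location]
    return {
        "url": base + ["remote", "get-url", "origin"],
        "branch": base + ["rev-parse", "--abbrev-ref", "HEAD"],
        "hash": base + ["rev-parse", "HEAD"],
    }


def describe_source(location: str) -> Dict[str, str]:
    """Run each query against a local checkout and collect the results."""
    return {
        field: subprocess.check_output(command, universal_newlines=True).strip()
        for field, command in git_query_commands(location).items()
    }
```

Calling `describe_source` on a path that is not a git checkout raises `subprocess.CalledProcessError`, which a caller could translate into a validation failure.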