Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge from upstream #6

Merged
merged 20 commits into from
Jun 28, 2020
Merged

merge from upstream #6

merged 20 commits into from
Jun 28, 2020

Conversation

eric846
Copy link
Owner

@eric846 eric846 commented Jun 28, 2020

No description provided.

eric846 and others added 20 commits June 3, 2020 12:49
When buildifier or buildozer can't be found, these instructions are printed:

```
ERROR: Command /usr/local/bin/buildifier not found. If you have buildifier installed, but the binary name is different or it's not available in $GOPATH/bin, please use BUILDIFIER_BIN environment variable to specify the path. Example:
    export BUILDIFIER_BIN=`which buildifier`
If you don't have buildifier installed, you can install it by:
    go get -u github.com/bazelbuild/buildtools/buildifier
ERROR: Command /usr/local/bin/buildozer not found. If you have buildozer installed, but the binary name is different or it's not available in $GOPATH/bin, please use BUILDOZER_BIN environment variable to specify the path. Example:
    export BUILDOZER_BIN=`which buildozer`
If you don't have buildozer installed, you can install it by:
    go get -u github.com/bazelbuild/buildtools/buildozer
```

After doing the `go get` and  `export`, we still get the same message.

It turns out do_ci.sh unconditionally overwrites BUILDIFIER_BIN and BUILDOZER_BIN.

We have been using this workaround:

```
ln -s $HOME/go/bin/buildifier /usr/local/bin/buildifier 
ln -s $HOME/go/bin/buildozer /usr/local/bin/buildozer
```
…338)

If `echo_request_headers` option is set to `true`, then request headers dump is appended to the response body.
Avoid excepting in scenarios with many workers:
- Log an error instead of throwing
- Tweaks kMinimalWorkerDelay so that the problem may in practice
  resolve: allow more headroom as configured concurrency increases.

Intends to fix #339

Signed-off-by: Otto van der Schaaf <[email protected]>
- Cap the number of decimals when rendering the percentiles column data.
- Add a more detailed test for fixating output of the human readable output formatter, so we'll catch unintended changes better next time.

Fixes #352

Signed-off-by: Otto van der Schaaf <[email protected]>
This PR has changes to deflake integration tests for sanitizer runs.

- Allow servers more time to start up
- Slow down some tests to make them less sensitive to timing issues 
- It is possible for some counter values to slightly exceed a configured termination threshold
  for them in edge cases. Update expectations to match that. 

Fixes #332
In preparation of sharing functionality for signal handling,
extract what we have right now into a `SignalHandler` class.

Preparation for #280

Signed-off-by: Otto van der Schaaf <[email protected]>
Adds `--no-duration`, which will be treated as a request to infinitely
continue load generation.  Other default predicates will still exist, so\
execution may still stop when connection failures or error response codes
are observed.

This leaves some small tech debt:

- We need test coverage, but that's much easier to add once
  #280 or #344 goes in

Fixes #333

Signed-off-by: Otto van der Schaaf <[email protected]>
This makes the gRPC service log more then one ExecutionStream requests.
The log level used will be controlled by the options associated to
ExecutionStream requests (possible candidate for improvement later on).

This change also incorporates a change to avoid creating multiple instances
of Envoy::ProcessWide in the Nighthawk gRPC service flow.
This doesn't seem to address any know issues, but it seems to be more along
the way Envoy::ProcessWide was intended to be used.

Fixes #289

Signed-off-by: Otto van der Schaaf <[email protected]>
Following some digging into stability of our coverage testing, creating this PR for testing CI
with some fixes / enhancements, as well as covering a few more lines in our
CLI tooling exception paths. There seems to be something racy going on where sometimes the coverage originating from our python integration tests is not counted.

Eventually I learned that things might stabilise when we switch the toolchain to clang 10. We follow Envoy's lead on that, so for now stopping further efforts on this.

### Description of the changes in this PR:

- Increase timeouts of ASAN/TSAN runs
- Make `envoy_build_sha.sh`, which apparently broke in one of the Envoy updates
- Cover some of the outer exception handling paths for Nighthawk's CLIs
- Integration test for nighthawk_output_transform
- Add coverage for code lines where it seemed trivial to do so (e.g. NullStatistic)
- Fix `run_nighthawk_bazel_coverage.sh` to work on my machine
- Stricter bash options in some scripts + changes to make that work

### Prerequisites

- [x] Needs #357 to go in first.

---------

Leaving some notes/learnings here:

- there seems to be something racy going on where sometimes the coverage originating from
   our python integration tests is not counted. This is relatively infrequent in CI.
   - On my dev machine this structurally happens. This might be a toolchain/version issue.
   - On my dev machine, using the docker flow, I have difficulties: the docker container exits when running the full test suite, without leaving a trace why. bash traps in the container don't fire, no useful logging from docker. Eventually I was able to get past bazel fetching its deps, and then build just the python integration tests. As long as a single test is involved, things seem to work / the container gets to finish. It's remarkable that CI doesn't have this problem. Coverage structurally looks as expected in this scenario.
- The link step is super memory hungry. Separating that step might be a ci perf opportunity. Coverage is the slowest CI job we have.
Fix benchmark_http_client_test.cc which has a bug in function testBasicFunctionality().

For example, if we have a test fixture with testBasicFunctionality(max_pending=0,connection_limit=15,amount_of_request=10),
then EXPECT_FALSE(client_->tryStartRequest(f)) and
EXPECT_EQ(max_pending == 0 ? 1 : max_pending, inflight_response_count) will fail.

For test fixture with
testBasicFunctionality(max_pending=2,connection_limit=3,amount_of_request=10),
EXPECT_EQ(max_pending == 0 ? 1 : max_pending, inflight_response_count) will fail.

Added 2 test cases to verify the expected behavior fixed by this change.

Tested:
Unit test.
Notable changes in this Update:

- Changes related to header construction
- CI moves from clang-9 to clang-10
- Changes related to satisfy the new clang-tidy version:
  - Fix superfluous initializers in the rate limiter code.
  - Suppress modernize-avoid-bind to get sequencer_test.cc to pass checks.
  (clang-tidy's auto generated suggested fix fails to build)
- Move to use typed configs in `AddFilter()` in the server filter tests. 
  This avoids asserts being triggered by the integration test framework
  as deprecated fields for which support was dropped get generated
  otherwise.
- Drop the coverage threshold by 0.1%

Hopefully addresses #362 by producing more reliable and accurate
coverage reports.

Signed-off-by: Otto van der Schaaf <[email protected]>
Teach the CLI to handle SIGTERM/SIGINT, and handle those as a request to
gracefully cancel execution.

Partially resolves #280
…384)

In #366 we had to lower the coverage threshold as that PR didn't come
with the means to easily cover testing with infinite excecution
durations. Now that we have signal handling in place, we can easily
pull this off.

Fixes #370

Signed-off-by: Otto van der Schaaf <[email protected]>
This PR has changes that make it easy to consume our integration testing as a benchmarking
tool framework for external repositories, allowing those to write custom tests for running in CI.

This probably needs to be split up into multiple PR's, but this allows reviewers to
have an early peek at what such PR's would lead up to.

The `README.md` in here has more context, but the primary goals are te be able to:
- run the tests against arbitrary Envoy revisions
- persist profile dumps, flamegraphs, and latency numbers per test
- run the nighthawk tools via docker
- offer stock tests, but also allow scaffolding consumer-specific tests

Example of a fully dockerized flow (more example in `README.md`):

```
# This script runs the dockerized benchmarking framework, which in
# turn will pull Nighthawk and Envoy in via docker. 

set -eo pipefail
set +x
set -u

# The benchmark logs and artifacts will be dropped here
OUTDIR="/home/oschaaf/code/envoy-perf-vscode/nighthawk/benchmarks/tmp/"
# Used to map the test that we want to see executed into the docker container
TEST_DIR="/home/oschaaf/code/envoy-perf-vscode/nighthawk/benchmarks/test/"

# Rebuild the docker in case something changed.
./docker_build.sh && 
docker run -it --rm \
  -v "/var/run/docker.sock:/var/run/docker.sock:rw" \
  -v "${OUTDIR}:${OUTDIR}:rw" \
  -v "${TEST_DIR}:/usr/local/bin/benchmarks/benchmarks.runfiles/nighthawk/benchmarks/external_tests/" \
  --network=host \
  --env NH_DOCKER_IMAGE="envoyproxy/nighthawk-dev:latest" \
  --env ENVOY_DOCKER_IMAGE_TO_TEST="envoyproxy/envoy-dev:f61b096f6a2dd3a9c74b9a9369a6ea398dbe1f0f" \
  --env TMPDIR="${OUTDIR}" \
  oschaaf/benchmark-dev:latest ./benchmarks --log-cli-level=info -vvvv 
```

Part of #245
@eric846 eric846 merged commit 5ac755a into eric846:master Jun 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants