This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
Change Description
The db_modes_test has been our most flaky test, with a 4.4% failure rate on the original BASH variant of this test and a 2.1% failure rate on the Python rewrite.

Working on a branch off release/1.8.x that still had the BASH variant of this test, I fixed the instability by doubling the duration of all sleep statements. I experienced zero failures of db_modes_test in 1,000 runs on the zach-1.8-db-modes-test-timeout branch, giving me extremely high confidence (7 nines) that the instability is fixed.

Next Steps
This is a poor solution because the test still relies on timing alone. On slow enough hardware, the test will fail spuriously because nodeos has not finished initializing before the sleep statements expire. A better solution would be to rewrite the test so that it detects when nodeos has finished initializing, saving time on faster hardware and preventing test instability on slower hardware. We decided it is not worth investing time in rewriting db_modes_test the "correct" way at this time because Hong Kong is already writing a new testing framework that will provide these features. When that framework is available, we should reimplement db_modes_test the correct way on it.

My changes here proved reliable even on the slow dual-core, 4 GB RAM Travis CI macOS agents, so I believe we will not encounter test instability on any commonly used developer hardware.
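A minimal sketch of the readiness-aware approach described under Next Steps, written in Python like the rewritten test: instead of sleeping for a fixed duration, poll a readiness check until it passes or a deadline expires. The names `wait_until_ready` and `nodeos_answers_get_info` are hypothetical. The readiness check assumes the nodeos HTTP API is listening on the default 127.0.0.1:8888 and that a successful `/v1/chain/get_info` response is an adequate signal that initialization is done; the real test may need a stricter condition.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = 60.0,
                     poll_interval_s: float = 0.5) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True as soon as the check passes and False on timeout.
    time.monotonic() is used so wall-clock adjustments cannot shorten
    or extend the wait.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def nodeos_answers_get_info(
        url: str = "http://127.0.0.1:8888/v1/chain/get_info") -> bool:
    """Hypothetical readiness check: True if the endpoint returns valid JSON.

    Assumes the default nodeos HTTP address; connection errors, HTTP errors,
    and malformed responses all count as "not ready yet".
    """
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            json.load(resp)
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

A test would then call something like `wait_until_ready(nodeos_answers_get_info, timeout_s=120)` where it currently sleeps: fast machines continue as soon as nodeos answers, and slow machines get the full timeout before the test gives up.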
See Also
Pull request 7729 against eos:develop.
Old Metrics
Authenticate with AWS as shown here, sync metrics, then aggregate them:
Now query the metrics for the BASH variant...
...and the Python variant:
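The aggregated metrics themselves are not reproduced here. Assuming they can be reduced to one pass/fail record per test run, the failure rates quoted above are failures divided by runs for each variant. `RunResult` and `failure_rate` are hypothetical names for illustration; the real metrics format may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunResult:
    # Hypothetical record shape: one entry per test run.
    test_name: str
    variant: str  # e.g. "bash" or "python"
    passed: bool


def failure_rate(results: Iterable[RunResult], test_name: str, variant: str) -> float:
    """Fraction of runs of `test_name` on `variant` that failed."""
    relevant = [r for r in results
                if r.test_name == test_name and r.variant == variant]
    if not relevant:
        raise ValueError(f"no runs recorded for {test_name} ({variant})")
    failures = sum(1 for r in relevant if not r.passed)
    return failures / len(relevant)
```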
New Metrics
I tested these changes on both Buildkite and Travis CI.
Buildkite:
The one error ended up being unrelated to db_modes_test.

Travis CI:
The four issues we are seeing here are two cancelled builds, one errored build (10 minutes with no log output), and one failure from a different test.
Count all builds and multiply by 5 because each build tests on five operating systems:
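The run count is simple arithmetic. As a hypothetical helper (the actual build count is not reproduced here), each build contributes one run per operating system:

```python
# Each build runs the test suite on five operating systems (per the text above).
OPERATING_SYSTEMS_PER_BUILD = 5


def total_test_runs(build_count: int,
                    os_per_build: int = OPERATING_SYSTEMS_PER_BUILD) -> int:
    """Total test runs represented by `build_count` builds."""
    if build_count < 0 or os_per_build <= 0:
        raise ValueError("counts must be non-negative, OS count positive")
    return build_count * os_per_build
```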
Math
Confidence after 1,000 runs that this solution is better than the BASH variant:
Confidence after 1,000 runs that this solution is better than the Python variant:
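One way to put numbers on confidence statements like these is a simple binomial model, assuming each run is independent and that the historical failure rates (4.4% for BASH, 2.1% for Python) are the true rates. Under that model, the chance of seeing zero failures in n runs if the old failure rate p still applied is (1 − p)^n, and the exact one-sided upper confidence bound on the new failure rate after zero failures is 1 − α^(1/n), roughly 3/n at α = 0.05. This is a sketch, not the calculation behind the figures in this PR; because it treats the baseline rates as exact, its results are not expected to match them. The function names are hypothetical.

```python
def prob_zero_failures(failure_rate: float, runs: int) -> float:
    """Probability of no failures in `runs` independent runs when each
    run fails with probability `failure_rate`."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be in [0, 1]")
    return (1.0 - failure_rate) ** runs


def upper_bound_after_zero_failures(runs: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper confidence bound on the failure rate
    after observing zero failures in `runs` runs (exact Clopper-Pearson
    bound for zero failures)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return 1.0 - alpha ** (1.0 / runs)


if __name__ == "__main__":
    # Historical rates quoted in this PR; 1,000 runs with zero failures.
    for label, rate in (("BASH", 0.044), ("Python", 0.021)):
        print(label, prob_zero_failures(rate, 1000))
    print("95% upper bound:", upper_bound_after_zero_failures(1000))
```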
Consensus Changes
None.
API Changes
None.
Documentation Additions
None.