subsystem-bench: add regression tests for availability read and write #3311
Conversation
```yaml
n_validators: 500
n_cores: 100
n_included_candidates: 100
```
Unused parameter.
```yaml
@@ -1,14 +1,14 @@
TestConfiguration:
# Test 1
- objective: !ApprovalVoting
  last_considered_tranche: 89
```
Just sorted to unify the order in all test cases.
```rust
let benchmark_name = format!("{} #{} {}", &self.path, index + 1, objective);
gum::info!(target: LOG_TARGET, "{}", format!("Step {}/{}", index + 1, num_steps).bright_purple(),);
gum::info!(target: LOG_TARGET, "[{}] {}", format!("objective = {:?}", objective).green(), test_config);
test_config.generate_pov_sizes();
```
Better to do it automatically, but not in the `into_vec()` method as it was done before.
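As a sketch of that design choice (all types and the `generate_pov_sizes` behaviour here are illustrative stand-ins, not the real subsystem-bench code), keeping the generation explicit in the runner loop leaves `into_vec()` as a plain conversion with no side effects:

```rust
// Illustrative only: explicit PoV-size generation per step in the runner loop.
#[derive(Default)]
struct TestConfiguration {
    min_pov_size: usize,
    max_pov_size: usize,
    pov_sizes: Vec<usize>,
}

impl TestConfiguration {
    // Assumed behaviour: fill `pov_sizes` from the configured range.
    fn generate_pov_sizes(&mut self) {
        self.pov_sizes = vec![(self.min_pov_size + self.max_pov_size) / 2; 10];
    }
}

fn main() {
    let mut sequence = vec![TestConfiguration::default(), TestConfiguration::default()];
    let num_steps = sequence.len();
    for (index, test_config) in sequence.iter_mut().enumerate() {
        // Generating here (not inside `into_vec()`) keeps the conversion side-effect free.
        test_config.generate_pov_sizes();
        println!("Step {}/{}", index + 1, num_steps);
    }
}
```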
```rust
@@ -841,15 +835,22 @@ fn build_overseer(
pub fn prepare_test(
	config: TestConfiguration,
	options: ApprovalsOptions,
	with_prometheus_endpoint: bool,
```
If we run several tests, they all try to use the same port, so I decided to turn off the endpoint initialization because we don't need it in CI. I don't like how it's done (it's the easy way); I think I should start the endpoint outside the test, like we do with pyroscope. That can be done in the next PRs.
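A minimal, self-contained sketch of how such a flag can gate the endpoint; the types, the `start_prometheus_endpoint` helper, and the `prometheus_port` field are stand-ins for illustration, not the real API:

```rust
// Only spawn the metrics endpoint when explicitly requested, so several tests
// running in one process don't race for the same port.

struct TestConfiguration {
    prometheus_port: u16, // assumed field, for illustration only
}

struct TestEnvironment; // stand-in for the real environment type

fn start_prometheus_endpoint(port: u16) {
    // Placeholder: the real code would bind an HTTP listener here.
    println!("metrics endpoint listening on 0.0.0.0:{port}");
}

fn prepare_test(config: TestConfiguration, with_prometheus_endpoint: bool) -> TestEnvironment {
    if with_prometheus_endpoint {
        // In CI we pass `false`: every test would otherwise try to bind the
        // same default port (9999) and fail.
        start_prometheus_endpoint(config.prometheus_port);
    }
    TestEnvironment
}

fn main() {
    // Regression tests disable the endpoint; interactive CLI runs keep it.
    let _env = prepare_test(TestConfiguration { prometheus_port: 9999 }, false);
}
```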
Great job, looks good to me.
```yaml
@@ -68,7 +69,7 @@ test-linux-stable-runtime-benchmarks:
  # but still want to have debug assertions.
  RUSTFLAGS: "-Cdebug-assertions=y -Dwarnings"
  script:
    - time cargo nextest run --workspace --features runtime-benchmarks benchmark --locked --cargo-profile testnet
    - time cargo nextest run --filter-expr 'not deps(/polkadot-subsystem-bench/)' --workspace --features runtime-benchmarks benchmark --locked --cargo-profile testnet
```
Why isn't `required-features = ["subsystem-benchmarks"]` enough?
It's a good question, and I wish I knew the answer. Somehow, nextest keeps running the regression tests, while `cargo test` excludes them based on the required feature.
(Outdated, resolved comment thread on polkadot/node/core/approval-voting/tests/approval-voting-regression-bench.rs.)
```rust
};

const BENCH_COUNT: usize = 3;
const WARM_UP_COUNT: usize = 20;
```
How many benches are needed in practice until `WARM_UP_PRECISION` is achieved?
Based on previous runs you never know: it can be 3, it can be 13.
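For context, a minimal sketch of the warm-up idea, assuming a hypothetical `run_benchmark()` helper and a relative `WARM_UP_PRECISION` of 1% (the real implementation may average or compare differently):

```rust
const WARM_UP_COUNT: usize = 20;
const WARM_UP_PRECISION: f64 = 0.01; // stop once consecutive runs differ by <= 1%

// Hypothetical benchmark step: returns the measured CPU time of one run.
fn run_benchmark() -> f64 {
    // ... run the subsystem benchmark and collect CPU usage ...
    1.0
}

// Warm up until two consecutive runs agree within WARM_UP_PRECISION,
// or give up after WARM_UP_COUNT attempts.
fn warm_up() -> Result<(), String> {
    let mut previous = run_benchmark();
    for _ in 1..WARM_UP_COUNT {
        let current = run_benchmark();
        if (current - previous).abs() / previous <= WARM_UP_PRECISION {
            return Ok(());
        }
        previous = current;
    }
    Err("benchmark did not stabilize within WARM_UP_COUNT runs".to_string())
}

fn main() {
    // In practice the number of runs needed varies: sometimes 3, sometimes 13.
    warm_up().expect("warm-up should converge on reference hardware");
}
```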
```rust
config.max_pov_size = 5120;
config.peer_bandwidth = 52428800;
config.bandwidth = 52428800;
config.connectivity = 75;
```
Why 75?
Copied from `availability_write.yaml`.
```rust
config.n_cores = 100;
config.min_pov_size = 1120;
config.max_pov_size = 5120;
config.peer_bandwidth = 524288000000;
```
Is this value correct per the validator specs (500 Mbit/s = 62.5 MB/s)?
@alexggh what do you think? I just copied it from your yaml.
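For reference, a quick unit sanity check of these bandwidth figures (plain arithmetic only; whether the field is meant in bytes per second is an assumption here):

```rust
fn main() {
    // Validator spec: 500 Mbit/s.
    let spec_bits_per_s: u64 = 500_000_000;
    let spec_bytes_per_s = spec_bits_per_s / 8; // 62_500_000 B/s ≈ 62.5 MB/s

    // Value used in the other hunks above: 50 MiB/s.
    let fifty_mib: u64 = 50 * 1024 * 1024; // 52_428_800

    // Value in this hunk: four orders of magnitude larger (~500 GB/s),
    // which looks like extra zeros rather than an intentional setting.
    let value_in_this_hunk: u64 = 524_288_000_000;

    println!("{spec_bytes_per_s} {fifty_mib} {value_in_this_hunk}");
    assert_eq!(value_in_this_hunk, fifty_mib * 10_000);
}
```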
```rust
config.max_validators_per_core = 5;
config.min_pov_size = 5120;
config.max_pov_size = 5120;
config.peer_bandwidth = 52428800;
```
This is different from the approval-voting setup. It would be better to have the correct value set in the default impl of the config, and to include a comment wherever we override it.
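A minimal sketch of that suggestion, with illustrative field values only (the real `TestConfiguration` has many more fields and its defaults may differ):

```rust
// Keep the spec-accurate values in one place: the default impl.
#[allow(dead_code)]
struct TestConfiguration {
    peer_bandwidth: usize,
    max_validators_per_core: usize,
}

impl Default for TestConfiguration {
    fn default() -> Self {
        Self {
            // 50 MiB/s; illustrative default, see the bandwidth discussion above.
            peer_bandwidth: 52428800,
            max_validators_per_core: 5,
        }
    }
}

fn availability_write_config() -> TestConfiguration {
    let mut config = TestConfiguration::default();
    // Override only where a test deliberately deviates, and say why:
    // e.g. stress the network path with a tighter per-peer budget.
    config.peer_bandwidth = 10 * 1024 * 1024; // hypothetical override for illustration
    config
}

fn main() {
    let _ = availability_write_config();
}
```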
(Resolved comment thread on ...t/node/network/availability-distribution/tests/availability-distribution-regression-bench.rs.)
(Outdated, resolved comment thread on polkadot/node/core/approval-voting/tests/approval-voting-regression-bench.rs.)
```rust
let usage = benchmark(test_case, options.clone());

messages.extend(usage.check_network_usage(&[
	("Received from peers", 2950.0, 0.05),
```
Have these been calibrated on reference hardware?
These tests themselves run on reference hardware.
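A minimal sketch of what such a check boils down to, assuming the `(label, expected baseline, relative precision)` tuple format visible in the hunk above; `check_usage` here is an illustrative stand-in, not the real `check_network_usage`:

```rust
// Compare a measured value against an expected baseline within a relative
// tolerance, returning a human-readable violation message if it drifts.
fn check_usage(label: &str, measured: f64, expected: f64, precision: f64) -> Option<String> {
    let deviation = (measured - expected).abs() / expected;
    (deviation > precision).then(|| {
        format!(
            "{label}: measured {measured:.1}, expected {expected:.1} ±{:.0}%",
            precision * 100.0
        )
    })
}

fn main() {
    let mut messages = Vec::new();
    // e.g. a "Received from peers" baseline of 2950.0 with a 5% tolerance,
    // calibrated by running the benchmark on the reference hardware.
    messages.extend(check_usage("Received from peers", 3010.0, 2950.0, 0.05));
    assert!(messages.is_empty(), "{messages:?}");
}
```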
Looks good overall! Nice work!
One thing I expect to see solved here is using a different test pipeline, for which I added a comment.
Otherwise, let's not block this one and continue with the follow-up PRs to achieve a properly parametrised and stable regression test.
```yaml
@@ -25,6 +25,7 @@ test-linux-stable:
  # "upgrade_version_checks_should_work" is currently failing
  - |
    time cargo nextest run \
      --filter-expr 'not deps(/polkadot-subsystem-bench/)' \
```
I'd move these to a separate pipeline; that would allow CI to provide the same hardware specs as for validators and guarantee CPU/mem resources per pod.
cc @alvicsam
Ok, my bad, we aren't running the tests here 🙈. Let's address this properly in #3530.
@AndreiEres could you please add a comment above the command with a brief description of why you added the filter, or a link to your PR.
```rust
let mut config = TestConfiguration::default();
config.latency = Some(PeerLatency { mean_latency_ms: 100, std_dev: 1.0 });
config.n_validators = 300;
config.n_cores = 20;
```
This value should be set to the number of candidates validators are expected to check per relay chain block. This depends on the approval voting parameters. We should also factor in async backing; that is, we'd expect to have 1 included candidate at every block.
We can calibrate these in the follow-up PR.
### What's been done
- `subsystem-bench` has been split into two parts: a CLI benchmark runner and a library.
- The CLI runner is quite simple. It just allows us to run `.yaml`-based test sequences. Now it should only be used to run benchmarks during development.
- The library is used in the CLI runner and in regression tests. Some code was changed to make the library independent of the runner.
- Added the first regression tests for availability read and write, replicating the existing test sequences.

### How we run regression tests
- Regression tests are simply Rust integration tests without harnesses (a minimal sketch of their shape follows this description).
- They should only be compiled under the `subsystem-benchmarks` feature to prevent them from running with other tests.
- This doesn't work when running tests with `nextest` in CI, so additional filters have been added to the `nextest` runs.
- Each benchmark run takes a different time at the beginning, so we "warm up" the tests until their CPU usage differs by only 1%.
- After the warm-up, we run the benchmarks a few more times and compare the average with the expectation, within a given precision.

### What is still wrong?
- I haven't managed to set up approval-voting tests. The spread of their results is too large and can't be narrowed down in a reasonable amount of time in the warm-up phase.
- The tests start an unconfigurable Prometheus endpoint inside, which causes errors because they all use the same port 9999. I disable it with a flag, but I think it's better to extract the endpoint launching outside the test, as we already do with `valgrind` and `pyroscope`. But we still use `prometheus` inside the tests.

### Future work
* paritytech#3528
* paritytech#3529
* paritytech#3530
* paritytech#3531

Co-authored-by: Alexander Samusev <[email protected]>
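As referenced above, a minimal sketch of the shape such a no-harness regression test takes; everything below is an illustrative stub (constants, helpers, and values), not the real subsystem-bench API:

```rust
// Regression tests are plain integration-test binaries with `harness = false`
// in Cargo.toml, compiled only under the `subsystem-benchmarks` feature.

#[cfg(feature = "subsystem-benchmarks")]
mod bench {
    pub const BENCH_COUNT: usize = 3;
    pub const EXPECTED_CPU_S: f64 = 1.0; // baseline calibrated on reference hardware
    pub const PRECISION: f64 = 0.02; // allowed relative drift

    // Stand-ins for "warm up until stable" and "run one benchmark".
    pub fn warm_up() {}
    pub fn benchmark() -> f64 {
        EXPECTED_CPU_S
    }
}

#[cfg(feature = "subsystem-benchmarks")]
fn main() {
    use bench::*;

    // 1. Warm up until consecutive runs agree within the chosen precision.
    warm_up();

    // 2. Run the benchmark a few more times and average the results.
    let average: f64 = (0..BENCH_COUNT).map(|_| benchmark()).sum::<f64>() / BENCH_COUNT as f64;

    // 3. Compare the average against calibrated expectations with a tolerance.
    assert!((average - EXPECTED_CPU_S).abs() / EXPECTED_CPU_S <= PRECISION);
}

// Without the feature the binary is a no-op, so `cargo test` skips the work;
// `nextest` additionally needs the `--filter-expr` shown in the CI changes.
#[cfg(not(feature = "subsystem-benchmarks"))]
fn main() {}
```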