
subsystem-bench: add regression tests for availability read and write #3311

Merged
merged 56 commits into master from AndreiEres/subsystem-bench-lib on Mar 1, 2024

Conversation

AndreiEres
Contributor

@AndreiEres AndreiEres commented Feb 13, 2024

What's been done

  • subsystem-bench has been split into two parts: a CLI benchmark runner and a library.
  • The CLI runner is quite simple: it just lets us run .yaml-based test sequences. It should now only be used to run benchmarks during development.
  • The library is used in the CLI runner and in the regression tests. Some code has been changed to make the library independent of the runner.
  • Added the first regression tests for availability read and write, which replicate the existing test sequences (see the sketch below).
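
To make the split concrete, here is a minimal sketch of the shape of one of the new feature-gated regression tests. The types and the `run_benchmark` helper below are stand-ins, not the library's real API; only `TestConfiguration` and the `check_network_usage` call are taken from the snippets discussed further down.

```rust
// Shape of one of the new regression tests. The test target is declared with
// `harness = false` and `required-features = ["subsystem-benchmarks"]`
// so it is skipped by ordinary test runs.

struct TestConfiguration { n_validators: usize, n_cores: usize } // stand-in
struct Usage; // stand-in for the benchmark's resource-usage report

impl Usage {
    // Returns a message for every metric outside `expected * (1 ± precision)`.
    fn check_network_usage(&self, _expected: &[(&str, f64, f64)]) -> Vec<String> {
        Vec::new()
    }
}

fn run_benchmark(_config: TestConfiguration) -> Usage { Usage } // stand-in

fn main() {
    // Rebuild the scenario the old .yaml sequence described.
    let config = TestConfiguration { n_validators: 300, n_cores: 20 };

    // Run the library benchmark and compare metrics against expectations.
    let usage = run_benchmark(config);
    let messages = usage.check_network_usage(&[
        // (metric, expected value, relative precision)
        ("Received from peers", 2950.0, 0.05),
    ]);
    assert!(messages.is_empty(), "{}", messages.join("\n"));
}
```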

How we run regression tests

  • Regression tests are simply Rust integration tests run without the default test harness.
  • They should only be compiled under the subsystem-benchmarks feature to prevent them from running with other tests.
  • This doesn't work when running tests with nextest in CI, so additional filters have been added to the nextest runs.
  • At first, each benchmark run takes a different amount of time, so we "warm up" the tests until their CPU usage differs by only 1%.
  • After the warm-up, we run the benchmarks a few more times and compare the average against the expected value, within a given precision (sketched below).
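
A rough sketch of the warm-up-then-average logic described above. The constants mirror the ones visible in the diff further down; `run_once` and the stopping rule are simplified assumptions:

```rust
const BENCH_COUNT: usize = 3; // averaged runs after warm-up
const WARM_UP_COUNT: usize = 20; // upper bound on warm-up iterations
const WARM_UP_PRECISION: f64 = 0.01; // stop once consecutive runs differ by <= 1%

/// Warm up until two consecutive runs report CPU usage within 1% of each
/// other (or the warm-up budget is exhausted), then average a few more runs.
fn warm_up_and_benchmark(run_once: impl Fn() -> f64) -> f64 {
    let mut previous = run_once();
    for _ in 0..WARM_UP_COUNT {
        let current = run_once();
        if ((current - previous) / previous).abs() <= WARM_UP_PRECISION {
            break;
        }
        previous = current;
    }
    // The caller compares this average against the expected value,
    // allowing a configured precision.
    (0..BENCH_COUNT).map(|_| run_once()).sum::<f64>() / BENCH_COUNT as f64
}
```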

What is still wrong?

  • I haven't managed to set up approval voting tests. The spread of their results is too large and can't be narrowed down in a reasonable amount of time in the warm-up phase.
  • The tests start an unconfigurable prometheus endpoint internally, which causes errors because they all use the same port 9999. I disable it with a flag (see the sketch below), but I think it's better to move the endpoint launching outside the tests, as we already do with valgrind and pyroscope. We still use prometheus inside the tests, though.
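
As a rough illustration of the current workaround (the helper below is hypothetical; only the `with_prometheus_endpoint` flag itself appears in the diff later in this PR):

```rust
/// Hypothetical stand-in: bind `addr` and expose the metrics registry.
fn spawn_prometheus_endpoint(addr: &str) {
    let _ = addr;
}

/// Only the CLI runner asks for the endpoint; the regression tests pass
/// `false` so that tests running in parallel don't collide on port 9999.
fn maybe_start_metrics(with_prometheus_endpoint: bool) {
    if with_prometheus_endpoint {
        spawn_prometheus_endpoint("127.0.0.1:9999");
    }
}
```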

Future work

  • #3528
  • #3529
  • #3530
  • #3531

@AndreiEres AndreiEres added the R0-silent (Changes should not be mentioned in any release notes) and T12-benchmarks (This PR/Issue is related to benchmarking and weights) labels Feb 13, 2024
n_validators: 500
n_cores: 100
n_included_candidates: 100
Contributor Author

Unused parameter.

@@ -1,14 +1,14 @@
TestConfiguration:
# Test 1
- objective: !ApprovalVoting
last_considered_tranche: 89
Contributor Author

Just sorted to unify the order in all test cases.

let benchmark_name = format!("{} #{} {}", &self.path, index + 1, objective);
gum::info!(target: LOG_TARGET, "{}", format!("Step {}/{}", index + 1, num_steps).bright_purple(),);
gum::info!(target: LOG_TARGET, "[{}] {}", format!("objective = {:?}", objective).green(), test_config);
test_config.generate_pov_sizes();
Contributor Author

Better to do it automatically, but not in the into_vec() method as it was before.

@@ -841,15 +835,22 @@ fn build_overseer(
pub fn prepare_test(
config: TestConfiguration,
options: ApprovalsOptions,
with_prometheus_endpoint: bool,
Contributor Author

@AndreiEres AndreiEres Feb 16, 2024

If we run several tests, they all try to use the same port, so I just decided to turn off the endpoint initialization, because we don't need it in CI. I don't like how it's done (it's the easy way out). I think I should start the endpoint outside the test, like we do with pyroscope. It can be done in the next PRs.

@AndreiEres AndreiEres changed the title [WIP] subsystem-bench: split to a lib and a cli tool [WIP] subsystem-bench: add regression tests Feb 16, 2024
@AndreiEres AndreiEres changed the title [WIP] subsystem-bench: add regression tests subsystem-bench: add regression tests Feb 23, 2024
Contributor

@alexggh alexggh left a comment

Great job, looks good to me.

@@ -68,7 +69,7 @@ test-linux-stable-runtime-benchmarks:
# but still want to have debug assertions.
RUSTFLAGS: "-Cdebug-assertions=y -Dwarnings"
script:
- time cargo nextest run --workspace --features runtime-benchmarks benchmark --locked --cargo-profile testnet
- time cargo nextest run --filter-expr 'not deps(/polkadot-subsystem-bench/)' --workspace --features runtime-benchmarks benchmark --locked --cargo-profile testnet
Contributor

Why isn't this required-features = ["subsystem-benchmarks"] enough?

Contributor Author

It's a good question, and I wish I knew the answer. Somehow nextest keeps running the regression tests, while cargo test excludes them based on the required feature.

};

const BENCH_COUNT: usize = 3;
const WARM_UP_COUNT: usize = 20;
Contributor

How many benches in practice are needed until WARM_UP_PRECISION is achieved?

Contributor Author

Based on previous runs you never know: it can be 3, it can be 13.

config.max_pov_size = 5120;
config.peer_bandwidth = 52428800;
config.bandwidth = 52428800;
config.connectivity = 75;
Contributor

Why 75?

Contributor Author

Copied from availability_write.yaml

config.n_cores = 100;
config.min_pov_size = 1120;
config.max_pov_size = 5120;
config.peer_bandwidth = 524288000000;
Contributor

Is this value correct as per the validator specs (500 Mbit/s = 62.5 MB/s)?
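
For reference, assuming peer_bandwidth is expressed in bytes per second: 500 Mbit/s = 62.5 MB/s = 62,500,000 bytes/s, and the 52,428,800 used in the other configs is 50 MiB/s ≈ 419 Mbit/s, while 524,288,000,000 bytes/s is exactly 10,000× that (≈ 488 GiB/s), far above the spec.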

Contributor Author

@alexggh what do you think? I just copied it from your yaml.

config.max_validators_per_core = 5;
config.min_pov_size = 5120;
config.max_pov_size = 5120;
config.peer_bandwidth = 52428800;
Contributor

This is different from the approval voting setup. It would be better to have the correct value set in the default impl of the config, and include a comment when we override it.
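
A minimal sketch of that suggestion, using a trimmed-down stand-in for the config type (the field names come from the snippets in this PR; the default values are illustrative, not the agreed spec numbers):

```rust
struct TestConfiguration {
    peer_bandwidth: usize,          // bytes per second
    max_validators_per_core: usize,
}

impl Default for TestConfiguration {
    fn default() -> Self {
        // Defaults mirror the reference validator spec, so individual tests
        // only override what their scenario deliberately changes.
        Self { peer_bandwidth: 52_428_800, max_validators_per_core: 5 }
    }
}

fn scenario_config() -> TestConfiguration {
    let mut config = TestConfiguration::default();
    // Any deviation from the spec default gets a comment explaining why,
    // e.g. this (made-up) scenario doubles the per-peer bandwidth on purpose.
    config.peer_bandwidth = 104_857_600;
    config
}
```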

let usage = benchmark(test_case, options.clone());

messages.extend(usage.check_network_usage(&[
("Received from peers", 2950.0, 0.05),
Contributor

Have these been calibrated on reference hardware?

Contributor Author

These tests themselves run on reference hardware.

@AndreiEres AndreiEres changed the title subsystem-bench: add regression tests subsystem-bench: add regression tests for availability read and write Feb 29, 2024
Contributor

@sandreim sandreim left a comment

Looks good overall! Nice work!

One thing I expect to see solved here is using a different test pipeline, for which I added a comment.

Otherwise, let's not block this one and continue with the followup PRs to achieve a properly parametrised and stable regression test.

@@ -25,6 +25,7 @@ test-linux-stable:
# "upgrade_version_checks_should_work" is currently failing
- |
time cargo nextest run \
--filter-expr 'not deps(/polkadot-subsystem-bench/)' \
Contributor

I'd move these to a separate pipeline, which allows CI to provide the same hardware specs as for validators and to guarantee CPU/memory resources per pod.

Contributor

Ok, my bad, we aren't running the tests here 🙈. Let's address this properly in #3530.

Contributor

@AndreiEres could you please add a comment above the command with a brief description of why you added the filter, or a link to your PR.

let mut config = TestConfiguration::default();
config.latency = Some(PeerLatency { mean_latency_ms: 100, std_dev: 1.0 });
config.n_validators = 300;
config.n_cores = 20;
Contributor

This value should be set to the number of blocks validators are expected to check per relay chain block. This depends on the approval voting parameters. We should also factor in async backing, that is, we'd expect to have 1 included candidate at every block.

We can calibrate these in the followup PR.
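
As a rough illustration of that calibration (all parameter values here are assumptions for the sake of the arithmetic): with needed_approvals = 30, roughly 100 included candidates per relay chain block and 500 validators, each validator is expected to check about 100 × 30 / 500 = 6 candidates per block, which is the kind of number this value should be derived from.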

@AndreiEres AndreiEres added this pull request to the merge queue Mar 1, 2024
Merged via the queue into master with commit f0e589d Mar 1, 2024
11 checks passed
@AndreiEres AndreiEres deleted the AndreiEres/subsystem-bench-lib branch March 1, 2024 15:31
ordian added a commit that referenced this pull request Mar 4, 2024
* master:
  Finish documenting `#[pallet::xxx]` macros  (#2638)
  Remove `as frame_system::DefaultConfig` from the required syntax in `derive_impl` (#3505)
  provisioner: allow multiple cores assigned to the same para (#3233)
  subsystem-bench: add regression tests for availability read and write (#3311)
  make SelfParaId a metadata constant (#3517)
  Fix crash of synced parachain node run with `--sync=warp` (#3523)
  [Backport] Node version and spec_version bumps and ordering of the prdoc files from 1.8.0 (#3508)
  Add `claim_assets` extrinsic to `pallet-xcm` (#3403)
ordian added a commit that referenced this pull request Mar 4, 2024
…data

* ao-collator-parent-head-data:
  add a comment (review)
  Finish documenting `#[pallet::xxx]` macros  (#2638)
  Remove `as frame_system::DefaultConfig` from the required syntax in `derive_impl` (#3505)
  provisioner: allow multiple cores assigned to the same para (#3233)
  subsystem-bench: add regression tests for availability read and write (#3311)
  make SelfParaId a metadata constant (#3517)
  Fix crash of synced parachain node run with `--sync=warp` (#3523)
  [Backport] Node version and spec_version bumps and ordering of the prdoc files from 1.8.0 (#3508)
  Add `claim_assets` extrinsic to `pallet-xcm` (#3403)
skunert pushed a commit to skunert/polkadot-sdk that referenced this pull request Mar 4, 2024