[sui-benchmark] run multiple traffic profiles on same benchmark #14054

Merged: akichidis merged 11 commits into main from support-multiple-benchmark-traffic on Oct 10, 2023

Contributor

@akichidis akichidis commented Oct 2, 2023

Description

This PR modifies the sui-benchmark suite so that a benchmark can run multiple traffic profiles instead of the single constant rate we use today. That lets us simulate variable load (and varying transaction types, etc.) rather than constant load. To achieve this, the Bench CLI subcommand was extended to accept all of its arguments as vectors, which lets us define multiple "benchmark groups", where the properties of each benchmark group are the values at the corresponding vector position. More information about this is in the options.rs file.
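To illustrate the "one vector position per benchmark group" idea, here is a minimal sketch. It is not the code in options.rs: the `GroupConfig` struct, the `build_groups` function, and the rule that every list must have exactly `num_groups` entries are assumptions made for the example.

```rust
/// Hypothetical per-group settings; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct GroupConfig {
    pub target_qps: u64,
    pub num_workers: u64,
    pub shared_counter: u32,
}

/// Parse a comma-separated list such as "1000,2000" into one value per group.
fn parse_list<T: std::str::FromStr>(raw: &str) -> Result<Vec<T>, String> {
    raw.split(',')
        .map(|s| s.trim().parse::<T>().map_err(|_| format!("invalid value: {s:?}")))
        .collect()
}

/// Build one GroupConfig per vector position.
/// Assumption: every argument must supply exactly `num_groups` values.
pub fn build_groups(
    num_groups: usize,
    target_qps: &str,
    num_workers: &str,
    shared_counter: &str,
) -> Result<Vec<GroupConfig>, String> {
    let qps: Vec<u64> = parse_list(target_qps)?;
    let workers: Vec<u64> = parse_list(num_workers)?;
    let shared: Vec<u32> = parse_list(shared_counter)?;

    let lengths = [
        ("target-qps", qps.len()),
        ("num-workers", workers.len()),
        ("shared-counter", shared.len()),
    ];
    for &(name, len) in lengths.iter() {
        if len != num_groups {
            return Err(format!("{name}: expected {num_groups} values, got {len}"));
        }
    }

    // Group i takes the i-th value of every argument.
    Ok((0..num_groups)
        .map(|i| GroupConfig {
            target_qps: qps[i],
            num_workers: workers[i],
            shared_counter: shared[i],
        })
        .collect())
}
```

For instance, `build_groups(2, "1000,2000", "24,32", "100,100")` would yield two groups, the second one with 2000 target QPS and 32 workers.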

To achieve the above, we build the workloads for each "benchmark group" and then initialise them all once. A bench workers scheduler task has then been introduced to coordinate running each benchmark group. Each group is defined to run for a specified duration. That allows us to do things like run at 1K tx/s for 100 seconds, then at 500 tx/s for 80 seconds, and so on. Similarly, we can configure the groups further, e.g. run at 1K tx/s for 100 seconds with 50% shared and 50% owned transactions, then at 500 tx/s for 80 seconds with 80% owned transactions and 20% adversarial transactions.

Also, the benchmark groups run in a round-robin fashion. For example, if we configure a benchmark with 3 groups, then after the last group has run, execution starts again from the first group. That allows us, with minimal configuration (just defining a few groups), to run repetitive traffic patterns and test the system's behaviour. If a group is defined with an unbounded duration, it will run for the rest of the benchmark (so be careful with the setup). From a software perspective, the BenchWorkers created for each group are reused during the round robin, so there is no need to redo any setup while running or to spend excessive resources. The only things that are managed are the tasks that are spawned and completed.
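The rotation described above can be sketched as a queue that pops the next group, runs it, and pushes it back. This is a synchronous simulation under stated assumptions, not the scheduler task from this PR: `RunDuration`, `schedule`, and the overall `total` benchmark length are hypothetical names, and zero-length groups are simply skipped.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// How long a group runs; `Unbounded` means "until the benchmark ends".
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RunDuration {
    Bounded(Duration),
    Unbounded,
}

/// Returns the (group index, run time) slots executed within `total`,
/// rotating through the groups in order and reusing each group's entry.
pub fn schedule(groups: &[RunDuration], total: Duration) -> Vec<(usize, Duration)> {
    // Zero-length groups are dropped up front so the loop always makes progress.
    let mut queue: VecDeque<(usize, RunDuration)> = groups
        .iter()
        .copied()
        .enumerate()
        .filter(|(_, d)| *d != RunDuration::Bounded(Duration::ZERO))
        .collect();

    let mut remaining = total;
    let mut slots = Vec::new();
    while !remaining.is_zero() {
        let (idx, dur) = match queue.pop_front() {
            Some(entry) => entry,
            None => break, // no runnable groups
        };
        let run_for = match dur {
            RunDuration::Bounded(d) => d.min(remaining),
            // An unbounded group consumes everything that is left.
            RunDuration::Unbounded => remaining,
        };
        slots.push((idx, run_for));
        remaining -= run_for;
        // Back of the queue: the same group entry is reused on the next round.
        queue.push_back((idx, dur));
    }
    slots
}
```

With groups of 100 s and 80 s and a 300 s benchmark, the slots come out as group 0 for 100 s, group 1 for 80 s, group 0 for 100 s, and group 1 for the final 20 s. If the second group were unbounded, it would run once and never hand control back.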

Backwards compatibility: the CLI parameter changes are backwards compatible, so existing CI jobs won't break. They will simply run with the default settings, i.e. with a single benchmark group.

Test Plan

An example of running a benchmark test that defines 2 benchmark groups where we want to test spiked traffic:

  • 1000 tx/sec for 360sec (100% shared objects)
  • 2000 tx/sec for 120sec (100% shared objects)

The bench subcommand parameters look like:

--num-of-benchmark-groups 2 \
--in-flight-ratio 30,30 \
--num-workers 24,32 \
--target-qps 1000,2000 \
--shared-counter 100,100 \
--transfer-object 0,0 \
--delegation 0,0 \
--shared-counter-hotness-factor 50,50 \
--batch-payment 0,0 \
--batch-payment-size 100,100 \
--adversarial 0,0 \
--adversarial-cfg 0-1.0,0-1.0 \
--shared-counter-max-tip 0,0 \
--duration 360s,120s

As we can see from the parameters above, every argument can be configured independently for each benchmark group, and those values are respected on each run.

On the graph below we can see that the input traffic to consensus follows the benchmark setup, alternating between the 2 traffic profiles for their respective durations:

Screenshot 2023-10-03 at 00 28 59
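The expected shape of that traffic can be worked out from the two profiles: one full rotation lasts 360 s + 120 s = 480 s, so the target rate at any point depends only on the elapsed time modulo 480. A small sketch of that arithmetic follows; `PROFILES` and `expected_qps` are illustrative names, and it assumes the groups rotate back-to-back with no gap between them.

```rust
/// (target QPS, duration in seconds) for each group in the test plan above.
const PROFILES: [(u64, u64); 2] = [(1000, 360), (2000, 120)];

/// Expected target QPS `elapsed_secs` after the benchmark starts.
/// Assumption: groups rotate back-to-back with no switch-over gap.
pub fn expected_qps(elapsed_secs: u64) -> u64 {
    // Length of one full rotation through all groups (480 s here).
    let cycle: u64 = PROFILES.iter().map(|&(_, dur)| dur).sum();
    let mut offset = elapsed_secs % cycle;
    for &(qps, dur) in PROFILES.iter() {
        if offset < dur {
            return qps;
        }
        offset -= dur;
    }
    unreachable!("offset is always smaller than the cycle length")
}
```

For example, at 400 s the offset is 400, which falls past the first 360 s, so the second profile (2000 tx/sec) is active; at 480 s the rotation wraps around to the first profile again.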

Type of Change (Check all that apply)

  • protocol change
  • user-visible impact
  • breaking change for a client SDKs
  • breaking change for FNs (FN binary must upgrade)
  • breaking change for validators or node operators (must upgrade binaries)
  • breaking change for on-chain data layout
  • necessitate either a data wipe or data migration

Release notes

Contributor

@arun-koshy arun-koshy left a comment

This is going to be very useful!

@@ -575,6 +402,12 @@ impl Driver<(BenchmarkStats, StressStats)> for BenchDriver {
let mut num_no_gas = 0;
for (_, v) in stat_collection.iter() {
let duration = v.bench_stats.duration.as_secs() as f32;

// no reason to do any measurements when duration is zero -as this will output NaN
Contributor

s/-as/as
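For context on the guard in the hunk above: a zero-length measurement window makes the per-second division produce NaN (or infinity), so skipping it keeps the stats table clean. A minimal sketch of that idea, where the `tps` helper is hypothetical and not a function from this PR:

```rust
use std::time::Duration;

/// Transactions per second, or `None` when no whole second has elapsed.
/// In f32, `0.0 / 0.0` is NaN and `n / 0.0` is infinity, so the zero
/// case is skipped rather than reported.
pub fn tps(num_success: u64, duration: Duration) -> Option<f32> {
    // `as_secs` truncates, so sub-second durations also count as zero here.
    let secs = duration.as_secs() as f32;
    if secs == 0.0 {
        return None;
    }
    Some(num_success as f32 / secs)
}
```

Returning `Option` is one possible design; the diff itself appears to just skip the measurement for that entry.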

-let (tx, mut rx) = tokio::sync::mpsc::channel(100);
-let (stress_stat_tx, mut stress_stat_rx) = tokio::sync::mpsc::channel(100);
+let (tx, mut rx) = channel(100);
+let (stress_stat_tx, mut stress_stat_rx) = channel(100);
 let mut bench_workers = vec![];
Contributor

Maybe make this a BTreeMap and group the workers as you create them. Then you can immediately convert it into a VecDeque instead of having to post-process the vec in spawn_workers_scheduler.

Contributor Author

Good idea 👍
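A rough sketch of what that suggestion could look like, assuming a simplified `Worker` type with a numeric group id (both hypothetical; the real BenchWorker carries much more state):

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical stand-in for a bench worker; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
pub struct Worker {
    pub group: u32,
    pub id: u32,
}

/// Group workers by benchmark group as they are created, then turn the
/// ordered map directly into a queue of per-group worker lists.
pub fn group_workers(workers: Vec<Worker>) -> VecDeque<Vec<Worker>> {
    let mut by_group: BTreeMap<u32, Vec<Worker>> = BTreeMap::new();
    for w in workers {
        by_group.entry(w.group).or_default().push(w);
    }
    // BTreeMap iterates in key order, so groups come out as 0, 1, 2, ...
    by_group.into_values().collect()
}
```

Because the map is keyed by group id, the resulting queue is already in group order and needs no separate sorting or regrouping pass.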

// of that benchmark group. The benchmark groups will run in a rotation fashion, unless the duration
// of the last group is set as "unbounded" which will run of the rest of the whole benchmark.
//
// Example: for Bench argument:
Contributor

I think defining a series of benchmarks via a configuration file would make this cleaner to set up and a little less error-prone than running it via the CLI. Up to you if you want to change it now or later, but I think it would be nice to define the benchmarks in a YAML file and pass that in, i.e.

benchmarks:
- target-qps: 1000
  shared_counter: 1
  duration: 360s
  ...
- target-qps: 2000
  transfer_object: 1
  duration: 120s
  ...

Contributor Author

Let me do that as a follow-up PR, as it will require more work, and potentially more work on the CI flows as well, assuming we want to structure the YAML files there differently too.

@akichidis akichidis force-pushed the support-multiple-benchmark-traffic branch from befc5b6 to 27d20a2 Compare October 10, 2023 12:01
@vercel vercel bot temporarily deployed to Preview – mysten-ui October 10, 2023 12:05 Inactive
@akichidis akichidis merged commit 5855055 into main Oct 10, 2023
31 checks passed
@akichidis akichidis deleted the support-multiple-benchmark-traffic branch October 10, 2023 14:44