-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bench: Add the room_list
bench
#3661
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3661 +/- ##
==========================================
- Coverage 84.89% 84.86% -0.03%
==========================================
Files 272 272
Lines 29167 29167
==========================================
- Hits 24761 24753 -8
- Misses 4406 4414 +8 ☔ View full report in Codecov by Sentry. |
BenchmarkId::new("sync", number_of_rooms), | ||
&number_of_rooms, | ||
|benchmark, maximum_number_of_rooms| { | ||
benchmark.to_async(&runtime).iter(|| async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of line in this benchmark that should be considered to be part of the setup. I don't think we want to time wiremock
starting up.
Since there are so many lines that are timed that shouldn't be, I think it's quite hard to get the benchmark to settle down. I.e. after some 10 runs I finally got it to settle down, but still it's quite the noisy benchmark:
time: [-2.6332% +7.1045% +17.983%] (p = 0.18 > 0.05)
thrpt: [-15.242% -6.6332% +2.7044%]
No change in performance detected.
Please take a look at the Bencher
documentation and more specifically at the iter_batched()
method:
If your routine requires some per-iteration setup that shouldn’t be timed, use iter_batched or iter_batched_ref.
Having that many lines as part of the benchmark also makes it quite hard to see what exactly we're measuring. So what are we intending to measure? The sync response processing, the room list creation?
Example of the output: ``` room_list/sync/10 time: [3.5673 ms 3.5899 ms 3.6181 ms] thrpt: [2.7639 Kelem/s 2.7856 Kelem/s 2.8032 Kelem/s] Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 3 (3.00%) high mild 8 (8.00%) high severe room_list/sync/100 time: [3.6342 ms 3.6667 ms 3.7037 ms] thrpt: [27.000 Kelem/s 27.273 Kelem/s 27.516 Kelem/s] Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe room_list/sync/1000 time: [30.426 ms 30.611 ms 30.810 ms] thrpt: [32.457 Kelem/s 32.668 Kelem/s 32.867 Kelem/s] Found 20 outliers among 100 measurements (20.00%) 13 (13.00%) high mild 7 (7.00%) high severe Benchmarking room_list/sync/10000: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 35.7s, or reduce sample count to 10. room_list/sync/10000 time: [353.56 ms 355.14 ms 356.97 ms] thrpt: [28.014 Kelem/s 28.158 Kelem/s 28.284 Kelem/s] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe ```
0f5eef9
to
a2b1df1
Compare
We're not particularly interested in this as of now and this is bitrotting, so closing now. If we ever decide to benchmark this properly let's reopen. |
Example of the output: