Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance triage by bisecting the range #2916

Open
Tracked by #4095
kotlarmilos opened this issue Feb 24, 2023 · 6 comments
Open
Tracked by #4095

Performance triage by bisecting the range #2916

kotlarmilos opened this issue Feb 24, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@kotlarmilos
Copy link
Member

The idea is to improve performance triage for complex cases (i.e. dotnet/perf-autofiling-issues#11539 and dotnet/perf-autofiling-issues#11536) by bisecting the range and identifying a commit that caused the regression. Such tool could input commit range and microbenchmark grouping, and could output a commit that caused the regression with logs of a searched path.

The tool could be made available for local environment where it is limited to a particular os and architecture, which I consider as a low effort. Also, it could be added as an extension to the dashboard where for a particular set of microbenchmarks it can be executed, which I consider as a mid to high effort with broader scope and easy to use approach, but with security concerns.

I would like to obtain feedback before making it available for local environments. If such tool is considered as useful in the dashboard, I am ready to help.

@kotlarmilos kotlarmilos added the enhancement New feature or request label Feb 24, 2023
@kotlarmilos kotlarmilos self-assigned this Feb 24, 2023
@sblom
Copy link
Contributor

sblom commented Feb 27, 2023

Would love to see something like this! Are you imagining enhancing benchmarks_ci.py to do this?

Would be very interesting to try to factor this as a git bisect predicate so that git can do the bisection work, but we can help it narrow in, although I worry that without perfect test stability we might confuse git bisect.

@kotlarmilos
Copy link
Member Author

It is a good idea to enhance benchmarks_ci.py. I agree that it would be interesting to factor it as a git bisect. For regressions with an order of magnitude difference even without perfect test stability we might get correct results.

I can propose changes to benchmarks_ci.py in a draft PR.

@matouskozak
Copy link
Member

I would like to revive the discussion about the triage script.

I think that with recent advancements by @LoopedBard3 on https://github.com/dotnet/performance/blob/main/scripts/benchmarks_local.py. We could utilize this script to binary search thru commits to identify the source of regression (or improvement).

My idea for the script would be:
Input: baseline hash, compare hash, filter for microbenchmarks
Output: hash of commit causing regression

Algorithm steps:

  1. Measure the magnitude of regression between baseline and compare hash. We could use the ResultsComparer tool https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer for measuring. The tool can save results to csv which could be processed by the script later.
  2. Using binary search algorithm, try to find commit causing the same magnitude of regression as computed in 1.
    • It is possible that the results between commits are not monotonous which could break the binary search algorithm. Small variance could be detected and ignored, if there would be bigger spikes than the script should either abort or "split" itself to work on subintervals.

Further ideas:

  • Reverting the found commit to verify the cause of regression.
  • Checking out new repository and building from zero for each tested commit might be too time consuming. Having 3 repos (left, mid, right) and just using git checkout <commit> without cleaning artifacts could make the search faster, however it could cause some unwanted behavior. I would make this an optional feature for user to decide.

What do you think? @sblom @kotlarmilos @LoopedBard3

@kotlarmilos
Copy link
Member Author

Good idea. There is already --diff command that allows for easy comparison.

Using binary search algorithm, try to find commit causing the same magnitude of regression as computed in 1.

I would like to explore the possibility of implementing it using git bisect (https://git-scm.com/docs/git-bisect). Additionally, I think we can extend the script for local testing to support this feature.

@matouskozak @LoopedBard3 I suggest we schedule a call to discuss it further.

@matouskozak
Copy link
Member

I selected a few benchmarks from Mono perf issues that regressed in the past and have stable results to be suitable for testing the bisecting algorithm.


The first set comes from dotnet/perf-autofiling-issues#21818 (linux-x64, AOT):

System.Memory.ReadOnlySpan

baseline: 9d08b24d743d0c57203a55c3d0c6dd1bc472e57e
compare: f6c592995a4f8526508da33761230a3850d942ff
expected result: 4a09c82215399c27f52277a8db7178270410c693

System.Globalization.Tests.StringSearch

baseline: da4e544809b3b10b7db8dc170331988d817240d7
compare: f6c592995a4f8526508da33761230a3850d942ff
expected result: 4a09c82215399c27f52277a8db7178270410c693


The second set comes from dotnet/perf-autofiling-issues#15795 (linux-arm64, AOT):

System.Tests.Perf_DateTimeOffset, System.Tests.Perf_DateTime, System.Globalization.Tests.Perf_DateTimeCultureInfo

baseline: 0fc78e62bb0b8d824efe7421983702b97a60def6
compare: 86b48d7c6f081c12dcc9c048fb53de1b78c9966f
expected result: ef1ba771347dc1d7d626907e4731b8f2e3cf78b3


The third set comes from dotnet/perf-autofiling-issues#14570 (linux-arm64, AOT):

System.Runtime.Intrinsics.Tests.Perf_Vector128Of, System.Runtime.Intrinsics.Tests.Perf_Vector128Of, System.Runtime.Intrinsics.Tests.Perf_Vector128Float

baseline: dce343987e81ce6cf045d983ce62d8e117d2c142
compare: c6cc201665cb3c4d61851392bd8f8d8093688500
expected result: 9bf28fe5a0bc2344bac261e07b27e6c55033318f


The last set comes from dotnet/perf-autofiling-issues#22688 (linux-x64, JIT):

This set represents improvements (in contrast with previous three that represented regression)

System.Text.Json.Serialization.Tests.WriteJson

baseline: 736dabeca728ccf8b911d96d1b4c575b4d0db7d2
compare: fc5c29692fc1a92426b7d1ce8c501e7696062bb6
expected result: 169e22c8f9f00719d87f0674954fee688b556b4a

@kotlarmilos
Copy link
Member Author

Thanks @matouskozak! Let's create mock data based on your input and prototype the bisecting process using a moving average filter-like approach. @LoopedBard3 since you worked on the script for local benchmarking, feel free to reflect on missing features (comparer, bisect, etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants