[Perf] Regressions in math based array benchmarks #52316
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
This change seems to be the cause, #51901 |
@BruceForstall PTAL |
The ludcmp issue is an LSRA spill cost issue, detailed in #53703. The Array2 regression is due to unfortunate loop alignment of the innermost nested loop (of a 4 deep loop nest): in the new, slow case, this innermost loop spans 3 32-byte chunks, whereas before, it only spanned 2. The JIT aligns the loop up to 16 bytes, but that's not enough here. Oddly, I can easily repro this regression on my Linux x64 box, but not my Windows x64 box (where the test didn't show any regression in the perf lab, and in fact has been incredibly consistent, compared to the Linux one, which has moved around a bit). If I set |
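To make the chunk-counting argument above concrete, here is a minimal sketch of how a loop's start offset determines how many 32-byte chunks it touches. The loop size (50 bytes) and start offset (0x2F) are hypothetical values chosen for illustration, not figures from the Array2 disassembly, and the helper names are not JIT APIs.

```python
def chunks_spanned(start: int, size: int, chunk: int = 32) -> int:
    """Count the chunk-aligned blocks touched by the byte range [start, start + size)."""
    if size <= 0:
        return 0
    first = start // chunk
    last = (start + size - 1) // chunk
    return last - first + 1


def align_up(offset: int, boundary: int) -> int:
    """Round offset up to the next multiple of boundary (boundary must be a power of two)."""
    return (offset + boundary - 1) & ~(boundary - 1)


# Hypothetical numbers, for illustration only.
LOOP_SIZE = 50    # assumed size of the inner loop body in bytes
RAW_START = 0x2F  # assumed start offset before any alignment padding

for boundary in (1, 16, 32):
    start = align_up(RAW_START, boundary)
    print(f"align={boundary:2d} start={start:#x} chunks={chunks_spanned(start, LOOP_SIZE)}")
```

With these assumed numbers, aligning to 16 bytes still leaves the loop straddling three 32-byte chunks, while aligning to 32 bytes brings it down to two, which is the kind of situation described above where 16-byte alignment is not enough.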
So, I just generated disassembly of before (your loop weight change) vs. after, and I see things the other way round. In the screenshot below, left is before and right is after. Earlier, we were aligning the loop with 4 bytes and making the loop span 2 32-byte chunks; however, after your change, we no longer align the loop because of If I set Finally, without loop alignment, To summarize, before your change, due to the loop alignment, we got better performance, but after your change, we got performance similar to what we would have had without loop alignment (modulo the performance impact of |
I don't understand your "To summarize" sentence. I think we might be saying the same thing. In the "before my change" case, the inner loop ended up in 2 32-byte chunks. After, it ended up in 3 32-byte chunks. In the "after" case, setting |
The |
The SciMark2 regression (both Windows and Linux) appears to be due to an additional spill / reload in the hot loop of
This was fixed by #53853. |
Since these have all been investigated, and follow-up issues opened, I'm closing this. |
Run Information
Regressions in ByteMark
Historical Data in Reporting System
Repro
Payloads
Baseline
Compare
Histogram
ByteMark.BenchLUDecomp
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
Run Information
Regressions in Benchstone.BenchI.Array2
Historical Data in Reporting System
Repro
Payloads
Baseline
Compare
Histogram
Benchstone.BenchI.Array2.Test
Run Information
Regressions in BenchmarksGame.FannkuchRedux_2
Historical Data in Reporting System
Repro
Payloads
Baseline
Compare
Histogram
BenchmarksGame.FannkuchRedux_2.RunBench(n: 10, expectedSum: 73196)
Run Information
Regressions in SciMark2.kernel
Historical Data in Reporting System
Repro
Payloads
Baseline
Compare
Histogram
SciMark2.kernel.benchSparseMult