Loop alignment issue in Array2 benchmark #54072
Labels
area-CodeGen-coreclr
CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
untriaged
New issue has not been triaged by the area owner
A recent change #51901 leading to a regression in the Benchstone.BenchI.Array2 benchmark on Ubuntu (but not Windows): #52316.
The core of the benchmark is the
Bench
function inner loop:The code of this loop is almost equivalent, modulo register allocation, before and after #51901. The difference is loop alignment: before #51901, the loop fits in 2 32-byte chunks; after, it is in 3 32-byte chunks. On Ubuntu, this leads to about a 50% performance regression. Simply setting
COMPlus_JitAlignLoopAdaptive=0
changes the alignment such that the inner loop fits in 2 32-byte chunks, recovering the performance.This is a high weight basic block; perhaps the alignment heuristics should "try harder" and be willing to insert more alignment padding in case it might be profitable?
The text was updated successfully, but these errors were encountered: