Loop alignment issue in Array2 benchmark #54072

BruceForstall · 2021-06-11T18:13:54Z

A recent change, #51901, led to a regression in the Benchstone.BenchI.Array2 benchmark on Ubuntu (but not Windows): #52316.

The core of the benchmark is the Bench function inner loop:

for (; loop != 0; loop--) {
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            for (int k = 0; k < 10; k++) {
                d[i][j][k] = s[i][j][k];
            }
        }
    }
}

The code of this loop is almost equivalent, modulo register allocation, before and after #51901. The difference is loop alignment: before #51901, the loop fits in two 32-byte chunks; after, it spans three 32-byte chunks. On Ubuntu, this leads to about a 50% performance regression. Simply setting COMPlus_JitAlignLoopAdaptive=0 changes the alignment such that the inner loop again fits in two 32-byte chunks, recovering the performance.
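To make the chunk counting concrete, here is a minimal sketch of the arithmetic. The function names, the 60-byte loop size, and the start offsets are all made up for illustration; they are not measurements from the benchmark or code from the JIT.

```python
CHUNK = 32  # chunk size in bytes, as discussed above


def chunks_spanned(start_offset: int, size: int, chunk: int = CHUNK) -> int:
    """Count the chunk-aligned blocks touched by [start_offset, start_offset + size)."""
    if size <= 0:
        return 0
    first = start_offset // chunk
    last = (start_offset + size - 1) // chunk
    return last - first + 1


def padding_to_align(start_offset: int, chunk: int = CHUNK) -> int:
    """Bytes of padding needed to move start_offset up to the next chunk boundary."""
    return (-start_offset) % chunk


# Hypothetical 60-byte loop body starting 26 bytes into a chunk.
unaligned = chunks_spanned(26, 60)
aligned = chunks_spanned(26 + padding_to_align(26), 60)
print(unaligned, aligned)
```

With these made-up numbers, the same loop body touches one fewer chunk once its start is padded up to a 32-byte boundary, which is the kind of difference described above.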

This is a high weight basic block; perhaps the alignment heuristics should "try harder" and be willing to insert more alignment padding in case it might be profitable?

kunalspathak · 2021-06-11T18:48:09Z

I have shared my investigation in #52316 (comment). In the "before" case, we align the loop (with 6 bytes of padding), but after your change we don't align it because of other heuristics (loop size vs. amount of padding needed).

> This is a high weight basic block; perhaps the alignment heuristics should "try harder"

We already have COMPlus_JitAlignLoopMinBlockWeight, which determines whether alignment should be attempted, and in this case we do try to align. However, we limit how much padding we add for code size reasons, hence the conservative heuristics. I don't think we can do much for this case. To put it another way: before your change, the benchmark was benefiting from loop alignment, but after your change, it is not.
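As a rough illustration of that trade-off, here is a toy model of a weight gate followed by a padding budget that shrinks as the loop gets larger. The budget formula, the function names, and all the numbers are hypothetical and chosen only for illustration; this is not the JIT's actual heuristic.

```python
CHUNK = 32  # alignment boundary in bytes


def min_chunks(loop_size: int, chunk: int = CHUNK) -> int:
    """Fewest chunks a loop of loop_size bytes could occupy (ceiling division)."""
    return -(-loop_size // chunk)


def padding_budget(loop_size: int, chunk: int = CHUNK) -> int:
    """Hypothetical budget: half a chunk for a one-chunk loop, halved per extra chunk."""
    return (chunk // 2) >> (min_chunks(loop_size, chunk) - 1)


def should_align(block_weight: int, min_weight: int,
                 start_offset: int, loop_size: int, chunk: int = CHUNK) -> bool:
    """Align only hot loops whose required padding fits the size-dependent budget."""
    if block_weight < min_weight:
        return False  # not hot enough to be worth any padding
    padding = (-start_offset) % chunk
    if padding == 0:
        return False  # already on a boundary, nothing to do
    return padding <= padding_budget(loop_size, chunk)


# Same hypothetical 60-byte hot loop at two different start offsets.
print(should_align(100, 4, 26, 60))  # needs 6 bytes of padding
print(should_align(100, 4, 20, 60))  # needs 12 bytes of padding
```

In this model a small shift in where the loop starts is enough to flip the decision, even though the loop body and its weight are unchanged.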

kunalspathak · 2021-06-15T06:09:35Z

No action needed here.

BruceForstall added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jun 11, 2021

BruceForstall assigned kunalspathak Jun 11, 2021

dotnet-issue-labeler bot added the untriaged label (New issue has not been triaged by the area owner) Jun 11, 2021

kunalspathak closed this as completed Jun 15, 2021

ghost locked as resolved and limited conversation to collaborators Jul 15, 2021
