Avoid duplicate cmp instruction, avoid boolean zero extension #70003

thinkbeforecoding · 2022-05-31T08:27:59Z

I'm working on comparison optimization for the F# compiler to implement branchless compare dotnet/fsharp#13098

The following implementation improves average performance, using only a cgt - clt:

However, I noticed that the same cmp ecx edx instruction is issued twice, and this is unnecessary as setg and movzx don't change flags.

I also thought that the following version would be even shorter:

but it's actually longer. The result from cgt/clt is zero extended with movzx even if we use only the lower byte part for the subtraction, then is sign extended to 32bits.

The result code could be:

If someone is ok to guide me through this kind of JIT optimization, I'd happily implement it.

category:cq
theme:codegen
skill-level:intermediate
cost:medium
impact:medium

dotnet-issue-labeler · 2022-05-31T08:28:02Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2022-05-31T10:05:41Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

I'm working on comparison optimization for the F# compiler to implement branchless compare dotnet/fsharp#13098

The following implementation improves average performance, using only a cgt - clt:

However, I noticed that the same cmp ecx edx instruction is issued twice, and this is unnecessary as setg and movzx don't change flags.

I also thought that the following version would be even shorter:

but it's actually longer. The result from cgt/clt is zero extended with movzx even if we use only the lower byte part for the subtraction, then is sign extended to 32bits.

The result code could be:

If someone is ok to guide me through this kind of JIT optimization, I'd happily implement it.

Author:	thinkbeforecoding
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

EgorBo · 2022-05-31T15:50:12Z

I assume it's not possible to express this in pure C# and F# right? (relying on cmp's return value as int, only maybe with Unsafe.As)

EgorBo · 2022-05-31T16:34:22Z

.NET 7.0 codegen is different:

G_M60686_IG01:              ;; offset=0000H
                                                ;; size=0 bbWeight=1    PerfScore 0.00
G_M60686_IG02:              ;; offset=0000H
       33C0                 xor      eax, eax
       3BCA                 cmp      ecx, edx
       0F9FC0               setg     al
       3BCA                 cmp      ecx, edx
       0F9CC2               setl     dl
       0FB6D2               movzx    rdx, dl
       2BC2                 sub      eax, edx
                                                ;; size=17 bbWeight=1    PerfScore 3.25
G_M60686_IG03:              ;; offset=0011H
       C3                   ret
                                                ;; size=1 bbWeight=1    PerfScore 1.00

; Total bytes of code 18

tannergooding · 2022-05-31T17:45:27Z

I assume it's not possible to express this in pure C# and F# right?

Right. There is now a proposal for C# to handle cmp ? 0 : 1 as emitting just cmp however: dotnet/roslyn#61483

.NET 7.0 codegen is different:

So almost there, its just missing the chance to remove the second compare and avoid the extension until after the sub. Noting that avoiding the extension can sometimes have "negative" impact since its a "partial register read/write"

thinkbeforecoding · 2022-05-31T20:55:05Z

Yes it was submitted by @dsyme , in F# it is possible to emit inline IL so you can (as in the sample above) emit directly clt, cgt, ces IL instructions. But if roslyn can emit cgt for x > y? 1 : 0, it's basically the same.
However, x > y? (sbyte) 1 : (sbyte) 0 should remove the xor or movsx or movzx.
And get rid of the double cmp.

JulieLeeMSFT · 2022-05-31T21:03:23Z

cc @dotnet/jit-contrib.
Moving to Future as we are approaching .NET 7 code complete date.

thinkbeforecoding · 2022-06-07T13:53:23Z

Looking at the code in https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/codegenxarch.cpp,
I've seen that there already seems to have logic to avoid the movzx instruction when return type is known to be a byte:

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/codegenxarch.cpp#L1433

The logic to see if emitting a cmp based on existing flags also exists...

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/codegenxarch.cpp#L6464

But how is the genCompareInt method determining the targetType (it's using tree->TypeGet()) ? The clt/cgt IL instruction specify no return type.. the the type is maybe conditioned by the next IL instruction ?

EgorBo · 2022-06-07T18:18:23Z

cc @TIHan

TIHan · 2023-01-25T02:55:43Z

@thinkbeforecoding I made a PR to resolve this in case you are interested: #81143

thinkbeforecoding · 2023-01-25T08:25:14Z

Definitely! Thank you.
The PR will also give me hints where to look at next time I see such possible optimization.

TIHan · 2023-02-28T02:11:59Z

@thinkbeforecoding - The other PR is closed, but I made a new one: #82750

It's still work-in-progress, but the implementation of it is a lot more straightforward than my previous PR. This is because we recently added the ability to look back at more than one previous instruction, which allows us to see if the cmp is redundant or not.

ghost added the untriaged New issue has not been triaged by the area owner label May 31, 2022

am11 added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 31, 2022

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 31, 2022

JulieLeeMSFT added this to the Future milestone May 31, 2022

thinkbeforecoding mentioned this issue Jun 4, 2022

Perf: Implement branchless compare dotnet/fsharp#13187

Merged

TIHan self-assigned this Oct 31, 2022

TIHan modified the milestones: Future, 8.0.0 Jan 17, 2023

TIHan mentioned this issue Jan 25, 2023

[JIT] Remove duplicate 'cmp' instruction in specific comparison scenario #81143

Closed

2 tasks

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 25, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 25, 2023

TIHan mentioned this issue Feb 28, 2023

[JIT] X86/X64 - Eliminate redundant 'cmp' instructions #82750

Merged

1 task

ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 28, 2023

TIHan closed this as completed in #82750 Apr 5, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 5, 2023

ghost locked as resolved and limited conversation to collaborators May 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid duplicate cmp instruction, avoid boolean zero extension #70003

Avoid duplicate cmp instruction, avoid boolean zero extension #70003

thinkbeforecoding commented May 31, 2022 •

edited by BruceForstall

Loading

dotnet-issue-labeler bot commented May 31, 2022

ghost commented May 31, 2022

EgorBo commented May 31, 2022

EgorBo commented May 31, 2022

tannergooding commented May 31, 2022 •

edited

Loading

thinkbeforecoding commented May 31, 2022

JulieLeeMSFT commented May 31, 2022

thinkbeforecoding commented Jun 7, 2022

EgorBo commented Jun 7, 2022

TIHan commented Jan 25, 2023

thinkbeforecoding commented Jan 25, 2023

TIHan commented Feb 28, 2023

Avoid duplicate cmp instruction, avoid boolean zero extension #70003

Avoid duplicate cmp instruction, avoid boolean zero extension #70003

Comments

thinkbeforecoding commented May 31, 2022 • edited by BruceForstall Loading

dotnet-issue-labeler bot commented May 31, 2022

ghost commented May 31, 2022

EgorBo commented May 31, 2022

EgorBo commented May 31, 2022

tannergooding commented May 31, 2022 • edited Loading

thinkbeforecoding commented May 31, 2022

JulieLeeMSFT commented May 31, 2022

thinkbeforecoding commented Jun 7, 2022

EgorBo commented Jun 7, 2022

TIHan commented Jan 25, 2023

thinkbeforecoding commented Jan 25, 2023

TIHan commented Feb 28, 2023

thinkbeforecoding commented May 31, 2022 •

edited by BruceForstall

Loading

tannergooding commented May 31, 2022 •

edited

Loading