Avoid duplicate cmp instruction, avoid boolean zero extension #70003
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @JulieLeeMSFT
|
I assume it's not possible to express this in pure C# or F#, right? (relying on cmp's result as an int, except maybe with Unsafe.As) |
.NET 7.0 codegen is different:
G_M60686_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M60686_IG02: ;; offset=0000H
33C0 xor eax, eax
3BCA cmp ecx, edx
0F9FC0 setg al
3BCA cmp ecx, edx
0F9CC2 setl dl
0FB6D2 movzx rdx, dl
2BC2 sub eax, edx
;; size=17 bbWeight=1 PerfScore 3.25
G_M60686_IG03: ;; offset=0011H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 18 |
Right. There is now a proposal for C# to handle
So it's almost there; it's just missing the chance to remove the second compare and avoid the extension until after the |
Yes, it was submitted by @dsyme. In F# it is possible to emit inline IL, so you can (as in the sample above) directly emit the clt, cgt, and ceq IL instructions. But if Roslyn can emit cgt for x > y ? 1 : 0, it's basically the same. |
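For readers who don't write IL, here is a rough C analogue of the two spellings being compared. It is only an illustration, not the F# or C# code from this thread, and the function names are made up. In C a relational expression already yields an int of 0 or 1, so the ternary form and the direct form compute the same value; whether a given compiler lowers both to the same instructions is not something this sketch shows.

```c
/* Hypothetical helper names; C analogue only, not the code from the issue. */

/* Ternary spelling: each comparison is turned into 0 or 1 explicitly. */
int compare_ternary(int x, int y)
{
    return (x > y ? 1 : 0) - (x < y ? 1 : 0);
}

/* Direct spelling: in C, (x > y) is already an int equal to 0 or 1. */
int compare_direct(int x, int y)
{
    return (x > y) - (x < y);
}
```

Both functions return -1, 0, or 1 for the same inputs, which is the sense in which the two forms are "basically the same".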
cc @dotnet/jit-contrib. |
Looking at the code in https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/codegenxarch.cpp, see https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/codegenxarch.cpp#L1433. The logic to check whether a cmp can be skipped based on the existing flags also exists: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/codegenxarch.cpp#L6464. But how does the genCompareInt method determine the targetType (it uses tree->TypeGet())? The clt/cgt IL instructions specify no return type, so is the type perhaps determined by the next IL instruction? |
cc @TIHan |
@thinkbeforecoding I made a PR to resolve this in case you are interested: #81143 |
Definitely! Thank you. |
@thinkbeforecoding - The other PR is closed, but I made a new one: #82750. It's still work-in-progress, but the implementation of it is a lot more straightforward than my previous PR. This is because we recently added the ability to look back at more than one previous instruction, which allows us to see if the |
I'm working on comparison optimization for the F# compiler to implement branchless compare dotnet/fsharp#13098
The following implementation improves average performance, using only a cgt - clt:
However, I noticed that the same cmp ecx, edx instruction is issued twice, which is unnecessary because setg and movzx don't change the flags.
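As a point of reference, the cgt - clt idea can be sketched in C. This is an analogue only (the original sample was F# with inline IL), and `compare3` is a made-up name. The two comparisons use the same operands, which is why a single cmp could in principle feed both setg and setl.

```c
/* Branchless three-way compare: C analogue of "cgt - clt".
   compare3 is a hypothetical name, not taken from the issue. */
int compare3(int x, int y)
{
    int gt = (x > y);   /* 1 if x > y, else 0 (what cgt produces) */
    int lt = (x < y);   /* 1 if x < y, else 0 (what clt produces) */
    return gt - lt;     /* -1, 0, or 1 without any branch in the source */
}
```

Calling `compare3(1, 2)` gives -1, `compare3(2, 1)` gives 1, and `compare3(5, 5)` gives 0.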
I also thought that the following version would be even shorter:
but it's actually longer. The result of cgt/clt is zero-extended with movzx even though only the low byte is used for the subtraction, and the result is then sign-extended to 32 bits.
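The byte-width idea can be sketched in C as follows. Again this is only an analogue under assumptions: `compare3_byte` is a hypothetical name, and nothing here guarantees that a compiler emits one cmp, two setcc, a byte subtract, and a single sign extension. It just shows that the arithmetic only needs the low byte of each flag result, with one widening at the end.

```c
#include <stdint.h>

/* Byte-width variant of the three-way compare (hypothetical name).
   Each comparison result fits in one byte, like the output of setg/setl. */
int compare3_byte(int32_t x, int32_t y)
{
    uint8_t gt = (uint8_t)(x > y);    /* 0 or 1, as setg would write */
    uint8_t lt = (uint8_t)(x < y);    /* 0 or 1, as setl would write */
    int8_t diff = (int8_t)(gt - lt);  /* -1, 0, or 1 held in a single byte */
    return (int)diff;                 /* one sign extension to 32 bits */
}
```

The value -1 is stored as 0xFF in the byte and only becomes 0xFFFFFFFF at the final widening, so no zero extension of the individual comparison results is needed for correctness.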
The result code could be:
If someone is willing to guide me through this kind of JIT optimization, I'd happily implement it.
category:cq
theme:codegen
skill-level:intermediate
cost:medium
impact:medium