-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inefficient recursive pattern matching (?) native code generated #37986
Comments
The question mark is regarding the fact that I don't know the official name of this pattern, I just found out it exists yesterday. Let me know if it's wrong and I'll edit it. |
if (o is string { Length: 3 } ss)
if (ss is { Length: 3 } s40) You don't need the second |
I understand I don't need the second If you're having doubts about this, consider this other chunk of code: var i = (int)o;
if(i == 2)
if(i == 2)
Console.WriteLine("arf"); which generates the following correct native code: L00ad: mov ecx, [esi+4]
L00b0: cmp ecx, 2
L00b3: jne short L00c0
L00b5: mov ecx, [0xb7d2354]
L00bb: call System.Console.WriteLine(System.String) |
If anything, the compiler won't usually do this kind of optimizations. If you get optimal code for some of code patterns, those are most likely recognized by JIT -- the compiler, on the other hand, still emits the source as-is, you can verify that in the resulting IL. |
I was about to close this issue after thinking about it for a couple of days and coming to the conclusion that checking the However, if you check the generated assembly code again, it doesn't actually check the So if the code should check the property twice to consider (again, badly written) multi-threaded code, it should copy the length in the register twice as well. And if it shouldn't (this is my vote), then it shouldn't compare the register twice to the same value (like it already does for plain integers). Either way, something is not completely correct here. Edit: Also strings in .Net are immutable, so the length should never change unless some dark magic in the VM changes them. |
Thanks for the report. Codegen for a slightly simplified version: using System;
public class C
{
static object o_s;
public static void Main()
{
object o = o_s;
if (o is string { Length: 3 } ss)
if (ss is { Length: 3 } s40)
Console.WriteLine($"arf {s40}");
}
} shows the same features -- inefficient transition from the castclass checks to the utlimate use, and the redundant length check. ; Assembly listing for method C:Main()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 loc0 [V00,T02] ( 4, 3 ) ref -> rdx class-hnd
;* V01 loc1 [V01 ] ( 0, 0 ) ref -> zero-ref class-hnd
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace"
; V03 tmp1 [V03,T01] ( 2, 4 ) ref -> rdx "CASTCLASS eval op1"
; V04 tmp2 [V04,T00] ( 5, 6.74) ref -> rdx class-hnd "spilling QMark2"
; V05 cse0 [V05,T03] ( 3, 1.50) int -> rcx "CSE - moderate"
;
; Lcl frame size = 40
G_M48905_IG01:
4883EC28 sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M48905_IG02:
48B9882C63296B020000 mov rcx, 0x26B29632C88
488B11 mov rdx, gword ptr [rcx]
4885D2 test rdx, rdx
7411 je SHORT G_M48905_IG05
;; bbWeight=1 PerfScore 3.50
G_M48905_IG03:
48B9285F13FCFB7F0000 mov rcx, 0x7FFBFC135F28
48390A cmp qword ptr [rdx], rcx
7402 je SHORT G_M48905_IG05
;; bbWeight=0.25 PerfScore 0.81
G_M48905_IG04:
33D2 xor rdx, rdx
;; bbWeight=0.12 PerfScore 0.03
G_M48905_IG05:
4885D2 test rdx, rdx
7427 je SHORT G_M48905_IG07
;; bbWeight=1 PerfScore 1.25
G_M48905_IG06:
8B4A08 mov ecx, dword ptr [rdx+8]
83F903 cmp ecx, 3
751F jne SHORT G_M48905_IG07
83F903 cmp ecx, 3
751A jne SHORT G_M48905_IG07
48B9883163296B020000 mov rcx, 0x26B29633188
488B09 mov rcx, gword ptr [rcx]
E8551BFFFF call System.String:Concat(System.String,System.String):System.String
488BC8 mov rcx, rax
E8C5F8FFFF call System.Console:WriteLine(System.String)
;; bbWeight=0.50 PerfScore 4.50
G_M48905_IG07:
90 nop
;; bbWeight=1 PerfScore 0.25
G_M48905_IG08:
4883C428 add rsp, 40
C3 ret
;; bbWeight=1 PerfScore 1.25
; Total bytes of code 89, prolog size 4, PerfScore 20.74, (MethodHash=eba540f6) for method C:Main() We are in the process of buttoning up work for 5.0, so going to mark this as future. |
I don't know how to get the fancy output you're getting, but you can see the same problem in a much simpler scenario: using System;
public class C
{
public void M()
{
string s = System.Reflection.Assembly.GetExecutingAssembly().CodeBase;
if(s.Length == 2)
if(s.Length == 2)
Console.WriteLine("arf");
}
} Simply reading the property (the weird string assignment is so the optimizer doesn't get rid of it) is enough, it's not a pattern matching issue. |
The asm result is: L0016: mov ecx, [rax+0x8]
L0019: cmp ecx, 0x2
L001c: jnz L0035
L001e: cmp ecx, 0x2
L0021: jnz L0035 Again, the first line copies |
This comes from the jit, if you're using a checked build of the jit, and set COMPlus_JitDisasm. See viewing jit dumps for details.
Thanks. As you can see the jit is not very good at removing certain kinds of redundant branches. |
FWIW the latest 6.0 codegen does not contain redundant length compares: ; Assembly listing for method C:Main()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 loc0 [V00,T02] ( 4, 3 ) ref -> rdx class-hnd
;* V01 loc1 [V01 ] ( 0, 0 ) ref -> zero-ref class-hnd
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V03 tmp1 [V03,T01] ( 2, 4 ) ref -> rdx "CASTCLASS eval op1"
; V04 tmp2 [V04,T00] ( 5, 6.75) ref -> rdx class-hnd "spilling QMark2"
;
; Lcl frame size = 40
G_M48905_IG01: ;; offset=0000H
4883EC28 sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M48905_IG02: ;; offset=0004H
48B9382DF1390B020000 mov rcx, 0x20B39F12D38
488B11 mov rdx, gword ptr [rcx]
4885D2 test rdx, rdx
7411 je SHORT G_M48905_IG05
;; bbWeight=1 PerfScore 3.50
G_M48905_IG03: ;; offset=0016H
48B900BA2897FD7F0000 mov rcx, 0x7FFD9728BA00
48390A cmp qword ptr [rdx], rcx
7402 je SHORT G_M48905_IG05
;; bbWeight=0.25 PerfScore 0.81
G_M48905_IG04: ;; offset=0025H
33D2 xor rdx, rdx
;; bbWeight=0.12 PerfScore 0.03
G_M48905_IG05: ;; offset=0027H
4885D2 test rdx, rdx
7420 je SHORT G_M48905_IG07
;; bbWeight=1 PerfScore 1.25
G_M48905_IG06: ;; offset=002CH
837A0803 cmp dword ptr [rdx+8], 3
751A jne SHORT G_M48905_IG07
48B9D831F1390B020000 mov rcx, 0x20B39F131D8
488B09 mov rcx, gword ptr [rcx]
E8B4D0FEFF call System.String:Concat(System.String,System.String):System.String
488BC8 mov rcx, rax
E8E4FCFFFF call System.Console:WriteLine(System.String)
;; bbWeight=0.50 PerfScore 3.75
G_M48905_IG07: ;; offset=004CH
90 nop
;; bbWeight=1 PerfScore 0.25
G_M48905_IG08: ;; offset=004DH
4883C428 add rsp, 40
C3 ret
;; bbWeight=1 PerfScore 1.25
; Total bytes of code 82, prolog size 4, PerfScore 19.29, instruction count 21, allocated bytes for code 82 (MethodHash=eba540f6) for method C:Main() It still has one other redundant compare at IG05. Perhaps once I sort out the issues with #49713 we can get rid of this too. |
Going to mark this as future, though I am hoping to find time to work on #49713 before 6.0 is done. |
Was hoping #76283 would apply here, but it's blocked by a side effecting assignment. |
Getting better one PR at a time! |
Version Used:
VS 16.6
Steps to Reproduce:
I'm using the following code:
And the issue is that while the code generated is correct, it is indeed checking for
null
and for theLength
property twice, even in release mode, in both ILASM and native assembly (so the JIT is also failing to optimize this):Expected Behavior:
I would expect the 2nd check to be eliminated, since it's checking exactly the same conditions as the first.
category:cq
theme:basic-cq
skill-level:intermediate
cost:medium
The text was updated successfully, but these errors were encountered: