JIT: Substitute constant into loop testing variable where possible #90622

Closed
hez2010 wants to merge 14 commits from the loop-test-substitute branch


hez2010
Contributor

@hez2010 hez2010 commented Aug 15, 2023

Substitute a constant for the variable used in the loop test, where possible.

Example:

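using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions
using System.Runtime.Intrinsics;       // Vector256<T>

[MethodImpl(MethodImplOptions.NoInlining)]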
int Test1()
{
    var sum = 0;
    var iter = new Config { Iteration = 4 }.Iteration;
    for (var i = 0; i < iter; i++)
    {
        sum++;
    }
    return sum;
}

[MethodImpl(MethodImplOptions.NoInlining)]
int Test2()
{
    var sum = 0;
    var iter = new Config { Iteration = Vector256<int>.Count }.Iteration;
    for (var i = 0; i < iter; i++)
    {
        sum++;
    }
    return sum;
}

struct Config
{
    public int Iteration { get; set; }
}

Codegen for Test1 and Test2:

; Assembly listing for method Program:Test1():int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 loc0         [V00,T00] (  0,  0   )     int  ->  zero-ref
;* V01 loc1         [V01    ] (  0,  0   )     int  ->  zero-ref    single-def
;* V02 loc2         [V02    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op <Config>
;* V03 loc3         [V03    ] (  0,  0   )     int  ->  zero-ref
;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V05 tmp1         [V05    ] (  0,  0   )     int  ->  zero-ref    "field V02.<Iteration>k__BackingField (fldOffset=0x0)" P-INDEP
;
; Lcl frame size = 0

G_M58000_IG01:  ;; offset=0x0000
                                                ;; size=0 bbWeight=4 PerfScore 0.00
G_M58000_IG02:  ;; offset=0x0000
       mov      eax, 4
                                                ;; size=5 bbWeight=4 PerfScore 1.00
G_M58000_IG03:  ;; offset=0x0005
       ret
                                                ;; size=1 bbWeight=4 PerfScore 4.00

; Total bytes of code 6, prolog size 0, PerfScore 5.60, instruction count 2, allocated bytes for code 6 (MethodHash=ad821d6f) for method Program:Test1():int (FullOpts)
; ============================================================

; Assembly listing for method Program:Test2():int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 loc0         [V00,T00] (  0,  0   )     int  ->  zero-ref
;* V01 loc1         [V01    ] (  0,  0   )     int  ->  zero-ref    single-def
;* V02 loc2         [V02    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op <Config>
;* V03 loc3         [V03    ] (  0,  0   )     int  ->  zero-ref
;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V05 tmp1         [V05    ] (  0,  0   )     int  ->  zero-ref    "field V02.<Iteration>k__BackingField (fldOffset=0x0)" P-INDEP
;
; Lcl frame size = 0

G_M63890_IG01:  ;; offset=0x0000
                                                ;; size=0 bbWeight=4 PerfScore 0.00
G_M63890_IG02:  ;; offset=0x0000
       mov      eax, 8
                                                ;; size=5 bbWeight=4 PerfScore 1.00
G_M63890_IG03:  ;; offset=0x0005
       ret
                                                ;; size=1 bbWeight=4 PerfScore 4.00

; Total bytes of code 6, prolog size 0, PerfScore 5.60, instruction count 2, allocated bytes for code 6 (MethodHash=03ae066d) for method Program:Test2():int (FullOpts)
; ============================================================
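
Both listings reduce to a single mov because, once the limit iter is known to be a constant, the number of iterations is a compile-time value and sum equals that count. The following is a minimal C++ sketch of that arithmetic only, not the JIT's implementation. tripCount and foldedSum are hypothetical helpers, and 8 stands in for Vector256<int>.Count (256 bits / 32-bit int).

```cpp
#include <cstdint>

// Hypothetical helper (not a JIT API): number of iterations of
//   for (i = init; i < limit; i += step)
// for a positive step, assuming the arithmetic does not overflow.
constexpr int64_t tripCount(int64_t init, int64_t limit, int64_t step)
{
    return (limit <= init) ? 0 : (limit - init + step - 1) / step;
}

// Each iteration of Test1/Test2 performs sum++, so the result is the trip count.
constexpr int64_t foldedSum(int64_t limit)
{
    return tripCount(0, limit, 1);
}

// Checked at compile time, mirroring the constants in the two listings.
static_assert(foldedSum(4) == 4, "Test1: mov eax, 4");
static_assert(foldedSum(8) == 8, "Test2: mov eax, 8");
```

If the limit were not a known constant, neither static_assert could be evaluated, which is the situation the substitution is meant to avoid.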

Another example involving an array length, a cast, and a nested loop:

[MethodImpl(MethodImplOptions.NoInlining)]
int Test()
{
    var sum = 0;
    var iter = new Config { Value = new int[new int[4].Length] }.Iter;
    for (var i = 0; i < iter; i++)
    {
        var y = (int)new Config { X = 4 }.Y;
        for (var j = 0; j < y; j++)
        {
            sum++;
        }
    }
    return sum;
}

struct Config
{
    public int[] Value { get; set; }
    public int Iter => Value.Length;
    public int X { get; set; }
    public long Y => X;
}

Codegen for Test:

; Assembly listing for method Program:Test():int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 6 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 loc0         [V00,T00] (  0,  0   )     int  ->  zero-ref
;* V01 loc1         [V01    ] (  0,  0   )     int  ->  zero-ref    single-def
;* V02 loc2         [V02    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op <Config>
;* V03 loc3         [V03    ] (  0,  0   )     int  ->  zero-ref
;* V04 loc4         [V04    ] (  0,  0   )     int  ->  zero-ref
;* V05 loc5         [V05    ] (  0,  0   )     int  ->  zero-ref
;  V06 OutArgs      [V06    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V07 tmp1         [V07    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Inlining Arg" <int[]>
;* V08 tmp2         [V08    ] (  0,  0   )     ref  ->  zero-ref    "field V02.<Value>k__BackingField (fldOffset=0x0)" P-INDEP
;* V09 tmp3         [V09    ] (  0,  0   )     int  ->  zero-ref    "field V02.<X>k__BackingField (fldOffset=0x8)" P-INDEP
;* V10 tmp4         [V10,T02] (  0,  0   )    long  ->  zero-ref    "argument with side effect"
;* V11 cse0         [V11,T01] (  0,  0   )    long  ->  zero-ref    "CSE - aggressive"
;
; Lcl frame size = 40

G_M19777_IG01:  ;; offset=0x0000
       sub      rsp, 40
                                                ;; size=4 bbWeight=16 PerfScore 4.00
G_M19777_IG02:  ;; offset=0x0004
       mov      eax, 16
                                                ;; size=5 bbWeight=16 PerfScore 4.00
G_M19777_IG03:  ;; offset=0x0009
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=16 PerfScore 20.00

; Total bytes of code 14, prolog size 4, PerfScore 29.40, instruction count 4, allocated bytes for code 14 (MethodHash=06bbb2be) for method Program:Test():int (FullOpts)
; ============================================================
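
To see where the 16 comes from: the outer limit is the length of an array whose size is itself new int[4].Length (so 4), and the inner limit is X = 4 widened to long and cast back to int, giving 4 * 4 increments. Below is a rough C++ transliteration of the C# example under stated assumptions: the array is modelled only by its length, and Config and nestedTest are illustrative names rather than anything from the JIT.

```cpp
#include <cstdint>

// Stand-in for the C# Config struct; the int[] is represented by its length.
struct Config
{
    int32_t valueLength; // models Value.Length
    int32_t x;           // models X

    int32_t iter() const { return valueLength; }          // Iter => Value.Length
    int64_t y() const { return static_cast<int64_t>(x); } // Y => X, widened to long
};

int nestedTest()
{
    int sum = 0;
    const int32_t innerLength = 4;                      // new int[4].Length
    const int32_t iter = Config{innerLength, 0}.iter(); // new int[innerLength] has length 4
    for (int i = 0; i < iter; i++)
    {
        const int y = static_cast<int>(Config{0, 4}.y()); // (int)new Config { X = 4 }.Y
        for (int j = 0; j < y; j++)
        {
            sum++;
        }
    }
    return sum; // 4 outer iterations * 4 inner iterations = 16
}
```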

@hez2010 hez2010 changed the title from "[Draft] Substitute constant into loop testing variable where possible" to "JIT: Substitute constant into loop testing variable where possible" Aug 15, 2023
@hez2010 hez2010 closed this Aug 15, 2023
@hez2010 hez2010 reopened this Aug 15, 2023
@hez2010 hez2010 changed the title from "JIT: Substitute constant into loop testing variable where possible" to "(Draft) JIT: Substitute constant into loop testing variable where possible" Aug 15, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Aug 15, 2023
@ghost ghost added the community-contribution label (indicates that the PR has been added by a community member) Aug 15, 2023
@ghost

ghost commented Aug 15, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Another try on #90591.

Waiting for CI to check the impact.

Example:

[MethodImpl(MethodImplOptions.NoInlining)]
int Test()
{
    var sum = 0;
    var config = new Config { Iteration = 4 };
    for (var i = 0; i < config.Iteration; i++)
    {
        sum++;
    }
    return sum;
}

struct Config
{
    public int Iteration;
}

Diff for Test:

  G_M19777_IG01:  ;; offset=0x0000

  G_M19777_IG02:  ;; offset=0x0000
-        xor      eax, eax
-        xor      ecx, ecx
-        align    [0 bytes for IG03]

- G_M19777_IG03:  ;; offset=0x0004
+ G_M19777_IG03:  ;; offset=0x0005
-        inc      eax
-        inc      ecx
-        cmp      ecx, 4
-        jl       SHORT G_M19777_IG03
+        mov      eax, 4
+        ret

- G_M19777_IG04:  ;; offset=0x000D
-        ret

Author: hez2010
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

@hez2010

This comment was marked as outdated.

@hez2010 hez2010 changed the title from "(Draft) JIT: Substitute constant into loop testing variable where possible" to "JIT: Substitute constant into loop testing variable where possible" Aug 16, 2023
@hez2010 hez2010 marked this pull request as ready for review August 16, 2023 08:24
@hez2010
Contributor Author

hez2010 commented Aug 16, 2023

The test failure seems unrelated.

@jakobbotsch
Member

My personal opinion: this does not have enough hits to justify the change. I would also expect @AndyAyersMS's planned change to assertion prop during morph to address these cases more naturally by enabling non-local constant propagation.

@hez2010 hez2010 force-pushed the loop-test-substitute branch 2 times, most recently from ac22da7 to 091c79f August 16, 2023 15:08
@hez2010 hez2010 closed this Aug 16, 2023
@hez2010 hez2010 reopened this Aug 16, 2023
@hez2010 hez2010 force-pushed the loop-test-substitute branch from 69ffe00 to a5e5fd6 August 16, 2023 16:30
@@ -1870,7 +1870,7 @@ bool Compiler::optIsLoopClonable(unsigned loopInd)
     if (requireIterable)
     {
         if ((loop.lpFlags & LPFLG_CONST_LIMIT) == 0 && (loop.lpFlags & LPFLG_VAR_LIMIT) == 0 &&
-            (loop.lpFlags & LPFLG_ARRLEN_LIMIT) == 0)
+            (loop.lpFlags & LPFLG_CONST_VAR_LIMIT) == 0 && (loop.lpFlags & LPFLG_ARRLEN_LIMIT) == 0)
Contributor


Nit: push the (loop.lpFlags & LPFLG_ARRLEN_LIMIT) == 0) onto a new line.
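
For readers unfamiliar with the check in that hunk: the condition is true only when none of the limit flags is set on the loop. A minimal C++ sketch of the same test, written with a combined mask so the line-length question does not arise, could look like the following. The bit values are hypothetical and chosen only for illustration (the real LPFLG_* definitions live in the JIT sources), and hasNoKnownLimit is an illustrative name, not a JIT function.

```cpp
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
enum LoopFlags : uint32_t
{
    LPFLG_CONST_LIMIT     = 1u << 0,
    LPFLG_VAR_LIMIT       = 1u << 1,
    LPFLG_CONST_VAR_LIMIT = 1u << 2, // the flag added to the check in this hunk
    LPFLG_ARRLEN_LIMIT    = 1u << 3,
};

// True when the loop carries none of the four limit flags; equivalent to
// testing each flag against zero and combining the results with &&.
constexpr bool hasNoKnownLimit(uint32_t lpFlags)
{
    return (lpFlags & (LPFLG_CONST_LIMIT | LPFLG_VAR_LIMIT |
                       LPFLG_CONST_VAR_LIMIT | LPFLG_ARRLEN_LIMIT)) == 0;
}

static_assert(hasNoKnownLimit(0), "no flags set");
static_assert(!hasNoKnownLimit(LPFLG_CONST_VAR_LIMIT), "new flag is recognised");
```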

@hez2010
Contributor Author

hez2010 commented Aug 17, 2023

Made it a bit more general. More diffs can now be observed.
Many size regressions are expected due to more unrolling.
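
As a rough illustration of why more unrolling tends to grow code size: a loop with a constant trip count can be replaced by one copy of its body per iteration, which removes the counter and branch but repeats the body. The C++ sketch below is hand-written for illustration (rolled and unrolled are hypothetical names) and does not show actual JIT output.

```cpp
// Rolled form: one copy of the body plus a counter, compare, and branch.
int rolled(const int* data)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
    {
        sum += data[i];
    }
    return sum;
}

// Fully unrolled form for the constant trip count 4: no loop control,
// but the body appears four times.
int unrolled(const int* data)
{
    int sum = 0;
    sum += data[0];
    sum += data[1];
    sum += data[2];
    sum += data[3];
    return sum;
}
```

Both functions compute the same value; the trade-off is between loop overhead and the amount of emitted code.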

Copy prop test
@JulieLeeMSFT
Member

Ping @BruceForstall to review this community PR.

@BruceForstall
Member

I think we should not take this PR. It looks like it can find some interesting cases to optimize, but it's not clear how often those occur in practice, and the implementation looks quite expensive to just be feeding loop cloning in this specific case. Also, there is a plan to have a more general assertion propagation occur during morph, which might create a better underpinning for this optimization. See #93246.

@BruceForstall BruceForstall added the needs-author-action label (an issue or pull request that requires more info or actions from the author) Oct 10, 2023
@hez2010
Contributor Author

hez2010 commented Oct 10, 2023

I'm fine with closing this PR if we have other more general solutions that can supersede this one.

@ghost ghost removed the needs-author-action label (an issue or pull request that requires more info or actions from the author) Oct 10, 2023
@BruceForstall
Member

@hez2010 Thank you for your contribution; I look forward to seeing other contributions that we can take (consider looking at the area-Codegen-coreclr bugs for ideas).

@ghost ghost locked as resolved and limited conversation to collaborators Nov 9, 2023
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), community-contribution (indicates that the PR has been added by a community member)
5 participants