Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT generates better code when expression is stored in variable #102273

Closed
oleg-st opened this issue May 15, 2024 · 3 comments · Fixed by #102808
Closed

JIT generates better code when expression is stored in variable #102273

oleg-st opened this issue May 15, 2024 · 3 comments · Fixed by #102808
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release tenet-performance Performance related issue
Milestone

Comments

@oleg-st
Copy link

oleg-st commented May 15, 2024

Description

Code:

using System.Runtime.CompilerServices;
using System.Numerics;

public unsafe class C {
    
    public unsafe struct BIT_DStream_t
    {
        public nuint bitContainer;
        public uint bitsConsumed;
        public sbyte* ptr;
        public sbyte* start;
        public sbyte* limitPtr;
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static nuint BIT_initDStream1(ref BIT_DStream_t bitD, void* srcBuffer, nuint srcSize)
    {
        byte lastByte = ((byte*)srcBuffer)[srcSize - 1];
        bitD.bitsConsumed = lastByte != 0 ? 8 - (uint)BitOperations.Log2(lastByte) : 0;
        
        return 0;
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static nuint BIT_initDStream2(ref BIT_DStream_t bitD, void* srcBuffer, nuint srcSize)
    {
        byte lastByte = ((byte*)srcBuffer)[srcSize - 1];
        // difference here
        var v = lastByte != 0 ? 8 - (uint)BitOperations.Log2(lastByte) : 0;
        bitD.bitsConsumed = v;
        
        return 0;
    }
    
    private static nuint M1(byte *ip, byte *iend) {
        BIT_DStream_t seqState_DStream = new ();
        BIT_initDStream1(ref seqState_DStream, ip, (nuint)(iend - ip));
        return seqState_DStream.bitContainer;
    }
    
    private static nuint M2(byte *ip, byte *iend) {
        BIT_DStream_t seqState_DStream = new ();
        BIT_initDStream2(ref seqState_DStream, ip, (nuint)(iend - ip));
        return seqState_DStream.bitContainer;
    }
}

M1:

    L0000: sub rsp, 0x28
    L0004: vxorps xmm4, xmm4, xmm4
    L0008: vmovdqa [rsp], xmm4
    L000d: vmovdqa [rsp+0x10], xmm4
    L0013: xor eax, eax
    L0015: mov [rsp+0x20], rax
    L001a: sub rdx, rcx
    L001d: movzx eax, byte ptr [rdx+rcx-1]
    L0022: lea rcx, [rsp]
    L0026: test eax, eax
    L0028: jne short L002e
    L002a: xor edx, edx
    L002c: jmp short L003f
    L002e: or eax, 1
    L0031: xor edx, edx
    L0033: lzcnt edx, eax
    L0037: xor edx, 0x1f
    L003a: neg edx
    L003c: add edx, 8
    L003f: mov [rcx+8], edx
    L0042: mov rax, [rsp]
    L0046: add rsp, 0x28
    L004a: ret

M2:

    L0000: sub rdx, rcx
    L0003: movsx rax, byte ptr [rdx+rcx-1]
    L0009: xor eax, eax
    L000b: ret

SharpLab

Configuration

.NET 8.0.204

@oleg-st oleg-st added the tenet-performance Performance related issue label May 15, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 15, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 15, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Copy link
Member

EgorBo commented May 15, 2024

Sounds like #87072 cc @jakobbotsch @dotnet/jit-contrib

@jakobbotsch
Copy link
Member

Yeah looks like it. The good news is that it is in my plans for .NET 9 to fix this.

BTW, also looks like we are missing a simple transform in morph for the M2 case:

fgMorphTree BB01, STMT00004 (after)
               [000026] DA-XG+-----                           STORE_LCL_VAR int    V05 tmp2         
               [000025] ---XG+-----                         └──▌  IND       ubyte 
               [000024] -----+-----                            └──▌  ADD       long  
               [000023] -----+-----                               ├──▌  ADD       long  
               [000004] -----+-----                                 ├──▌  LCL_VAR   long   V00 arg0         
               [000007] -----+-----                                 └──▌  SUB       long  
               [000005] -----+-----                                    ├──▌  LCL_VAR   long   V01 arg1          (last use)
               [000006] -----+-----                                    └──▌  LCL_VAR   long   V00 arg0         
               [000022] -----+-----                               └──▌  CNS_INT   long   -1

(x + (y - x)) => y

@jakobbotsch jakobbotsch self-assigned this May 15, 2024
@jakobbotsch jakobbotsch added this to the 9.0.0 milestone May 15, 2024
@jakobbotsch jakobbotsch removed the untriaged New issue has not been triaged by the area owner label May 15, 2024
@jakobbotsch jakobbotsch added the Priority:2 Work that is important, but not critical for the release label May 29, 2024
jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue May 29, 2024
This changes local morph to run in RPO when optimizations are enabled.
It adds infrastructure to track and propagate LCL_ADDR values assigned
to locals (or struct fields) during local morph. This allows us to avoid
address exposure in cases where the destination local does not actually
end up escaping in any way.

Example:
```csharp
public struct Awaitable
{
    public int Opts;

    public Awaitable(bool value)
    {
        Opts = value ? 1 : 2;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test() => new Awaitable(false).Opts;
```

Before:
```asm
G_M59043_IG01:  ;; offset=0x0000
       push     rax
						;; size=1 bbWeight=1 PerfScore 1.00

G_M59043_IG02:  ;; offset=0x0001
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
						;; size=15 bbWeight=1 PerfScore 3.25

G_M59043_IG03:  ;; offset=0x0010
       add      rsp, 8
       ret
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 21

```

After:
```asm
G_M59043_IG02:  ;; offset=0x0000
       mov      eax, 2
						;; size=5 bbWeight=1 PerfScore 0.25

G_M59043_IG03:  ;; offset=0x0005
       ret
```

Propagating the addresses works much like local assertion prop in morph
does. Proving that the locations that were stored to do not escape
afterwards is done with a simplistic approach: we check globally that no
reads of the location exists, and if so, we replace the `LCL_ADDR`
stored to them by a constant 0. We leave it up to liveness to clean up
the stores themselves.

This could be more sophisticated, but in practice this handles the
reported cases just fine.

If we were able to remove any `LCL_ADDR` in this way then we run an
additional pass over the locals of the IR to compute the final set of
exposed locals.

Fix dotnet#87072
Fix dotnet#102273
Fix dotnet#102518

This is still not sufficient to handle dotnet#69254. To handle that case we
need to handle transferring assertions for struct copies, and also
handle proving that specific struct fields containing local addresses do
not escape. It is probably doable, but for now I will leave it as future
work.
@github-actions github-actions bot locked and limited conversation to collaborators Jul 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants