Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slightly improve string unrolling for length 5 and 6 #77398

Merged
merged 15 commits into from
Oct 27, 2022

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Oct 24, 2022

Use 32bit load for the 2nd indirect load for length 5 and 6, it produces slightly better codegen, e.g.:

bool IsFalse(string s) => s.Equals("false", StringComparison.OrdinalIgnoreCase);
// same for Spans, StartsWith, etc.

Codegen diff:

; Assembly listing for method Program:IsFalse(System.String):bool:this
G_M56292_IG01:              
G_M56292_IG02:              
       837A0805             cmp      dword ptr [rdx+08H], 5
       7541                 jne      SHORT G_M56292_IG04
G_M56292_IG03:              
       48B82000200020002000 mov      rax, 0x20002000200020
       480B420C             or       rax, qword ptr [rdx+0CH]
       48B9660061006C007300 mov      rcx, 0x73006C00610066
       4833C1               xor      rax, rcx
-      48B92000200020002000 mov      rcx, 0x20002000200020
-      480B4A0E             or       rcx, qword ptr [rdx+0EH]
-      48BA61006C0073006500 mov      rdx, 0x650073006C0061
-      4833D1               xor      rdx, rcx
+      8B5212               mov      edx, dword ptr [rdx+12H]
+      81CA20002000         or       edx, 0x200020
+      81F273006500         xor      edx, 0x650073
       480BC2               or       rax, rdx
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
       EB02                 jmp      SHORT G_M56292_IG05
G_M56292_IG04:              
       33C0                 xor      eax, eax
G_M56292_IG05:              
       C3                   ret      
-; Total bytes of code 74
+; Total bytes of code 62

Btw, it'd be nice if we could get rid of this in favor of ^ but it will regress mono.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 24, 2022
@ghost ghost assigned EgorBo Oct 24, 2022
@ghost
Copy link

ghost commented Oct 24, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Use 32bit load for the 2nd indirect load for length 5 and 6, it produces slightly better codegen, e.g.:

bool IsFalse(string s) => s.Equals("false", StringComparison.OrdinalIgnoreCase);

Codegen diff:

; Assembly listing for method Program:IsFalse(System.String):bool:this
G_M56292_IG01:              
G_M56292_IG02:              
       837A0805             cmp      dword ptr [rdx+08H], 5
       7541                 jne      SHORT G_M56292_IG04
G_M56292_IG03:              
       48B82000200020002000 mov      rax, 0x20002000200020
       480B420C             or       rax, qword ptr [rdx+0CH]
       48B9660061006C007300 mov      rcx, 0x73006C00610066
       4833C1               xor      rax, rcx
-      48B92000200020002000 mov      rcx, 0x20002000200020
-      480B4A0E             or       rcx, qword ptr [rdx+0EH]
-      48BA61006C0073006500 mov      rdx, 0x650073006C0061
-      4833D1               xor      rdx, rcx
+      8B5212               mov      edx, dword ptr [rdx+12H]
+      81CA20002000         or       edx, 0x200020
+      81F273006500         xor      edx, 0x650073
       480BC2               or       rax, rdx
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
       EB02                 jmp      SHORT G_M56292_IG05
G_M56292_IG04:              
       33C0                 xor      eax, eax
G_M56292_IG05:              
       C3                   ret      
-; Total bytes of code 74
+; Total bytes of code 62

Btw, it'd be nice if we could get rid of this in favor of ^ but it will regress mono.

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo EgorBo force-pushed the improve-str-unroll branch from 927a264 to c3cd558 Compare October 24, 2022 19:19
@EgorBo
Copy link
Member Author

EgorBo commented Oct 24, 2022

Although, the JIT version seems faster 🙂:

internal static bool IsFalseStringIgnoreCase_jit(ReadOnlySpan<char> value) => 
    value.Equals("false", StringComparison.OrdinalIgnoreCase);

internal static bool IsFalseStringIgnoreCase(ReadOnlySpan<char> value)
{
    // "fals" as a ulong, each char |'d with 0x0020 for case-insensitivity
    ulong fals_val = BitConverter.IsLittleEndian ? 0x73006C00610066ul : 0x660061006C0073ul;
    return value.Length == 5 &&
           (((MemoryMarshal.Read<ulong>(MemoryMarshal.AsBytes(value)) | 0x0020002000200020) == fals_val) &
            ((value[4] | 0x20) == 'e'));
}

codegen diff: https://www.diffchecker.com/AodEOtoa (jit one is on the left)

@jkotas
Copy link
Member

jkotas commented Oct 24, 2022

Although, the JIT version seems faster

Change the CoreLib impl to this then?

@EgorBo
Copy link
Member Author

EgorBo commented Oct 24, 2022

Although, the JIT version seems faster

Change the CoreLib impl to this then?

Do we need to keep the hand-written version for mono? I feel like it should also be doable for Mono-LLVM without a lot of efforts - we can emit llvm.memcmp intrinsic here and LLVM will unroll it itself (not the mono-mini and mono-interp though)

@stephentoub
Copy link
Member

Honest question, does this micro-optimization for bool.TryParse matter as much on mono? There are plenty of places where we use string.Equals(..., StringComparison.OrdinalIgnoreCase) everywhere and we don't try to special-case mono. I'd opt for simplicity, and if this particular optimization is important for mono, we should focus on imbuing it with the same optimization coreclr has.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 24, 2022

I think we can drop it and restore if they/benchmarks scream at us 😄 I might try to implement the optimization for mono-llvm when I have time

@jkotas
Copy link
Member

jkotas commented Oct 24, 2022

Honest question, does this micro-optimization for bool.TryParse matter as much on mono? There are plenty of places where we use string.Equals(..., StringComparison.OrdinalIgnoreCase) everywhere and we don't try to special-case mono

+1 . We have avoided introducing Mono-specific performance ifdefs for the most part and it served us well.

@@ -15,7 +15,7 @@ internal static bool GetBooleanConfig(string switchName, string envVariable, boo
if (!AppContext.TryGetSwitch(switchName, out bool ret))
{
string? switchValue = Environment.GetEnvironmentVariable(envVariable);
ret = switchValue != null ? (bool.IsTrueStringIgnoreCase(switchValue) || switchValue.Equals("1")) : defaultValue;
ret = switchValue != null ? (switchValue.Equals("true", StringComparison.OrdinalIgnoreCase) || switchValue.Equals("1")) : defaultValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is dangerous to call string.Equals(..., StringComparison.OrdinalIgnoreCase) here. It can call into globalization stack when the JIT optimizations does not kick in, and the globalization will call this method recursively to read its config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good point, I see that Ordinal.EqualsIgnoreCase has a fast path for ASCII but if the value in the config is not ASCII it might trigger globalization indeed, let me restore the previous version here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'm sad I was not smart enough to see it myself 😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be also possible to fix this by changing OrdinalIgnoreCase to avoid initializing globalization.

OrdinalIgnoreCase will fallback to the slow path if one of the chars is not ASCII. It really only needs to fallback to the slow path if both chars are not ASCII.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas do you mind if I investigate it separately? I also wanted to look into vectorizing Ordinal.CompareStringIgnoreCase - it currently uses only SWAR version

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mono tests are still failing though. You may need to fix this to make them pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, changed CompareStringIgnoreCase

@stephentoub
Copy link
Member

codegen diff: https://www.diffchecker.com/AodEOtoa (jit one is on the left)

Is it possible to hand-roll an implementation that results in as-good code, or is there something else missing in the JIT that would enable that? (I'm not suggesting we want to hand-roll this, but rather wondering if there's an additional gap to be addressed.)

@benaadams
Copy link
Member

codegen diff: https://www.diffchecker.com/AodEOtoa (jit one is on the left)

Is it possible to hand-roll an implementation that results in as-good code, or is there something else missing in the JIT that would enable that? (I'm not suggesting we want to hand-roll this, but rather wondering if there's an additional gap to be addressed.)

Are two throws in asm for the non Jit version:

  • ArgumentOutOfRangeException in the MemoryMarshal.Read<ulong> (checking length)
  • OverflowException for the MemoryMarshal.AsBytes (computing new length checked(span.Length * Unsafe.SizeOf<T>()))

@EgorBo
Copy link
Member Author

EgorBo commented Oct 25, 2022

Are two throws in asm for the non Jit version:

  • ArgumentOutOfRangeException in the MemoryMarshal.Read<ulong> (checking length)
  • OverflowException for the MemoryMarshal.AsBytes (computing new length checked(span.Length * Unsafe.SizeOf<T>()))

Yep, OverflowException is a known case, if you look at the asm you will see that it's unreachable, it's just JIT is not able to remove it. We plan to eventually handle that, it's just low priority because it doesn't affect hot code. Same problem exists on arm64 with div.

ArgumentOutOfRangeException looks complicated to eliminate in JIT here 😞 but we can try.

Is it possible to hand-roll an implementation that results in as-good code,

this version is closer to what jit emits (modulo unaligned reads):

internal static bool IsFalseStringIgnoreCase(ReadOnlySpan<char> value)
{
    if (value.Length == 5)
    {
        ref char c = ref MemoryMarshal.GetReference(value);
        ulong fals_val = BitConverter.IsLittleEndian ? 0x73006C00610066ul : 0x660061006C0073ul;
        return (Unsafe.As<char, ulong>(ref c) | 0x0020002000200020 ^ fals_val | (value[4] | 0x20 ^ (uint)'e')) == 0;
    }
    return false;
}

@EgorBo
Copy link
Member Author

EgorBo commented Oct 25, 2022

@EgorBo
Copy link
Member Author

EgorBo commented Oct 26, 2022

@jkotas @stephentoub it seems it still fails on Mono, it happens because mono has a slightly different behavior around static initialization order. This path https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.Globalization.cs#L54 doesn't trigger CompareInfo..cctor() on CoreCLR (because it's unused from Boolean.cs), but it does on mono, likely just hoisted.

@jkotas
Copy link
Member

jkotas commented Oct 26, 2022

CoreCLR will trigger the beforefieldinit static constructors exactly at the point of the first field reference. It is done intentionally to avoid unpredictable behavior and compat problems (#4346 (comment)).

I thought that Mono is on the same plan, but it is apparently not the case. Open a bug on this against Mono and add a workaround under MONO ifdef and a link to the bug?

@EgorBo
Copy link
Member Author

EgorBo commented Oct 26, 2022

Open a bug on this against Mono

#77513

@build-analysis build-analysis bot mentioned this pull request Oct 27, 2022
2 tasks
@EgorBo EgorBo merged commit 771eca7 into dotnet:main Oct 27, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants