-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slightly improve string unrolling for length 5 and 6 #77398
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsUse 32bit load for the 2nd indirect load for length 5 and 6, it produces slightly better codegen, e.g.: bool IsFalse(string s) => s.Equals("false", StringComparison.OrdinalIgnoreCase); Codegen diff: ; Assembly listing for method Program:IsFalse(System.String):bool:this
G_M56292_IG01:
G_M56292_IG02:
837A0805 cmp dword ptr [rdx+08H], 5
7541 jne SHORT G_M56292_IG04
G_M56292_IG03:
48B82000200020002000 mov rax, 0x20002000200020
480B420C or rax, qword ptr [rdx+0CH]
48B9660061006C007300 mov rcx, 0x73006C00610066
4833C1 xor rax, rcx
- 48B92000200020002000 mov rcx, 0x20002000200020
- 480B4A0E or rcx, qword ptr [rdx+0EH]
- 48BA61006C0073006500 mov rdx, 0x650073006C0061
- 4833D1 xor rdx, rcx
+ 8B5212 mov edx, dword ptr [rdx+12H]
+ 81CA20002000 or edx, 0x200020
+ 81F273006500 xor edx, 0x650073
480BC2 or rax, rdx
0F94C0 sete al
0FB6C0 movzx rax, al
EB02 jmp SHORT G_M56292_IG05
G_M56292_IG04:
33C0 xor eax, eax
G_M56292_IG05:
C3 ret
-; Total bytes of code 74
+; Total bytes of code 62 Btw, it'd be nice if we could get rid of this in favor of ^ but it will regress mono.
|
927a264
to
c3cd558
Compare
Although, the JIT version seems faster 🙂: internal static bool IsFalseStringIgnoreCase_jit(ReadOnlySpan<char> value) =>
value.Equals("false", StringComparison.OrdinalIgnoreCase);
internal static bool IsFalseStringIgnoreCase(ReadOnlySpan<char> value)
{
// "fals" as a ulong, each char |'d with 0x0020 for case-insensitivity
ulong fals_val = BitConverter.IsLittleEndian ? 0x73006C00610066ul : 0x660061006C0073ul;
return value.Length == 5 &&
(((MemoryMarshal.Read<ulong>(MemoryMarshal.AsBytes(value)) | 0x0020002000200020) == fals_val) &
((value[4] | 0x20) == 'e'));
} codegen diff: https://www.diffchecker.com/AodEOtoa (jit one is on the left) |
Change the CoreLib impl to this then? |
Do we need to keep the hand-written version for mono? I feel like it should also be doable for Mono-LLVM without a lot of efforts - we can emit |
Honest question, does this micro-optimization for bool.TryParse matter as much on mono? There are plenty of places where we use |
I think we can drop it and restore if they/benchmarks scream at us 😄 I might try to implement the optimization for mono-llvm when I have time |
+1 . We have avoided introducing Mono-specific performance ifdefs for the most part and it served us well. |
@@ -15,7 +15,7 @@ internal static bool GetBooleanConfig(string switchName, string envVariable, boo | |||
if (!AppContext.TryGetSwitch(switchName, out bool ret)) | |||
{ | |||
string? switchValue = Environment.GetEnvironmentVariable(envVariable); | |||
ret = switchValue != null ? (bool.IsTrueStringIgnoreCase(switchValue) || switchValue.Equals("1")) : defaultValue; | |||
ret = switchValue != null ? (switchValue.Equals("true", StringComparison.OrdinalIgnoreCase) || switchValue.Equals("1")) : defaultValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is dangerous to call string.Equals(..., StringComparison.OrdinalIgnoreCase)
here. It can call into globalization stack when the JIT optimizations does not kick in, and the globalization will call this method recursively to read its config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good point, I see that Ordinal.EqualsIgnoreCase
has a fast path for ASCII but if the value in the config is not ASCII it might trigger globalization indeed, let me restore the previous version here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I'm sad I was not smart enough to see it myself 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be also possible to fix this by changing OrdinalIgnoreCase to avoid initializing globalization.
OrdinalIgnoreCase will fallback to the slow path if one of the chars is not ASCII. It really only needs to fallback to the slow path if both chars are not ASCII.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas do you mind if I investigate it separately? I also wanted to look into vectorizing Ordinal.CompareStringIgnoreCase
- it currently uses only SWAR version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mono tests are still failing though. You may need to fix this to make them pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, changed CompareStringIgnoreCase
src/libraries/System.Private.CoreLib/src/System/AppContextConfigHelper.cs
Outdated
Show resolved
Hide resolved
Is it possible to hand-roll an implementation that results in as-good code, or is there something else missing in the JIT that would enable that? (I'm not suggesting we want to hand-roll this, but rather wondering if there's an additional gap to be addressed.) |
Are two throws in asm for the non Jit version:
|
Yep,
this version is closer to what jit emits (modulo unaligned reads): internal static bool IsFalseStringIgnoreCase(ReadOnlySpan<char> value)
{
if (value.Length == 5)
{
ref char c = ref MemoryMarshal.GetReference(value);
ulong fals_val = BitConverter.IsLittleEndian ? 0x73006C00610066ul : 0x660061006C0073ul;
return (Unsafe.As<char, ulong>(ref c) | 0x0020002000200020 ^ fals_val | (value[4] | 0x20 ^ (uint)'e')) == 0;
}
return false;
} |
src/libraries/System.Private.CoreLib/src/System/Globalization/Ordinal.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Ordinal.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Jan Kotas <[email protected]>
@jkotas @stephentoub it seems it still fails on Mono, it happens because mono has a slightly different behavior around static initialization order. This path https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.Globalization.cs#L54 doesn't trigger |
CoreCLR will trigger the beforefieldinit static constructors exactly at the point of the first field reference. It is done intentionally to avoid unpredictable behavior and compat problems (#4346 (comment)). I thought that Mono is on the same plan, but it is apparently not the case. Open a bug on this against Mono and add a workaround under MONO ifdef and a link to the bug? |
|
Use 32bit load for the 2nd indirect load for length 5 and 6, it produces slightly better codegen, e.g.:
Codegen diff:
Btw, it'd be nice if we could get rid of this in favor of ^ but it will regress mono.