-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Ascii (and Utf8) encoding #85266
Conversation
* from 10 /17 to 1 instruction for 64/32 bit x86
…ToAscii_Intrinsified
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue DetailsReduce overhead of NarrowUtf16ToAscii when Vector128 is hardware accelerated.
|
What about smaller size (which are more popular)? We're seeing more and more problems because of AggressiveInlining attribute placed on large methods (like yours) in the inliner |
src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs
Outdated
Show resolved
Hide resolved
I did not see any regression for smaller inputs, but maybe an unexptected improvement.
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1555/22H2/2022Update/SunValley2)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.100-preview.2.23157.25
[Host] : .NET 8.0.0 (8.0.23.12803), X64 RyuJIT AVX2
Job-BXGTAG : .NET 8.0.0 (8.0.23.12803), X64 RyuJIT AVX2
MaxRelativeError=0.01 IterationTime=300.0000 ms WarmupCount=1
|
{ | ||
// Below code translates to a single write on x86 (for both 32 and 64 bit) | ||
// - we use double instead of long so that the JIT writes directly to memory without intermediate (register or stack in case of 32 bit) | ||
Unsafe.WriteUnaligned<double>(ref Unsafe.Add(ref bytePtr, elementOffset), byteVector.AsDouble().ToScalar()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I don't think that comment is needed because it is also expected to be a single instruction on arm. Also I'm not sure double is any better than long here, jit is expected to do the right thing anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can believe that the double hack helps on 32-bit x86. We do not pay as much attention to the codegen quality on 32-bit x86 and there are definitely issues. I do not think it is worth it to be adding workarounds like this for x86 quality issues to CoreLib. If issues like this one are important to fix, it would be better to fix it in the JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed the fact that the same code with long
generates bad codegen on x86, I hoped JIT would emit the same as for double 🙁 @jkotas do you mean the whole helper call is not needed? (it is needed because the original code was using Vector64 that we don't recommend using on x86/64) or you're fine with changing this to long
and remove notes? (I agree that we'd better fix this in JIT for 32bit we likely already a few similar patterns in BCL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas do you mean the whole helper call is not needed?
I assume that the helper call is needed to avoid Vector64
that does not produce good code on x64. Without the helper call., the alternative sequence that avoids Vector64
would have to be manually inlined in every place.
@Daniel-Svensson can you please do the same for https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Text/Ascii.CaseConversion.cs#LL532C46-L532C46 ? (you can either inline Unsafe.WriteUnaligned there or move your helper to Vector128 as an internal API) |
I moved the helper to Vector128 and called it from there. I also found an old unused helper (0) references with the same purpose in that file, do you want me to remove that method or update it to call the new helper ? runtime/src/libraries/System.Private.CoreLib/src/System/Text/Ascii.CaseConversion.cs Lines 466 to 499 in 5f0d620
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive optimization @Daniel-Svensson !
I also found an old unused helper (0) references with the same purpose in that file,
Please just remove the unused helper.
src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs
Outdated
Show resolved
Hide resolved
…lity.cs Co-authored-by: Adam Sitnik <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, again thank you for your contribution @Daniel-Svensson !
Reduce overhead of NarrowUtf16ToAscii when Vector128 is hardware accelerated.
For 33 char all ascii case the speedup is roughly
Use a single
movsd
instruction instead of a long series of 10-16 instruction which stores temp on the stack.