Integrate SimdUnicode for AVX-512 #104199
Tagging subscribers to this area: @dotnet/area-system-text-encoding
@EgorBot -intel -amd

using System.Collections.Generic;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text.Unicode;

BenchmarkRunner.Run<Bench>(args: args);

public class Bench
{
    // Inputs for the GetCharCount benchmark: prefixes of 1000, 500, 250 and 100 bytes.
    public static IEnumerable<byte[]> GetUtf8BytesData()
    {
        // Chinese "Lorem Ipsum"
        var utf8 = "唐聞球方五保査禁答近確掲著協世好知長。育乗江校上価話戒宏口自森特室堂討。陸迎奔必秋最量注好枚挑周。間父癒曲在近真権幕覧超持樹件芸保展島船点。齢度約治末価埼坂内辞千故資接藤雨約宿県。定戻業担伸立発告敗家響意球禎。呼真局験善体続得新税知群孫大場。変省創与毎容開拡作北経眺間。樹野市現館開分供同南費海。投以画露両装知全茨済力上速田弘変掲材保内。王野嗅結択芸合験覧託委致就近資。励意親者著識連愚戦親能精球信相準大避一。民覧過走最国転開社加砲者度座図。提著学月牟止百県意能宝質約投分記加。中長塚相選暇版経田経問下訟全報府。要事集細両体要特義点必周優載治山集摘。手機掛果題銀料新政庁分堀住画禁信。味表柄読必望著後入協攻末源安 案志検江水口宿言京並属需就一生断導。通崎楽大最放新属健戦維議本金部兜素定市船"u8.ToArray();
        yield return utf8.AsSpan(0, 1000).ToArray();
        yield return utf8.AsSpan(0, 500).ToArray();
        yield return utf8.AsSpan(0, 250).ToArray();
        yield return utf8.AsSpan(0, 100).ToArray();
    }

    [Benchmark]
    [ArgumentsSource(nameof(GetUtf8BytesData))]
    public int GetUtf8Bytes(byte[] str) => Encoding.UTF8.GetCharCount(str);

    // Inputs for the Utf8.IsValid benchmark: prefixes of 1000, 500, 250 and 100 bytes.
    public static IEnumerable<byte[]> ValidateUtf8Data()
    {
        // ru-RU "Lorem Ipsum"
        var utf8 = "Лорем ипсум долор сит амет, хас тале феугаит ех, мел дицит сонет сцрипта ид? Еррорибус темпорибус адверсариум про те, видит ностер хас не, яуод феугаит цу ест. Но дицунт рецусабо диссентиас цум, оптион евертитур ан вих. Но мел антиопам молестиае, продессет абхорреант витуператорибус ат сит, дицант глориатур персецути при еу. При еяуидем пхаедрум рецусабо ех, не вим ерант вертерем Ехерци семпер те нец. Ид нолуиссе детерруиссет нам, яуо ан адхуц дицит пертинациа, мел тота цлита цомпрехенсам ид? Ид аугуе граецис еффициенди вис, ат анимал фиерент инструцтиор пер, не виде еффициенди при!"u8.ToArray();
        yield return utf8.AsSpan(0, 1000).ToArray();
        yield return utf8.AsSpan(0, 500).ToArray();
        yield return utf8.AsSpan(0, 250).ToArray();
        yield return utf8.AsSpan(0, 100).ToArray();
    }

    [Benchmark]
    [ArgumentsSource(nameof(ValidateUtf8Data))]
    public bool ValidateUtf8(byte[] str) => Utf8.IsValid(str);
}
Benchmark results on Intel
Benchmark results on Amd
Feel free to adapt as needed! :-)
Is this better for the registers?
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8Utility.Validation.cs
@EgorBo another try 😉
Co-authored-by: Günther Foidl <[email protected]>
Question regarding the PR title: this seems to use AVX2 (256-bit), not AVX-512.
@huoyaoyuan I think that's a good point. This does seem to be AVX2 (which is not a bad idea).
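For anyone wanting to check which vector width is actually in play on a given machine, here is a minimal sketch. It assumes .NET 8 or later, and `WidestAcceleratedPath` is a hypothetical helper name, not something from this PR. It only reports what the runtime says it accelerates; it does not show which path the code in this PR takes.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Print what the runtime reports for this machine.
Console.WriteLine($"Widest accelerated vector type: {WidestAcceleratedPath()}");
Console.WriteLine($"Avx2.IsSupported:     {Avx2.IsSupported}");
Console.WriteLine($"Avx512F.IsSupported:  {Avx512F.IsSupported}");
Console.WriteLine($"Avx512BW.IsSupported: {Avx512BW.IsSupported}");

// Hypothetical helper: names the widest cross-platform vector type
// that the JIT reports as hardware accelerated.
static string WidestAcceleratedPath()
{
    if (Vector512.IsHardwareAccelerated) return "Vector512";
    if (Vector256.IsHardwareAccelerated) return "Vector256";
    if (Vector128.IsHardwareAccelerated) return "Vector128";
    return "scalar";
}
```

On x64, code written against `Vector256<T>` generally maps to AVX2 and code written against `Vector512<T>` to AVX-512, which is the distinction the title question is about.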
Contributes to #103781. This covers only AVX-512; other ISAs can be added if/once this is approved/merged.
I did some cleanup, such as replacing some SIMD APIs with cross-platform ones/operators. Btw, I don't believe that ISimdVector can be used here. Also, I removed the initial "skip ASCII data" part since we already have a workhorse for that.
cc @lemire, @Nick-Nuon let me know if you want to change something (including credits in THIRD-PARTY-NOTICES.TXT)
TODO: do some ad-hoc testing, make sure test coverage is good
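To illustrate what "cross-platform ones/operators" can look like in practice, here is a rough sketch, not the code from this PR. It assumes .NET 8 or later; `IsAllAscii` is a hypothetical helper written only for illustration. It uses `Vector256.LoadUnsafe` and the `&` and `!=` operators on `Vector256<byte>` rather than ISA-specific intrinsics, and compares the result against the public `Ascii.IsValid` API.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Text;

byte[] ascii = Encoding.UTF8.GetBytes(new string('a', 70));
byte[] mixed = Encoding.UTF8.GetBytes(new string('a', 40) + "é" + new string('a', 28));

Console.WriteLine(IsAllAscii(ascii)); // same answer as Ascii.IsValid(ascii)
Console.WriteLine(IsAllAscii(mixed)); // same answer as Ascii.IsValid(mixed)

// Hypothetical helper: checks for non-ASCII bytes using only
// cross-platform vector APIs and operators (no Avx2.* calls).
static bool IsAllAscii(ReadOnlySpan<byte> data)
{
    int i = 0;
    if (Vector256.IsHardwareAccelerated)
    {
        ref byte start = ref MemoryMarshal.GetReference(data);
        Vector256<byte> highBit = Vector256.Create((byte)0x80);
        // Process full 32-byte blocks; the bounds check lives in the loop condition.
        for (; i <= data.Length - Vector256<byte>.Count; i += Vector256<byte>.Count)
        {
            Vector256<byte> block = Vector256.LoadUnsafe(ref start, (nuint)i);
            // Any byte with its top bit set is outside the ASCII range.
            if ((block & highBit) != Vector256<byte>.Zero)
                return false;
        }
    }
    // Scalar tail (or the whole input when Vector256 is not accelerated).
    for (; i < data.Length; i++)
    {
        if (data[i] >= 0x80)
            return false;
    }
    return true;
}
```

The same source compiles on x64 and Arm64; the JIT picks the underlying instructions, which is the main appeal of the operator-based form over per-ISA intrinsics.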