Add SearchValues<string> #88394
Note: This serves as a reminder that when your PR modifies a ref *.cs file and adds or modifies public APIs, the API implementation in the src *.cs file should be documented with triple-slash comments so that PR reviewers can sign off on the change.
Tagging subscribers to this area: @dotnet/area-system-buffers
Issue Details
Closes #85573
As discussed during API review, only Ordinal and OrdinalIgnoreCase semantics are available. This is an initial version of SearchValues<string> with several specialized implementations. These variants have additional generic specializations for different case sensitivity semantics of values.
There are still some less-critical TODOs left, and the code could use more comments. Opening this early in case we want to make any more significant changes. I'll share actual perf numbers when we're happier with the overall shape since there are a lot of interesting needle+haystack combinations to look at.
cc: @stephentoub @teo-tsirpanis @danmoseley
Do we need to add anything to THIRD-PARTY-NOTICES.txt, such as an entry for http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd or for existing implementations like https://github.com/jneem/teddy, if they were particularly helpful?
If we don't have anything for http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd yet, then yes, we should add that, as we're already using it in our string IndexOf implementations. It would also be fair to mention https://github.com/BurntSushi/aho-corasick, as I initially spent a lot of time digging through its code to understand how the Teddy algorithm works.
Kudos to @BurntSushi here.
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Vector512<byte> TransformInput(Vector512<byte> input)
@tannergooding, this PR feels like a really good candidate for exploring ways of reducing duplication between Vector128/256/512.
e8f9307 to 419a352
977311e to 6c64e45
Closes #85573
Closes #86528
Contributes to #62447
As discussed during API review, only Ordinal and OrdinalIgnoreCase semantics are available.
Supports OrdinalIgnoreCase fully (including running under Invariant/NLS mode).
Includes Avx512 variants for vectorized paths.
This is an initial version of SearchValues<string>, which includes the following specialized implementations:
- Teddy-based approach for searching for length=2 or 3 prefixes. Includes a bucketized variant when dealing with more than 8 values.
- Rabin-Karp-based approach that is only used as a fallback for Teddy for very short inputs.
- Aho-Corasick-based approach that we use when dealing with many values (e.g. searching for a blocklist of words in a text), or as a fallback when we can't use Teddy for other reasons.
- SingleStringSearchValuesThreeChars is a variant of http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd that uses 3 precomputed anchor points. On average provides lower per-call overheads than calling text.IndexOf("foo").
- SingleStringSearchValuesFallback for when we don't have anything fancier available and falling back to Aho-Corasick would be noticeably slower.
These variants have additional generic specializations for different case sensitivity semantics of values: CaseSensitive, CaseInsensitiveAsciiLetters, CaseInsensitiveAscii, and CaseInsensitiveUnicode.
I'll share actual perf numbers when we're happier with the overall shape since there are a lot of interesting needle+haystack combinations to look at.
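For readers unfamiliar with the API, here is a minimal usage sketch. It assumes the public shape that shipped in .NET 8 (a SearchValues.Create overload taking the values plus a StringComparison, and an IndexOfAny overload on ReadOnlySpan<char>). The class name, field name, and keyword list are made up for illustration.

```csharp
using System;
using System.Buffers;

// Illustrative usage only; the names and the keyword list are hypothetical.
public static class SearchValuesStringExample
{
    // Create the instance once and reuse it: the values are analyzed up front,
    // which is where a specialized implementation would be selected.
    private static readonly SearchValues<string> s_keywords =
        SearchValues.Create(new[] { "foo", "bar", "baz" }, StringComparison.OrdinalIgnoreCase);

    // Returns the index of the first occurrence of any keyword, or -1 if none is found.
    public static int IndexOfKeyword(ReadOnlySpan<char> text) => text.IndexOfAny(s_keywords);
}
```

Only StringComparison.Ordinal and StringComparison.OrdinalIgnoreCase are expected to be accepted here, per the API review note above.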
cc: @stephentoub @teo-tsirpanis @danmoseley
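To give a rough idea of the Rabin-Karp fallback mentioned in the description, the following is a simplified scalar sketch of a multi-value Rabin-Karp search. It is not the code from this PR: the class name, the hash multiplier, and the dictionary-based buckets are assumptions chosen for readability, and only ordinal (case-sensitive) matching is shown.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a simplified multi-value Rabin-Karp search, not the PR's implementation.
public static class RabinKarpSketch
{
    private const uint Base = 31; // hypothetical hash multiplier

    // Returns the index of the first ordinal occurrence of any value, or -1.
    public static int IndexOfAny(ReadOnlySpan<char> text, string[] values)
    {
        // Hash only the first minLength chars of each value so that a single
        // rolling window over the text can be compared against all of them.
        int minLength = int.MaxValue;
        foreach (string v in values) minLength = Math.Min(minLength, v.Length);
        if (minLength == 0) return 0;              // an empty value matches at index 0
        if (text.Length < minLength) return -1;    // also covers an empty values array

        var buckets = new Dictionary<uint, List<string>>();
        foreach (string v in values)
        {
            uint h = Hash(v.AsSpan(0, minLength));
            if (!buckets.TryGetValue(h, out var list)) buckets[h] = list = new List<string>();
            list.Add(v);
        }

        // Weight of the char that leaves the window: Base^(minLength - 1), modulo 2^32.
        uint highPower = 1;
        for (int i = 1; i < minLength; i++) highPower = unchecked(highPower * Base);

        uint hash = Hash(text.Slice(0, minLength));
        for (int i = 0; ; i++)
        {
            if (buckets.TryGetValue(hash, out var candidates))
            {
                // Hashes can collide, so every candidate is verified against the text.
                foreach (string v in candidates)
                {
                    if (text.Slice(i).StartsWith(v.AsSpan(), StringComparison.Ordinal)) return i;
                }
            }

            if (i + minLength >= text.Length) return -1;

            // Roll the window: drop text[i], append text[i + minLength].
            hash = unchecked((hash - text[i] * highPower) * Base + text[i + minLength]);
        }
    }

    private static uint Hash(ReadOnlySpan<char> span)
    {
        uint h = 0;
        foreach (char c in span) h = unchecked(h * Base + c);
        return h;
    }
}
```

The work per position is a constant-time hash update plus a lookup, with full comparisons only on hash hits. That low setup cost is one plausible reason to prefer it for very short inputs.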