[API Proposal]: IndexOfAnyValues<string>
#85573
Tagging subscribers to this area: @dotnet/area-system-memory

Issue Details

cc: @stephentoub

API Proposal

namespace System.Buffers;

public static class IndexOfAnyValues
{
    // Existing
    public static IndexOfAnyValues<byte> Create(ReadOnlySpan<byte> values);
    public static IndexOfAnyValues<char> Create(ReadOnlySpan<char> values);

    // Proposed
    public static IndexOfAnyValues<string> Create(ReadOnlySpan<string> values, StringComparison comparisonType);
}

namespace System;

public static class MemoryExtensions
{
    // Existing
    public static int IndexOfAny<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
    public static int IndexOfAnyExcept<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
    public static int LastIndexOfAny<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
    public static int LastIndexOfAnyExcept<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);

    // Proposed
    public static int IndexOfAny(this ReadOnlySpan<char> span, IndexOfAnyValues<string> values);
}

API Usage

private static readonly IndexOfAnyValues<string> s_names = IndexOfAnyValues.Create(
    new[] { "Sherlock", "Holmes", "Watson" }, StringComparison.Ordinal);

public static int CountNames(ReadOnlySpan<char> text)
{
    int count = 0;
    while (!text.IsEmpty)
    {
        int matchOffset = text.IndexOfAny(s_names);
        // A "not found" result of -1 becomes uint.MaxValue after the cast, so this also ends the loop.
        if ((uint)matchOffset >= (uint)text.Length) break;
        // "Sherlock" has 8 chars; "Holmes" and "Watson" both have 6.
        int matchLength = text[matchOffset] == 'S' ? 8 : 6;
        text = text.Slice(matchOffset + matchLength);
        count++;
    }
    return count;
}

Alternative Designs

Emphasize that only Ordinal{IgnoreCase} works:

public static IndexOfAnyValues<string> CreateOrdinal(ReadOnlySpan<string> values, StringComparison comparisonType);
// or
public static IndexOfAnyValues<string> CreateOrdinal(ReadOnlySpan<string> values, bool ignoreCase);

Risks

No response
|
Would it also make sense to extend this to arrays in the future? For example, if you wanted to search for UTF-8 substrings:

private static readonly IndexOfAnyValues<byte[]> s_names = IndexOfAnyValues.Create(
    new[] { "Sherlock"u8.ToArray(), "Holmes"u8.ToArray(), "Watson"u8.ToArray() });

public static int CountNames(ReadOnlySpan<byte> text)
{
    int count = 0;
    while (!text.IsEmpty)
    {
        int matchOffset = text.IndexOfAny(s_names);
        if ((uint)matchOffset >= (uint)text.Length) break;
        int matchLength = text[matchOffset] == 'S' ? 8 : 6;
        text = text.Slice(matchOffset + matchLength);
        count++;
    }
    return count;
}

Maybe |
It's possible. I think we should gain some experience with the string versions first, though. Strings are also the variant we'll immediately use elsewhere in the core libraries (Miha and I have an end-to-end implementation with regex). |
Your proposal is another example against the closed nature of the current implementation. I did mention some in my proposal of
Also it is not clear to me what is the expected behavior of the consumers of those classes. How are they going to be used? Are they invoking the
Finally a few thoughts on your proposal
|
Are you referring to
If you want to check things like "does a text contain a forbidden character", you would use
For checks like "does a text contain any of these forbidden words", you could use
For "general searching" (like Regex), they can use the
I don't. Note that
The set of values you specified in
It's an internal method -- an implementation detail of the debug view.
Whole value matching is the only thing it supports. It doesn't support other prefix-tree-like operations.
If you ask for |
I mean, how do you proceed after that? The first call returns 0. How do you call it to find the next match? You only have an index with a value of 0. You don't know that a word with a length of 7 was matched.
Then it is not an API. Given that, I do not know why the IndexOfAnyValues family is public rather than internal in the first place. |
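For readers following the question above, here is a minimal sketch of one way a caller could continue after a match when only the offset is returned: test each candidate value at the reported offset with StartsWith and advance by the length of the value that matches. The names MatchLengthRecovery, FirstIndexOfAny and CountMatches are hypothetical, and FirstIndexOfAny is a naive stand-in for the proposed IndexOfAny (assuming Ordinal comparison), not its implementation.

```csharp
using System;

// Hypothetical illustration only; none of these names are part of the proposal.
public static class MatchLengthRecovery
{
    // Naive stand-in for the proposed IndexOfAny: minimum IndexOf over all values, or -1.
    private static int FirstIndexOfAny(ReadOnlySpan<char> text, string[] values)
    {
        int best = -1;
        foreach (string value in values)
        {
            int index = text.IndexOf(value.AsSpan(), StringComparison.Ordinal);
            if (index >= 0 && (best < 0 || index < best)) best = index;
        }
        return best;
    }

    // Counts non-overlapping matches of any value, scanning left to right.
    public static int CountMatches(ReadOnlySpan<char> text, string[] values)
    {
        int count = 0;
        while (!text.IsEmpty)
        {
            int offset = FirstIndexOfAny(text, values);
            if (offset < 0) break;

            // Recover the length: find a value that the text starts with at the offset.
            ReadOnlySpan<char> atMatch = text.Slice(offset);
            int length = 0;
            foreach (string value in values)
            {
                if (atMatch.StartsWith(value.AsSpan(), StringComparison.Ordinal))
                {
                    length = value.Length;
                    break; // first listed value wins; a caller could prefer the longest instead
                }
            }

            // Advance by at least one char so an empty value cannot loop forever.
            text = text.Slice(offset + Math.Max(length, 1));
            count++;
        }
        return count;
    }
}
```

Called as `MatchLengthRecovery.CountMatches("Sherlock Holmes and Dr. Watson", new[] { "Sherlock", "Holmes", "Watson" })`, the loop finds each name in turn. The extra StartsWith pass is the cost of an offset-only result; it is paid only at match positions.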
We had a long discussion about the implications of just returning the
Looks good as proposed.

namespace System.Buffers;

public static partial class IndexOfAnyValues
{
    public static IndexOfAnyValues<string> Create(ReadOnlySpan<string> values, StringComparison comparisonType);
}

namespace System;

public static partial class MemoryExtensions
{
    public static int IndexOfAny(this ReadOnlySpan<char> span, IndexOfAnyValues<string> values);

    // From https://github.com/dotnet/runtime/issues/86528
    public static bool ContainsAny(this ReadOnlySpan<char> span, SearchValues<string> values);
} |
@MihaZupan related, is there/was there a proposal for eg., for use places like this which currently have to use IndexOf. https://github.com/dotnet/msbuild/blob/4ffba3fe0dd35a30cc892bc8c202a006acb8f20a/src/Build/Evaluation/Expander.cs#L401 IIRC, we have some optimizations for Contains that can make it faster than IndexOf. |
If we were going to add something there, it would be more like:
public static bool ContainsAny(this ReadOnlySpan<char> span, SearchValues<char> values);
Leaving the possibility of adding such methods in the future is why we renamed
That is unrelated to this issue, though.
It's relatively minor, though. Basically once we find that there is a match, for Contains we can just return true whereas for IndexOf we need to determine which element in the vector matched in order to return the exact right index. |
For something like |
@danmoseley I opened a separate issue to track that #86528 |
@stephentoub, @MihaZupan; is this critical to land for .NET 8 or is it fine if it slips to .NET 9? |
It's not critical, but it does complete a key feature we targeted for .NET 8, and it's inches from merging (there's also a follow-up PR already reviewed that'll also go in once that merges). |
Background and motivation

Earlier in .NET 8, we added an IndexOfAnyValues<T> type that represents an immutable set of values optimized for efficient {Last}IndexOfAny{Except} searching: #68328 (comment).

You obtain an instance of IndexOfAnyValues by passing the full set of values to Create, which picks the most efficient algorithm for that set of values and capabilities of the current platform.

We've already exposed Create methods for searching for any byte or char in a given set.

This proposal would allow you to create IndexOfAnyValues<string> instances for searching for multiple substrings in a given text, instead of just individual characters.

Like with bytes and chars, Create for IndexOfAnyValues<string> can analyze the values in advance and pick the most optimal algorithm (e.g., Aho-Corasick, Rabin-Karp, Teddy, ...).

As we're working with strings, we're proposing adding a StringComparison parameter to Create.

This method would only work for Ordinal/OrdinalIgnoreCase. It's unlikely that we can meaningfully accelerate culture-aware multi-substring searching, and the proposed APIs assume that matches are of equal lengths.

Semantically, IndexOfAny would behave the same as doing an IndexOf of every value, and taking the minimum index.

While a *Last*IndexOfAny variant is possible, we are currently only proposing the left-to-right IndexOfAny.

Regex is likely to be the main consumer of this API.

cc: @stephentoub
cc: @tarekgh, @GrabYourPitchforks as we're working with strings

API Proposal

API Usage

Alternative Designs

Emphasize that only Ordinal{IgnoreCase} works:

An API that returns both a match offset and length (instead of just the offset).
So far we've rejected this variant as it adds a policy to IndexOfAny semantics -- are we returning the leftmost-first vs leftmost-longest match.

Related issues: #69682, #62447
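
To make the stated semantics and the rejected alternative concrete, here is a minimal reference sketch. It assumes Ordinal or OrdinalIgnoreCase comparison, where a match has the same length as the value that produced it. MultiSubstringReference, FindFirst and the preferLongest flag are hypothetical names for illustration; this is the naive "IndexOf of every value, take the minimum" definition, not the optimized algorithm the proposal would select.

```csharp
using System;

// Hypothetical reference helpers; not part of the proposal or of .NET.
public static class MultiSubstringReference
{
    // Offset only: the minimum IndexOf over all values, or -1 when nothing matches.
    public static int IndexOfAny(ReadOnlySpan<char> text, string[] values, StringComparison comparisonType)
    {
        int best = -1;
        foreach (string value in values)
        {
            int index = text.IndexOf(value.AsSpan(), comparisonType);
            if (index >= 0 && (best < 0 || index < best)) best = index;
        }
        return best;
    }

    // Offset and length: needs a tie-breaking policy when several values match at the same offset.
    public static (int Offset, int Length) FindFirst(
        ReadOnlySpan<char> text, string[] values, StringComparison comparisonType, bool preferLongest)
    {
        int offset = IndexOfAny(text, values, comparisonType);
        if (offset < 0) return (-1, 0);

        ReadOnlySpan<char> atMatch = text.Slice(offset);
        int length = -1;
        foreach (string value in values)
        {
            if (!atMatch.StartsWith(value.AsSpan(), comparisonType)) continue;
            if (!preferLongest) return (offset, value.Length); // leftmost-first: earliest listed value wins
            if (value.Length > length) length = value.Length;  // leftmost-longest: keep the longest match
        }
        return (offset, length);
    }
}
```

With the values "Sher" and "Sherlock" and the text "Mr. Sherlock", both policies agree on the offset, but leftmost-first reports the length of "Sher" while leftmost-longest reports the length of "Sherlock". An offset-only IndexOfAny avoids having to pick between the two.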