-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: SplitFirst/SplitLast methods for strings and (ReadOnly)Span/Memory<T> slices #75317
Comments
Tagging subscribers to this area: @dotnet/area-system-memory Issue DetailsBackground and motivationWhile working with various solutions both internal and external, it appears there is a commonly used pattern to split strings just once which can be improved with an allocation-free helper similar to Specifically, such patterns are especially common when processing headers and fields in JSON payloads. var header = "Authorization: Bearer mF_9.B5f-4.1JqM";
var token = header.Split(':')[0].Trim();
// remove 'Bearer' and handle auth While it has acceptable performance, we may be able to do better by providing convenience methods which would allow to handle "split once" cases idiomatically while simultaneously nudging the developers towards performance-oriented APIs. API Proposalnamespace System.Text:
public static class SplitHelpers
{
public static (ReadOnlyMemory<char> Left, ReadOnlyMemory<char> Right) SplitOnce(
this string source, char separator);
public static (ReadOnlyMemory<char> Left, ReadOnlyMemory<char> Right) SplitOnce(
this string source, ReadOnlySpan<char> separator);
public static (ReadOnlyMemory<char> Left, ReadOnlyMemory<char> Right) SplitOnce(
this ReadOnlyMemory<char> source, char separator);
public static (ReadOnlyMemory<char> Left, ReadOnlyMemory<char> Right) SplitOnce(
this ReadOnlyMemory<char> source, ReadOnlySpan<char> separator);
public static ReadOnlySpanSplit<char> SplitOnce(this ReadOnlySpan<char> source, char separator);
public static ReadOnlySpanSplit<char> SplitOnce(
this ReadOnlySpan<char> source, ReadOnlySpan<char> separator);
public static SpanSplit<char> SplitOnce(this Span<char> source, char separator);
public static SpanSplit<char> SplitOnce(this Span<char> source, ReadOnlySpan<char> separator);
// RefValueTuple? :)
public readonly ref struct ReadOnlySpanSplit<T>
{
public readonly ReadOnlySpan<T> Left;
public readonly ReadOnlySpan<T> Right;
...
public void Deconstruct(out ReadOnlySpan<T> left, out ReadOnlySpan<T> right)
{
left = Left;
right = Right;
}
public readonly ref struct SpanSplit<T>
{
public readonly ReadOnlySpan<T> Left;
public readonly ReadOnlySpan<T> Right;
...
public void Deconstruct(out ReadOnlySpan<T> left, out ReadOnlySpan<T> right)
{
left = Left;
right = Right;
}
}
} API Usagevar (name, surname) = "John Doe".SplitOnce(' ');
var messageHeader = "FinalNotification::Suspended\r\n";
var (notificationKind, status) = messageHeader
.AsSpan()
.TrimEnd("\r\n")
.SplitOnce("::"); Header example: var header = "Authorization: Bearer mF_9.B5f-4.1JqM";
// Get token slice out of string. Because split consists of two ROMs,
// the operations are both allocation free and work in scenarios like async middleware
var token = header.SplitOnce(':').Right.Trim(); Alternative DesignsEither keep using currently available methods or rely on hand-written/nuget extensions. RisksBecause
|
Here are the numbers for prototype impl. that can only handle [MemoryDiagnoser]
public class SplitBenchmarks
{
[Params(
"Authorization: test-token",
"longstringlongstringlongstringlongstringlongstring:longstringlongstring")]
public string Value = string.Empty;
[Benchmark(Baseline = true)]
public int Split() => Value.Split(':')[1].Length;
[Benchmark]
public int SplitOnce() => Value.SplitOnce(':').Right.Length;
} BenchmarkDotNet=v0.13.1, OS=macOS 13.0 (22A5331f) [Darwin 22.1.0]
Apple M1 Pro, 1 CPU, 8 logical and 8 physical cores
.NET SDK=7.0.100-rc.2.22452.3
[Host] : .NET 7.0.0 (7.0.22.42212), Arm64 RyuJIT
DefaultJob : .NET 7.0.0 (7.0.22.42212), Arm64 RyuJIT
|
Seems like a large and complicated api surface, considering it can be done with an |
There's also an existing proposal with a lot of discussion and work around splitting support for spans here: |
Thanks, I have seen the linked proposal but thought the above use case is a "low hanging fruit" that can serve one of the most often used |
It just doesn't seem like it saves a lot. Presumably all of these would have to throw an exception if the separator wasn't found (putting everything into one of the two sides seems arbitrary and likely bug-inducing), and I'm not sure that actually maps to the majority use case where this would be used (you have a header parsing example, but I struggle to imagine real-world usage for header parsing that would be ok with the proposed semantics when the separator wasn't found). And even then it's not saving a whole lot, e.g. (ReadOnlySpan<char> firstName, ReadOnlySpan<char> lastName) = name.SplitFirst(' '); instead of: int pos = name.IndexOf(' ');
ReadOnlySpan<char> firstName = name[0..pos];
ReadOnlySpan<char> lastName = name[pos+1..]; where the latter is also more flexible, doesn't require learning about a new method, etc. Personally, it doesn't seem worth it to me. Happy to hear opinions from others, though. |
Agree, I don't have a strong opinion on which is better, throwing an exception or returning the right slice as empty. Implementation-wise, not all code wants EH and the expectation is to just check
|
The API proposal has been updated with the following changes:
|
Is there anything that can be done to move forward with the proposal? The pattern appears to be quite popular: https://grep.app/search?q=%5C.Split%5C%28%27.%28.%29%27%5C%29%5C%5B%28.%29%5C%5D®exp=true&case=true&filter[lang][0]=C%23 |
@dotnet/area-system-memory is there any work or discussion required for this proposal to become ready for review? |
I continue to not see it as being worthwhile. A lot of surface area for something that's already supported in multiple ways. |
Working with strings in advanced implementations in C# is very nice through spans, and such a scenario is well-covered there, yes. However, the average user will not be reaching out for those, and may even be (incorrectly) advised to avoid them. I believe there is surprisingly big gap between an analyzer suggesting to replace It seems not doing so would go against overall trend of C# to offer more APIs to streamline common operations in a performant way. I understand that it is trivial and not worth it for a caliber of developer that works on optimizing .NET, but that's not the case for the average codebase more preoccupied with moving a json from an incoming request to an entity in DB, so the cost accrues. Please close/reject the proposal if you believe that such API does not make sense in .NET. |
FWIW I independently came up with a similar implementation, which I found very useful for working with Spans. So I at least found this API valuable. |
I think this can be valuable if the API is improved to make it easier to check whether the separator was found. For example (using
The latter can be consumed by indexing Although the number of lines saved per use of the API is low, it is common in some kinds of code. It might be enough to have methods for |
As for the "how to handle delimiter not found" case I think returning the entire string in the first span and an empty segment in the second is the most logical.
var (left, right) = span.SplitFirst(delimiter) is (_, not []) split ? split : throw new Exception("Delimiter Not Found"); |
@jilles-sg funnily enough, the indexer segment access (with deconstruction alternative) is what I settled on in the latest iteration of internal helpers to deal with this too. Works nicely whenever the index is a constant. I don't distinguish whether the separator was found for Alternatively, we can also mirror parsing: throwing What matters most is getting anything that let's us finally do away with every new codebase having tons of |
Background and motivation
While working with various solutions both internal and external, it appears there is a commonly used pattern to split strings just once which can be improved with an allocation-free helper.
Specifically, such patterns are especially common when processing headers, fields in JSON payloads, delimited pairs of values, etc.
For example, when handling OAuth2.0 headers, the often found pattern is the following:
While it has acceptable performance, we may be able to do better by providing convenience methods which would allow to handle "split once" cases idiomatically while simultaneously nudging the developers towards performance-oriented APIs.
In fact, when dealing with similar scenarios, Rust decided to explicitly have a dedicated function over consuming a split iterator: https://doc.rust-lang.org/std/primitive.str.html#method.split_once.
UPD: Proposal amended with
SplitFirst/SplitLast
suggestion by @MichalPetrykaAPI Proposal
Concerns and open questions
Q: Too many overloads / the API is unnecessarily complex
A: It's a suggested tradeoff in order to support
string
,{ReadOnly}Memory<T>
and{ReadOnly}Span<T>
. The rationale behind it is to reduce boilerplate and have good discoverability i.e. when writingstring.Split
it becomes very easy to discover and use.SplitFirst
without having to "know" that such API exists beforehand.Q:
StringComparison
overloadsA: They have been omitted for brevity. It may be profitable to either add them or delegate them to a more complex APIs suggested in #93 or #76186. If you have any concerns, please let me know
Q: Exception handling,
.Empty
span/memory vsException
A: Since the API is intended to be used as a short-form, the suggested behavior is to return matched slices in
Segment
andRemainder
, withRemainder
beingIsEmpty
when the sliced source has no separatorReference impl.: https://gist.github.com/neon-sunset/df6fb9fe6bb71f11c2b47fbeae55e6da
API Usage
Simple use
Method chaining
Slicing bearer token out of full header string
Retrieving arbitrary data from a text field
Alternative Designs
Existing
string.Split(...)
overloads, #93 and #76186Risks
Minimal, the API is intended to co-exist with more complex versions like split iterator suggested in an alternate design.
If there will be demand, it allows for further extension with
.Split{First/Last}(this Span<T> source...)
overloads.The text was updated successfully, but these errors were encountered: