-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding GetChunks which allow efficient scanning of a StringBuilder #26207
Comments
@vancem can you please post here the final API proposal? |
We need a way of efficiently enumerating the chunks of a StringBuilder. The suggestion here is to provide the new method StringBuilder.EnumerateChunks which returns a (nested but public) type ChunkEnumerable that implements the IEnumerable<ReadOnlyMemory>. Here is the new surface area.
From a user's point of view the only new API is the StringBuilder.EnumerateChunks() method. The rest is just the pattern to make the C# foreach statement just work on the result.. Here is an example for efficiently implementing 'IndexOf' using EnumerateChunks() and foreach
The PR contains the actual implementation. Note that ChunkEnumerable does not actually declare that it implements IEnumerable<ReadOnlyMemory>, it just implemented all the right methods (which make foreach work). We considered having it return ReadOnlySpan, but decided that it was too fragile in that Span can't be used across 'await' statements (easily) and can't be passed to methods that MAY be async (these methods always take ReadOnlyMemory). ReadOnlyMemory does not have this problem (but is a bit less efficient since it does require you to convert it to a Span before accessing the characters (and also notice you need to cache the span in a local variable to keep span creation out of the inner loop). We considered also providing both an API to produce Spans and one that makes Memorys. We could do this, but it is a bunch of cloned code, and it SHOULD be the case that the JIT could eliminate the inefficiency (although it will probably require targeted work). API issues to consider include
See dotnet/coreclr#17530 for the actual implementation. |
Looks good, only change is that we think we should collapse the two types: class StringBuilder {
public ChunkEnumerator GetChunks();
public struct ChunkEnumerator
{
[EditorBrowsable(Never)] // Only here to make foreach work
public ChunkEnumerator GetEnumerator();
public bool MoveNext();
public ReadOnlyMemory<char> Current { get; }
}
}
|
As an aside, there was observation in the room that it might be time to make StringBuilder doubly linked. This would make the implementation of ChunkEnumerator simpler and make it possible to offer IEnumerable. (Indexer would not be any faster though) |
I have updated this issue to the API suggested by @terrajobst as well as the implementation. |
I have tests for this code (and they pass for me locally) but they go in the CoreFX repo and I need to check in this implementation and wait for a new build of CoreCLR before I can complete that. So I would like to check this in. If anyone has concerns please raise them i the next 1/2 hour or so, as I would like to check in before the nightly build kicks off. |
If the PR is green/signed off and CoreFX tests pass locally then you can certainly merge. |
Closing the issue as the StringBuilder.GetChunks was exposed in CoreFX with dotnet/corefx#29877. |
This issue started as #22371 but has morphed so much that it makes sense to open a new issue.
See coreclr pr dotnet/coreclr#17530 for more. This is just the API review.
EDIT @stephentoub 5/21. Copied proposal from below to here:
EDIT: Changed to final result (which has just the ChunkEnumerator type).
The text was updated successfully, but these errors were encountered: