-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Adding GetChunks which allow efficient scanning of a StringBuilder #17530
Conversation
@jaredpar @VSadov - above is a proposal for a way of getting chunks out of an StringBuilder. It is very simple, it is an Enumerator that returns ReadOnlySpan. The stack-only ness of Span DOES NOT PREVENT building an Enumerator that returns it, but at least currently it would cause some grief if you try to use it in an async method. Consider this example where i put some awaits in various points.
Today only Point 3 would be allowed as the span is 'dead' at that point and thus it would not need to be captured. However the compiler could be smart, and frankly after every 'await' it could insert a 'span=enumerator.Current' right after every 'await' and NOT have to capture the span into the closure. The question to you is : What do you think about this as a feature? If you like it, how hard is it? While this feature would be handy for this particular API, it really is a general problem of enumerating a collection of spans (in an async environment). We can fix it just by avoiding spans (using memory instead), but that is both less efficient, and more semantically complex. What do you think? (Today people can work-around this issue by expanding the 'foreach' by hand, but that is reasonably ugly....) cc: @davidfowl, @stephentoub, @jkotas |
This is introducing hidden side effects into developer code. How would you explain such a feature to users? Consider cases where Also consider the times when |
I think part of the makes this idea tractable is that there is a strong expectation that Enumerator.Current is side-effect free. Thus you could simply state that it is undefined what happens when this is not true (that is the compiler is allowed to insert as many calls to Enumerator.Current as it wants). You could narrow this if you like to say that this freedom only applies to enumerators that return ref-like structs and use await. Alternatively, you could say that semantics is that yes, logically there is EXACTLY a call to Enumerator.Current at variable initialization, and then after every await statement, but the compiler is free to optimize these away if it can. To first order, it simply does not (the JIT may or may not do it). This is a bit sub-optimal, but I could easily live with it. I want to note, however that the CURRENT behavior is pretty unhelpful (if you try to use enumerations of Span in an async method, you will get errors that you can't do that (even though there is an obvious way to make it work). This is rather unfortunate. If it were me I would define the semantics such that compiler is free to call Enumerator.Current as many times as it wants (that is it is the behavior can vary if Enumerator.Current has side effects). |
I do not share that expectation. Customers do bad things with their code all the time. This wouldn't be the first, second, third, etc ... time that a method we strongly expected to be side effect free was indeed not side effect free.
Undefined behavior is for C++. 😄 |
If you really want to have well defined semantics in the face of Enumerator.Current having side effects, you can do that. Simply state it happens when the variable is created, and after every await. You are done. Perfectly well defined semantics, AND semantics that people WANT (because it turns error behavior into very reasonable non-error behavior). The only reason the compile does this (and not the user), is because the user CAN'T because the COMPILER hid the variable needed to do it. |
Even if we could express purity of methods in CLR somehow, the I understand the overall idea and there could be some way to do what is desired here, but initial proposal for how to do it feels too fragile for something that language will have to live with, perhaps for many years. I like the part that iterator produces spans. It seems very clever. |
Here is what the code looks like if you tell people not to use foreach. I suppose it is not that bad (and only is needed if you need awaits in point 1 or point 2).
Is the current compiler is smart enough that it knows that 'span' is dead at all the awaits and thus does not try to capture them (which presumably would cause an error)? |
The code does not look too bad, but it would not currently compile. There is a bigger problem. Any kind of Span locals are not allowed in async methods right now. There is a proposal to relax this for The reason why we do not want to key of the capturing is because capturing is an implementation detail. We keep improving/changing the capturing story with every version. As the strategy changes we may capture less (or sometimes more, but hopefully less). Also the strategy is different between Release and Debug. In Debug we capture more aggressively, otherwise locals may vanish after For once I cannot guarantee that Mono C# compiler captures exactly the same set of locals. It most certainly does not. Basically we do not want to rely on a fluid thing full of optional implementation details and optimizations as a reason to accept or reject programs. |
One idea could be that the In the context of async method a delegate invoke should be cheap (evidently you are suggesting calling |
That would still run into the issue with Span locals not being allowed in async methods, but assuming dotnet/csharplang#1331 , it could work. |
The following code should work today but it needs a helper method.
So I think the upshot is that you CAN make things work with an Enumerable of a ReadOnlySpan in an async method, but dotnet/csharplang#1331 makes it better in most cases. It does mean that you can't use foreach over it in any async method. This may be acceptable. |
BTW: You can make |
Something like: static async Task<int> IndexOf1(StringBuilder sb, char c)
{
var spanEnum = sb.EnumerateChunks().GetEnumerator();
int i = 0;
while (spanEnum.MoveNext())
{
await Task.Delay(10); // Point 1
bool InnerLoop()
{
foreach (char ch in spanEnum.Current)
{
if (ch == c)
return true;
i++;
// Can't have await in this block since InnerLoop is an ordinary method.
// On the other hand it can use span locals.
// await Task.Delay(10); // Point 2
}
return false;
}
if (InnerLoop())
return i;
await Task.Delay(10); //Point 3
}
return -1;
} Not the prettiest piece, but I agree with the main point - you CAN make this work even in async, if ok with jumping through some hoops. |
Ah, I do see that variable capture inside the nested function is pretty nice here. The main pain is that you have to return to do any await, which is a pain, I am not at all convinced that it will happen enough to matter at all. At this point I think I am becoming comfortable with the API returning an Enumeratable over ReadOnlySpan (we don't need ReadOnlyMemory). It works great for synchronous code, and it is not that bad for async code. |
@vancem what if I want to call an asynchronous TextWriter method? |
@vancem can you clarify what the "chunks" are in this context? Do they map to the chunks in the implementation? That would couple API to implementation (which could change again, indeed StringBuilder was not originally chunk based) and also produce Spans of possibly wildly (and unexpectedly) varying sizes, depending on the history of the StringBuilder. Clearly the span returned cannot straddle a chunk boundary without an allocation somewhere. But would it make sense to cap the size of the span, if only to logically separate the implementation detail? |
I think the pattern is more general - a similar approach would work for anything backed by segmented buffers. Of which a "single buffer" is just a subset. I think varying chunk sizes is ok and expected. |
My response is basically the same as @VSadov. All we want is an API that allows things to give back the characters efficiently in a way that is mostly independent of the implementation. This proposal covers a broad swath (as long as the characters are stored as bobs of characters you can implement this interface efficiently). Sure if you encode the stringbuilder as UTF8 chars you have a problem, but it is only a perf problem and it is no worse then the perf problem they have today (since today we force them to form a string. |
@vancem, I think this should yield foreach (ReadOnlyMemory<char> buffer in sb.EnumerateChunks())
{
await textWriter.WriteAsync(buffer);
} I can't do that if |
338c1a6
to
9b1f2b2
Compare
I have changed it from ReadOnlySpan back to ReadOnlyMemory. It is fragile being a Span, and the cost of going to Memory is not high. |
After a bunch of discussion above, I have decided to use ReadOnlyMemory rather than ReadOnlySpan because it allows the chunks to be passed around to APIs that are ASYNC (which can't use Span) (this is Stephen's point above). The disadvantage of this choice is that
|
_currentChunk = next; | ||
return true; | ||
} | ||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit newline above
|
||
// There is a performance-vs-allocation tradeoff. Because the chunks | ||
// are a linked list with each chunk pointing to its PREDECESSOR, walking | ||
// the list FORWARD is not efficient. If there are few chunks (< 8) we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the number 8 is just a judgement call here? I don't have good intuition for the number of chunks in the real world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, 8 is a judgement call. Bascially you want a number before N*N gets too big. Generally if you do 'nothing' StringBuilders start with 16 bytes and double each time, so this handles strings up to about 2K-4K if you don't ever specify a starting length. Thus 8 keeps things non-allocating for the common case of strings < 256 bytes (e.g. paths). On the other hand, at 2-4K of data, the cost of copying the data is probably now large with respect to the cost of allocating the array, so the value of making the nubmer bigger is probably not a big deal (especially because people using StringBuilder on large strings should be setting a good starting length).
We can tune it as necessary.
while (0 <= --chunkCount) | ||
{ | ||
_chunks[chunkCount] = stringBuilder; | ||
stringBuilder = stringBuilder.m_ChunkPrevious; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert stringBuilder.m_ChunkPrevious?
// However above this size, we allocate an array to hold pointers to all | ||
// the chunks and we can be efficient for large N. | ||
int chunkCount = ChunkCount(stringBuilder); | ||
if (8 < chunkCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, this is sort of a Yoda conditional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow the change you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (chunkCount > 8)
https://en.wikipedia.org/wiki/Yoda_conditions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Quite a while back, I adopted a convention of always making relational operators go from smallest to largest. (basically I never use > or >=). I find it slightly clearer and less error prone, as it eliminates one source of arbitrary variation, and allows visual position to help in reading the code (this is especially nice if you are testing for a range).
For what it is worth...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I may do it.
Debug.Assert(stringBuilder != null); // Because it is only called with a 'this' pointer. | ||
_stringBuilder = stringBuilder; | ||
} | ||
private StringBuilder _stringBuilder; // We insure this is never null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: insure => ensure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: readonly
/// It needs to be public (so the compiler can use it when building a foreach statement) | ||
/// but users typically don't use it explicitly (which is why it is a nested type). | ||
/// </summary> | ||
public struct ChunkEnumerable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: readonly?
} | ||
_chunkPos = -1; | ||
} | ||
StringBuilder[] _chunks; // These are in normal order (first chunk first) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: private readonly (I'll stop commenting on these things)
@vancem so this just needs to go through an API review and it'll be into 2.2? |
OK, I have made the updates based on the API review https://github.com/dotnet/corefx/issues/29770, so now we just have ChunkEnumerator, but from a users point of view (that he can use foreach) nothing has changed. I have tests to add, but they go in the CoreFX repo along with changes to System.Runtime, and have to be done after this merge is complete and we have a good build that CoreFX can refer to. So fair warning: I am going to merge this later today (so we tonight's build has it), so if you have any reservations, speak now (in the next hour or two). |
…erator Adding EnumerateChunks which allow efficient scanning of a StringBuilder Signed-off-by: dotnet-bot <[email protected]>
…erator Adding EnumerateChunks which allow efficient scanning of a StringBuilder Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
_chunkPos = -1; | ||
} | ||
|
||
readonly StringBuilder[] _chunks; // These are in normal order (first chunk first) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please put fields first, per coding guidelines, and add missing visibility specifiers (private)? in ManyChunkInfo and ChunkEnumerator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…erator Adding EnumerateChunks which allow efficient scanning of a StringBuilder Signed-off-by: dotnet-bot <[email protected]>
…erator Adding EnumerateChunks which allow efficient scanning of a StringBuilder Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
This change is the start of addressing the issue CoreFX issue
https://github.com/dotnet/corefx/issues/21248
which wants to have an efficient way of reading the characters of a StringBuilder. However it has morphed alot since then (from a callback to an iterator model). The API review part of this is the following issue.
https://github.com/dotnet/corefx/issues/29770
This PR is an implementation of that spec. In particular we implement the following new method on StringBuilder.
ChunkEnumerator is a struct that follows the IEnumerable<ReadOnlyMemory> pattern and thus works with the C# foreach statement. Here is an example for efficiently implementing 'IndexOf' using GetChunks
The PR contains the actual implementation. Note that ChunkEnumerable does not actually implement IEnumerable<ReadOnlyMemory>.
We considered having it return ReadOnlySpan, but decided that it was too fragile in that Span can't be used across 'await' calls and can't be passed to methods that MAY be async. ReadOnlyMemory does not have this problem (but is a bit less efficient), but does require you to convert it to a Span before accessing the characters (and notice you need to cache the span in a local variable to keep span creation out of the inner loop).
This is not ideal, but is this issue will resurface other places (whenever we want to return chunks), and so it is likely we will invest in JIT optimization to optimize away any inefficiency of using ReadOnlyMemory and converting immediately to Span (which is what would happen in the synchronous case).
At this point it is ready for API review. This amounts to signing off on using ReadOnlyMemory instead of ReadOnlySpan, and the names (is 'Chunks' the right suffix?) and the fact that ChunkEnumerable does not actually implement IEnumerable<ReadOnlyMemory> (It could, it is just more code (which we can add later)).
I have tests, however there seems to be a chicken and egg problem in that you need to have an implementation, before CoreFX can expose it (and have tests for it). I have tested the code (all code paths are covered), but I will need to check in tests as a separate PR.