Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Adding GetChunks which allow efficient scanning of a StringBuilder #17530

Merged
merged 12 commits into from
May 22, 2018

Conversation

vancem
Copy link

@vancem vancem commented Apr 12, 2018

This change is the start of addressing the issue CoreFX issue

https://github.com/dotnet/corefx/issues/21248

which wants to have an efficient way of reading the characters of a StringBuilder. However it has morphed alot since then (from a callback to an iterator model). The API review part of this is the following issue.

https://github.com/dotnet/corefx/issues/29770

This PR is an implementation of that spec. In particular we implement the following new method on StringBuilder.

class StringBuilder { 
      ChunkEnumerator GetChunks();
}

ChunkEnumerator is a struct that follows the IEnumerable<ReadOnlyMemory> pattern and thus works with the C# foreach statement. Here is an example for efficiently implementing 'IndexOf' using GetChunks

        static int IndexOf(StringBuilder sb, char c)
        {
            int pos = 0;
            foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
            {
                var span = chunk.Span;  // We create a local variable because it is more efficient
                for (int i = 0; i < span .Length; i++)
                    if (span [i] == c)
                        return pos + i;
                pos += span.Length;
            }
            return -1;
        }

The PR contains the actual implementation. Note that ChunkEnumerable does not actually implement IEnumerable<ReadOnlyMemory>.

We considered having it return ReadOnlySpan, but decided that it was too fragile in that Span can't be used across 'await' calls and can't be passed to methods that MAY be async. ReadOnlyMemory does not have this problem (but is a bit less efficient), but does require you to convert it to a Span before accessing the characters (and notice you need to cache the span in a local variable to keep span creation out of the inner loop).

This is not ideal, but is this issue will resurface other places (whenever we want to return chunks), and so it is likely we will invest in JIT optimization to optimize away any inefficiency of using ReadOnlyMemory and converting immediately to Span (which is what would happen in the synchronous case).

At this point it is ready for API review. This amounts to signing off on using ReadOnlyMemory instead of ReadOnlySpan, and the names (is 'Chunks' the right suffix?) and the fact that ChunkEnumerable does not actually implement IEnumerable<ReadOnlyMemory> (It could, it is just more code (which we can add later)).

I have tests, however there seems to be a chicken and egg problem in that you need to have an implementation, before CoreFX can expose it (and have tests for it). I have tested the code (all code paths are covered), but I will need to check in tests as a separate PR.

@vancem vancem added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Apr 12, 2018
@vancem
Copy link
Author

vancem commented Apr 12, 2018

@jaredpar @VSadov - above is a proposal for a way of getting chunks out of an StringBuilder. It is very simple, it is an Enumerator that returns ReadOnlySpan.

The stack-only ness of Span DOES NOT PREVENT building an Enumerator that returns it, but at least currently it would cause some grief if you try to use it in an async method. Consider this example where i put some awaits in various points.

        static int IndexOf(StringBuilder sb, char c)
        {
            int pos = 0;
            foreach (ReadOnlySpan<char> span in sb.EnumerateChunks())
            {
                await Task.Delay(10);  // Point 1
                for (int i = 0; i < span.Length; i++)
                    if (span[i] == c)
                        await Task.Delay(10);  // Point 2
                pos += span.Length;
                await Task.Delay(10);  // Point 3
            }
            return -1;
        }

Today only Point 3 would be allowed as the span is 'dead' at that point and thus it would not need to be captured.

However the compiler could be smart, and frankly after every 'await' it could insert a 'span=enumerator.Current' right after every 'await' and NOT have to capture the span into the closure.

The question to you is : What do you think about this as a feature? If you like it, how hard is it?

While this feature would be handy for this particular API, it really is a general problem of enumerating a collection of spans (in an async environment). We can fix it just by avoiding spans (using memory instead), but that is both less efficient, and more semantically complex.

What do you think? (Today people can work-around this issue by expanding the 'foreach' by hand, but that is reasonably ugly....)

cc: @davidfowl, @stephentoub, @jkotas

@jaredpar
Copy link
Member

The question to you is : What do you think about this as a feature? If you like it, how hard is it?

This is introducing hidden side effects into developer code. How would you explain such a feature to users? Consider cases where var is used. How do you explain the set of variables and functions that execute at the conclusion of an await block.

Also consider the times when await executes promptly. Do we call sb.Current anyways? If not then that means the times, and potential side effects, of executing this hidden code is random-ish.

@vancem
Copy link
Author

vancem commented Apr 12, 2018

@jaredpar

I think part of the makes this idea tractable is that there is a strong expectation that Enumerator.Current is side-effect free. Thus you could simply state that it is undefined what happens when this is not true (that is the compiler is allowed to insert as many calls to Enumerator.Current as it wants). You could narrow this if you like to say that this freedom only applies to enumerators that return ref-like structs and use await.

Alternatively, you could say that semantics is that yes, logically there is EXACTLY a call to Enumerator.Current at variable initialization, and then after every await statement, but the compiler is free to optimize these away if it can. To first order, it simply does not (the JIT may or may not do it). This is a bit sub-optimal, but I could easily live with it.

I want to note, however that the CURRENT behavior is pretty unhelpful (if you try to use enumerations of Span in an async method, you will get errors that you can't do that (even though there is an obvious way to make it work). This is rather unfortunate.

If it were me I would define the semantics such that compiler is free to call Enumerator.Current as many times as it wants (that is it is the behavior can vary if Enumerator.Current has side effects).

@jaredpar
Copy link
Member

I think part of the makes this idea tractable is that there is a strong expectation that Enumerator.Current is side-effect free.

I do not share that expectation. Customers do bad things with their code all the time. This wouldn't be the first, second, third, etc ... time that a method we strongly expected to be side effect free was indeed not side effect free.

Thus you could simply state that it is undefined what happens when this is not true (that is the compiler is allowed to insert as many calls to Enumerator.Current as it wants).

Undefined behavior is for C++. 😄

@vancem
Copy link
Author

vancem commented Apr 12, 2018

If you really want to have well defined semantics in the face of Enumerator.Current having side effects, you can do that. Simply state it happens when the variable is created, and after every await. You are done. Perfectly well defined semantics, AND semantics that people WANT (because it turns error behavior into very reasonable non-error behavior).

The only reason the compile does this (and not the user), is because the user CAN'T because the COMPILER hid the variable needed to do it.

@VSadov
Copy link
Member

VSadov commented Apr 12, 2018

Even if we could express purity of methods in CLR somehow, the Current here is clearly not idempotent since it can observe sideeffects. Calling Current twice can return different results.
The overall language design (and common sense) forbids duplicating operations like this.

I understand the overall idea and there could be some way to do what is desired here, but initial proposal for how to do it feels too fragile for something that language will have to live with, perhaps for many years.

I like the part that iterator produces spans. It seems very clever.

@vancem
Copy link
Author

vancem commented Apr 12, 2018

Here is what the code looks like if you tell people not to use foreach.

I suppose it is not that bad (and only is needed if you need awaits in point 1 or point 2).

        static async Task<int> IndexOf(StringBuilder sb, char c)
        {
            int pos = 0;
            var spanEnum = sb.EnumerateChunks().GetEnumerator();
            while (spanEnum.MoveNext())
                await Task.Delay(10); // Point 1
                var span = spanEnum.Current();
                for (int i = 0; i < span.Length; i++)
                    if (span[i] == c)
                    {
                        await Task.Delay(10); // Point 2
                        span = spanEnum.Current();
                    }
                await Task.Delay(10);  // Point 3
            }
            return -1;
        }

Is the current compiler is smart enough that it knows that 'span' is dead at all the awaits and thus does not try to capture them (which presumably would cause an error)?

@VSadov
Copy link
Member

VSadov commented Apr 12, 2018

The code does not look too bad, but it would not currently compile.

There is a bigger problem. Any kind of Span locals are not allowed in async methods right now. There is a proposal to relax this for { } blocks that do not contain awaits.

The reason why we do not want to key of the capturing is because capturing is an implementation detail. We keep improving/changing the capturing story with every version. As the strategy changes we may capture less (or sometimes more, but hopefully less). Also the strategy is different between Release and Debug. In Debug we capture more aggressively, otherwise locals may vanish after await.

For once I cannot guarantee that Mono C# compiler captures exactly the same set of locals. It most certainly does not.

Basically we do not want to rely on a fluid thing full of optional implementation details and optimizations as a reason to accept or reject programs.

@VSadov
Copy link
Member

VSadov commented Apr 12, 2018

One idea could be that the Current returns a delegate that returns the span. In this way you could keep the delegate however you want and fetch the span between awaits with a simple ().

In the context of async method a delegate invoke should be cheap (evidently you are suggesting calling Current after every await as "ok") , but I would be a bit worried about penalizing the non-async case.

@VSadov
Copy link
Member

VSadov commented Apr 12, 2018

That would still run into the issue with Span locals not being allowed in async methods, but assuming dotnet/csharplang#1331 , it could work.

@vancem
Copy link
Author

vancem commented Apr 12, 2018

The following code should work today but it needs a helper method.
Basically it passes the span into the helper and the helper does all the work, and returns if it needs to await. This could get arbitrarily ugly to partition your code in this way, but I see that realistic cases are likely to be fine (it is not a hardship. Indeed with dotnet/csharplang#1331 you probably would not need the helper method most of the time and the code would still be clean.

        static async Task<int> IndexOf1(StringBuilder sb, char c)
        {
            var spanEnum = sb.EnumerateChunks().GetEnumerator();
            while (spanEnum.MoveNext())
            {
                await Task.Delay(10); // Point 1

                int i = 0;
                while (Helper(spanEnum.Current, char c))
                    await Task.Delay(10); // Point 2

                await Task.Delay(10); //Point 3
            }
            return -1;
        }

        static bool Helper(ReadOnlySpan<char> span, ref int i, char c)
        {
            while (i < span.Length)
            {
                if (span[i] == c)
                    return true;
                i++;
            }
            return false;
        }

So I think the upshot is that you CAN make things work with an Enumerable of a ReadOnlySpan in an async method, but dotnet/csharplang#1331 makes it better in most cases. It does mean that you can't use foreach over it in any async method. This may be acceptable.

@VSadov
Copy link
Member

VSadov commented Apr 12, 2018

BTW: You can make Helper a local nested method and get very close (syntactically, and operationally, if JIT inlines it) to what dotnet/csharplang#1331 promises.

@VSadov
Copy link
Member

VSadov commented Apr 12, 2018

Something like:

        static async Task<int> IndexOf1(StringBuilder sb, char c)
        {
            var spanEnum = sb.EnumerateChunks().GetEnumerator();
            int i = 0;
            
            while (spanEnum.MoveNext())
            {
                await Task.Delay(10); // Point 1

                bool InnerLoop()
                {
                    foreach (char ch in spanEnum.Current)
                    {
                        if (ch == c)
                            return true;

                        i++;

                        // Can't have await in this block since InnerLoop is an ordinary method.
                        // On the other hand it can use span locals.
                        // await Task.Delay(10); // Point 2
                    }

                    return false;
                }

                if (InnerLoop())
                    return i;                                        
                                   
                await Task.Delay(10); //Point 3
            }
            return -1;
        }

Not the prettiest piece, but I agree with the main point - you CAN make this work even in async, if ok with jumping through some hoops.

@vancem
Copy link
Author

vancem commented Apr 13, 2018

Ah, I do see that variable capture inside the nested function is pretty nice here. The main pain is that you have to return to do any await, which is a pain, I am not at all convinced that it will happen enough to matter at all.

At this point I think I am becoming comfortable with the API returning an Enumeratable over ReadOnlySpan (we don't need ReadOnlyMemory). It works great for synchronous code, and it is not that bad for async code.

@davidfowl
Copy link
Member

@vancem what if I want to call an asynchronous TextWriter method?

@danmoseley
Copy link
Member

@vancem can you clarify what the "chunks" are in this context? Do they map to the chunks in the implementation? That would couple API to implementation (which could change again, indeed StringBuilder was not originally chunk based) and also produce Spans of possibly wildly (and unexpectedly) varying sizes, depending on the history of the StringBuilder. Clearly the span returned cannot straddle a chunk boundary without an allocation somewhere. But would it make sense to cap the size of the span, if only to logically separate the implementation detail?

@VSadov
Copy link
Member

VSadov commented Apr 17, 2018

I think the pattern is more general - a similar approach would work for anything backed by segmented buffers. Of which a "single buffer" is just a subset.

I think varying chunk sizes is ok and expected.
It should probably never return an empty chunk (just because that is silly, but not dangerous really).

@vancem
Copy link
Author

vancem commented Apr 17, 2018

My response is basically the same as @VSadov. All we want is an API that allows things to give back the characters efficiently in a way that is mostly independent of the implementation. This proposal covers a broad swath (as long as the characters are stored as bobs of characters you can implement this interface efficiently). Sure if you encode the stringbuilder as UTF8 chars you have a problem, but it is only a perf problem and it is no worse then the perf problem they have today (since today we force them to form a string.

@stephentoub
Copy link
Member

@vancem, I think this should yield ReadOnlyMemory<char> rather than ReadOnlySpan<char>. It's not just await'ing between iterations that's an issue, but rather being able to hand off the value to an asynchronous method as well, e.g.

foreach (ReadOnlyMemory<char> buffer in sb.EnumerateChunks())
{
    await textWriter.WriteAsync(buffer);
}

I can't do that if buffer is a span.

@vancem vancem force-pushed the StringBuilderEnumerator branch from 338c1a6 to 9b1f2b2 Compare May 15, 2018 21:29
@vancem
Copy link
Author

vancem commented May 15, 2018

I have changed it from ReadOnlySpan back to ReadOnlyMemory. It is fragile being a Span, and the cost of going to Memory is not high.

@vancem
Copy link
Author

vancem commented May 17, 2018

After a bunch of discussion above, I have decided to use ReadOnlyMemory rather than ReadOnlySpan because it allows the chunks to be passed around to APIs that are ASYNC (which can't use Span) (this is Stephen's point above).

The disadvantage of this choice is that

  1. Generally you want to access the characters, which means calling chunk.Span. Thus we made things slightly harder in a common case (but we allow more schenarios.
  2. This is less efficient since you have to make a Memory object just to fetch a span from it and throw it away. JIT Optimization could fix this however.

_currentChunk = next;
return true;
}
/// <summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit newline above


// There is a performance-vs-allocation tradeoff. Because the chunks
// are a linked list with each chunk pointing to its PREDECESSOR, walking
// the list FORWARD is not efficient. If there are few chunks (< 8) we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the number 8 is just a judgement call here? I don't have good intuition for the number of chunks in the real world.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 8 is a judgement call. Bascially you want a number before N*N gets too big. Generally if you do 'nothing' StringBuilders start with 16 bytes and double each time, so this handles strings up to about 2K-4K if you don't ever specify a starting length. Thus 8 keeps things non-allocating for the common case of strings < 256 bytes (e.g. paths). On the other hand, at 2-4K of data, the cost of copying the data is probably now large with respect to the cost of allocating the array, so the value of making the nubmer bigger is probably not a big deal (especially because people using StringBuilder on large strings should be setting a good starting length).

We can tune it as necessary.

while (0 <= --chunkCount)
{
_chunks[chunkCount] = stringBuilder;
stringBuilder = stringBuilder.m_ChunkPrevious;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert stringBuilder.m_ChunkPrevious?

// However above this size, we allocate an array to hold pointers to all
// the chunks and we can be efficient for large N.
int chunkCount = ChunkCount(stringBuilder);
if (8 < chunkCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, this is sort of a Yoda conditional

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the change you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Quite a while back, I adopted a convention of always making relational operators go from smallest to largest. (basically I never use > or >=). I find it slightly clearer and less error prone, as it eliminates one source of arbitrary variation, and allows visual position to help in reading the code (this is especially nice if you are testing for a range).

For what it is worth...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I may do it.

Debug.Assert(stringBuilder != null); // Because it is only called with a 'this' pointer.
_stringBuilder = stringBuilder;
}
private StringBuilder _stringBuilder; // We insure this is never null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: insure => ensure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: readonly

/// It needs to be public (so the compiler can use it when building a foreach statement)
/// but users typically don't use it explicitly (which is why it is a nested type).
/// </summary>
public struct ChunkEnumerable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: readonly?

}
_chunkPos = -1;
}
StringBuilder[] _chunks; // These are in normal order (first chunk first)
Copy link
Member

@stephentoub stephentoub May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: private readonly (I'll stop commenting on these things)

@davidfowl
Copy link
Member

@vancem so this just needs to go through an API review and it'll be into 2.2?

@vancem vancem removed the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label May 22, 2018
@vancem
Copy link
Author

vancem commented May 22, 2018

OK, I have made the updates based on the API review https://github.com/dotnet/corefx/issues/29770, so now we just have ChunkEnumerator, but from a users point of view (that he can use foreach) nothing has changed.

I have tests to add, but they go in the CoreFX repo along with changes to System.Runtime, and have to be done after this merge is complete and we have a good build that CoreFX can refer to.

So fair warning: I am going to merge this later today (so we tonight's build has it), so if you have any reservations, speak now (in the next hour or two).

@vancem vancem merged commit f9bb9e1 into dotnet:master May 22, 2018
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request May 22, 2018
…erator

Adding EnumerateChunks which allow efficient scanning of a StringBuilder

Signed-off-by: dotnet-bot <[email protected]>
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request May 22, 2018
…erator

Adding EnumerateChunks which allow efficient scanning of a StringBuilder

Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
_chunkPos = -1;
}

readonly StringBuilder[] _chunks; // These are in normal order (first chunk first)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please put fields first, per coding guidelines, and add missing visibility specifiers (private)? in ManyChunkInfo and ChunkEnumerator

Copy link
Member

@danmoseley danmoseley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jkotas pushed a commit to dotnet/corert that referenced this pull request May 23, 2018
…erator

Adding EnumerateChunks which allow efficient scanning of a StringBuilder

Signed-off-by: dotnet-bot <[email protected]>
stephentoub pushed a commit to dotnet/corefx that referenced this pull request May 23, 2018
…erator

Adding EnumerateChunks which allow efficient scanning of a StringBuilder

Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
@vancem vancem changed the title Adding EnumerateChunks which allow efficient scanning of a StringBuilder Adding GetChunks which allow efficient scanning of a StringBuilder May 24, 2018
@vancem vancem deleted the StringBuilderEnumerator branch May 31, 2018 00:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants