-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Proposal: Add a ValueStringBuilder #25587
Comments
@jkotas, @stephentoub, @KrzysztofCwalina, @danmosemsft, @ahsonkhan |
I didn't see that we'd added these arguments. I don't think we should have them. If you want to null terminate, you can call |
Discussed this one at length with @jkotas in dotnet/coreclr#16921. The problem with appending a null is that it changes the
|
The |
No, it leaves Length as is. It might increase Capacity. |
Hmm, I see, ok. |
Are they needed if there is an overload taking ROSpan? public void Append(string s); |
No, they aren't. I'll add a proposed section. |
There is a proposal dotnet/csharplang#93 |
There are several related efforts in corfxlab. We should try to unify (or make them consistent). See: This ValueStringBuilder is a specialized BufferWriter. And so if we add it, we need to rationalize it with BufferWriter, i.e. we should not have both BufferWriter and ValueStringBuilder in .NET. BufferWriter (the non-generic one) is a byref struct that allows writing (and resizing) to a span buffer. Today, it writes Utf8, but you could imagine writing either Utf8 or utf16 based on some setting. After the buffer is written, tere could be ToString or something like that to produce string result. |
@KrzysztofCwalina thanks for the links! I've updated the text with design goals and will review the other efforts. |
In case of growable could be useful to have a boolean properties to know if VSB has grown? In a hot path could be useful avoid to add pressure to underlying ArrayPool.Shared(today) and improve performance by choosing the right size of initial buffer. |
Some questions, I guess...
|
Speaking of related efforts... There's also the |
Perhaps we should add an EventSource log for this? |
These were the most critical/obvious (based on current usage). I didn't want to start the discussion with too much complexity. Attacking this in priority order seemed like it would go a little quicker.
Hadn't considered it is all. I'll take a look. |
Fluent pattern adds overhead, and it does not work for structs that needs to be disposed. |
Combining two of @khellang's comment https://github.com/dotnet/corefx/issues/20378#issuecomment-304400894 & https://github.com/dotnet/corefx/issues/28379#issuecomment-375975366, it would be super helpful for almost all the .NET developers if we get a comprehensive blog article or Channel 9 video on all these newly optimized string related types in .NET Core and ASP.NET Core detailing when to use which and why. During that exercise if some overlapping types are found, please consider deprecating one. With every upcoming string-related type, it is getting scarier. 😱 |
I think the general guidance (once
|
|
No, but with the other new APIs (on |
😕 The API proposal itself clearly states:
|
I'm assuming @terrajobst was referring to the stack-only-ness of There might be valid reasons for yet another slice-of-string type, but as I mentioned in https://github.com/dotnet/corefx/issues/20378#issuecomment-304400894, the number of these types are getting a bit out of hand and it's hard to know which to use and which to support in libraries. |
The key thing we're missing that this addresses is a low allocation expandable |
I have a question regarding the Dispose() of this structure it should be always called? |
It returns any potentially rented buffers to the array pool. |
Note that I'm exploring alternative options utilizing some of the concepts @KrzysztofCwalina brought up (with @KrzysztofCwalina). We'll circle back around with more info. |
@JeremyKuhne have you made progress on exploring new options? I think we should definitely get this into 3.0. |
dotnet/corefxlab#2177 is part of what might negate the need for this. I currently own productizing BufferReader/Writer (dotnet/corefxlab#2166) and have just started in earnest. I'll be experimenting with how I can provide the same functionality and taking a look at any tradeoffs. ValueStringBuilder is the backup plan if I can't provide a reasonable solution with the other new types.
I intend to get this or an equivalent in. I really want it as well as not having this sort of type is blocking adding other APIs that I'm very interested in seeing. :) |
A few months back I implemented basically this, thinking it was a good idea as well. Exact implementation will cause performance to vary, so take it with a grain of salt, but I was found was roughly: For allocations of 5 chars long, less than 12 allocations was faster to use the traditional Hopefully you're getting the picture without me needing to clone the old commit and recreate the 3D graphs. The amount of situations where the vSB was actually higher performing were extremely limited, and not clearly or obviously defined. Furthmore, in every single code base I have where this might have been useful, I worked out ways to just use an array/buffer, where even additional computations or value passing still outperformed the vSB in every single instance. So, it might be useful, but as a heads up to whoever implements this, you're gonna need to benchmark the ever living **** out of it, and document the small hole where it's actually useful. |
@Entomy by faster, presumably you mean allocation. There is also the garbage to consider. |
@danmosemsft Right, absolutely. That's basically the point I'm trying to make, is that this is complicated and it seems like there's only a very small, and unusual, range in which this is viable. |
Reduction in allocation often has little-to-no measurable impact on serialized benchmark throughput. The benefit of allocation reduction when it comes to throughput is generally about the whole app running at scale. |
This is marked blocked because it potentially requires language features. We need a way to enforce that structs are passed only by reference so that we don't double return arrays to the pool. |
Anything slated for language changes to allow this? I think forcing ref structs makes sense as a new language construct. |
Will it come out in 6.0.0? |
For now I exposed my own by essentially shamelessly copying it: Also you can use https://github.com/Cysharp/ZString which has a great StringBuilder-like API. |
This is essentially blocked on resolving discussion in #50389. |
Since roslyn adds support for utf8 strings handled as |
Has been anything further done in this regard, i ask out of curiosity |
but not public for some reason. what is that reason? public |
|
Just saw another copy of this type pop up, and see that we currently have 20+ copies in the product. https://source.dot.net/#q=ValueStringBuilder Is there any progress on the language feature that would make us comfortable exposing this type? |
To clarify (I'm not sure if this is what you meant or not), we have that many in binaries, not that many source copies, with the same source built as internal into many places. |
We should consider making a value based StringBuilder to allow low allocation building of strings. While Span allows you to provide a writable buffer, in many scenarios we have a need to get or build strings and we don't know precisely how much space will be needed ahead of time. Having an abstraction that can grow beyond a given initial buffer is particularly useful as it doesn't require looping with
Try*
APIs- which can be both complicated and have negative performance implications.We currently use
ValueStringBuilder
for this purpose internally. It starts with an optional initial buffer (which we often stackalloc) and will grow using ArrayPool if needed.Design Goals
char* szValue
)API
Here is the proposed API:
This is the current shape of our internal
ValueStringBuilder
:Sample Code
Here is a common pattern on an API that could theoretically be made public if ValueStringBuilder was public:
(Although we would call this one GetFullUserName or something like that.)
https://github.com/dotnet/corefx/blob/050bc33738887d9d8fcc9bc5965b7d9ca65bc7f4/src/System.Runtime.Extensions/src/System/Environment.Win32.cs#L40-L56
The caller is above this method:
https://github.com/dotnet/corefx/blob/050bc33738887d9d8fcc9bc5965b7d9ca65bc7f4/src/System.Runtime.Extensions/src/System/Environment.Win32.cs#L13-L38
Usage of AppendSpan:
https://github.com/dotnet/corefx/blob/3538128fa1fb2b77a81026934d61cd370a0fd7f5/src/System.Runtime.Numerics/src/System/Numerics/BigNumber.cs#L550-L560
I'll add more usage details and possible API surface area.
Notes
bool TryGet*(Span)
overload? (see https://github.com/dotnet/coreclr/pull/17097/files#r176560435)Dispose()
in ausing
statement? (Proposal: Allow Dispose by Convention csharplang#93)AppendFormat
overloads?AppendSpan()
is a little tricky to grok- is there a better term/pattern?The text was updated successfully, but these errors were encountered: