-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add [ThreadStatic] StringBuilder in JsonTextTokenizer #15794
Add [ThreadStatic] StringBuilder in JsonTextTokenizer #15794
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
CLA signed |
Is it okay if I push another PR for changing JsonToken to readonly struct? |
Certainly happy to see a PR - the fact that it's internal is helpful here. (I'll try to review this PR some time next week. Looking very briefly though, I'd encourage you to consider ThreadLocal vs ThreadStatic; I was under the impression that these days ThreadLocal was generally preferred, but I could be wrong.) |
@jskeet I've mimic code from here StringBuilderCache and it is using [ThreadStatic]. If you prefer I can try with ThreadLocal, but this code is "battle tested" in runtime and also in our software :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another alternative to consider would be to have a private nested StringBuilderCache class instead of adding these methods directly to the JsonTextTokenizer - that we wouldn't need to add "StringBuilder" into all the names.
@jskeet |
I think it's worth extracting as a general class, but I'd start with it in JsonTextTokenizer rather than applying it to all uses of StringBuilder in the project. We can easily expand its usage further later. |
Right, I'll fetch and try to do some benchmarking when I can. It may not be this week though. |
Benchmark results: Before optimizations:
After stringbuilder:
So it looks like this is probably worth doing (and the benefit will very much depend on the JSON being parsed). |
@jskeet Where do you see "new warnings"? Maybe there is somewhere a check that should be set in order to be executed....(Check for Safety) |
Where you're trying to use Nullable Reference Types when the project doesn't support it.
It's going to be very hard for you to contribute if you're unable to compile the code. Can you not temporarily change the configuration on your machine to allow the use of nuget.org? |
I've triggered the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like @JamesNK to have a look at this before merging, in case he has thoughts. But otherwise, this looks fine to me.
I'm not fimilar with the pros and cons of thread static. I guess overall memory usage will be higher. The code looks fine, although you should add a note that it is copied from https://github.com/dotnet/runtime/blob/72c488e37818e18557b47292f519489ea60af2f8/src/libraries/Common/src/System/Text/StringBuilderCache.cs#L7 |
@JamesNK I've added note from where I've copied code. Here it is a result with allocations once I've build this PS + JsonToken structure of the same benchmarked app. In general it is 120MB less |
I mean the steady state memory usage of the app will be higher. Cached StringBuilders stick around. |
@jskeet Do you need something more from my side? Or this is enough to be merged? |
Just time to get round to final checks and things. Sorry - I'm really busy and Protobuf is only one small part of my work, so it's hard to get round to things. |
{ | ||
if (sb.Capacity <= MaxCachedStringBuilderSize) | ||
{ | ||
cachedInstance = sb; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep the bigger of the two here? There's a chance the cached one has bigger capacity than the one being released and we might want to keep the bigger one to reduce the chances of line 758 evaluating to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to not keep too big StringBuilder instances as they are static per thread and can increase total memory consumption of the app. At least this are considerations that people in .net runtime had. As written above the class - this is a copy of internal class used there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm not saying we should keep anything that's above MaxCachedStringBuilderSize
. I'm saying we should do something like the following code, to make sure that we keep cache the biggest string builder possible of the ones we have already allocated, within the MaxCachedStringBuilderSize
boundaries.
cachedInstance = cachedInstance?.Capacity >= sb.Capacity ? cachedInstance : sb;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you check the code in Acquire - it clears cachedInstance and set it to null in happy path.
So when releasing cachedInstance will be null and there is no need to check it's capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm refering to this scenario:
- Acquiere sb of 50 capacity
- Release sb of 50 capacity => sb of 50 is now cached
- Acquire sb of 100 capacity => new sb of 100 is created, sb of 50 continues to be cached
- Release sb of 100 capacity => sb of 100 is discarded, even though it's better to discard the 50 capacity and keep the 100 capacity
- Acquire sb of 70 capacity => new sb of 70 capacity is created, sb of 50 continues to be cached, we could have reused the sb of 100 capacity
It's probably easy to write a test that demontrates this, unless I'm missunderstanding something. This seems like a simple change that would maximize the benefits of caching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I understand your concerns and pushed suggested changes. Really nice catch. Maybe you should propose it also in runtime repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes!
Same benchmark as in here main: BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3235/23H2/2023Update/SunValley3) Job=DefaultJob Error=3.84 ns StdDev=3.21 ns
PR:
|
@mkruskal-google: This has been reviewed from a C# standpoint, and I've just marked it safe to run tests again. If you could review it to kick off the merge process, that'd be great. |
Implemented in fac929d |
COPYBARA_INTEGRATE_REVIEW=protocolbuffers#15794 from TrayanZapryanov:cache_stringbuilder 596147e PiperOrigin-RevId: 613251480
While profiling allocations in our code, I've found a lot of StringBuilder instances.
Changes add similar functionality to the dotnet StringBuilderCache class in JsonTextTokenizer.
They are not modifying existing logic and only will reduce relatively small text/number allocations by using cached instance.