Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add [ThreadStatic] StringBuilder in JsonTextTokenizer #15794

Closed
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 63 additions & 7 deletions csharp/src/Google.Protobuf/JsonTokenizer.cs
Original file line number Diff line number Diff line change
Expand Up @@ -301,7 +301,8 @@ private void ValidateState(State validStates, string errorPrefix)
/// </summary>
private string ReadString()
{
var value = new StringBuilder();
//builder will not be released in case of an exception, but this is not a problem and we will create new on next Acquire
var builder = StringBuilderCache.Acquire();
bool haveHighSurrogate = false;
while (true)
{
Expand All @@ -316,7 +317,7 @@ private string ReadString()
{
throw reader.CreateException("Invalid use of surrogate pair code units");
}
return value.ToString();
return StringBuilderCache.GetStringAndRelease(builder);
}
if (c == '\\')
{
Expand All @@ -330,7 +331,7 @@ private string ReadString()
throw reader.CreateException("Invalid use of surrogate pair code units");
}
haveHighSurrogate = char.IsHighSurrogate(c);
value.Append(c);
builder.Append(c);
}
}

Expand Down Expand Up @@ -408,7 +409,8 @@ private void ConsumeLiteral(string text)

private double ReadNumber(char initialCharacter)
{
StringBuilder builder = new StringBuilder();
//builder will not be released in case of an exception, but this is not a problem and we will create new on next Acquire
var builder = StringBuilderCache.Acquire();
if (initialCharacter == '-')
{
builder.Append("-");
Expand Down Expand Up @@ -437,24 +439,25 @@ private double ReadNumber(char initialCharacter)
}

// TODO: What exception should we throw if the value can't be represented as a double?
var builderValue = StringBuilderCache.GetStringAndRelease(builder);
try
{
double result = double.Parse(builder.ToString(),
double result = double.Parse(builderValue,
NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint | NumberStyles.AllowExponent,
CultureInfo.InvariantCulture);

// .NET Core 3.0 and later returns infinity if the number is too large or small to be represented.
// For compatibility with other Protobuf implementations the tokenizer should still throw.
if (double.IsInfinity(result))
{
throw reader.CreateException("Numeric value out of range: " + builder);
throw reader.CreateException("Numeric value out of range: " + builderValue);
}

return result;
}
catch (OverflowException)
{
throw reader.CreateException("Numeric value out of range: " + builder);
throw reader.CreateException("Numeric value out of range: " + builderValue);
}
}

Expand Down Expand Up @@ -728,6 +731,59 @@ internal InvalidJsonException CreateException(string message)
return new InvalidJsonException(message);
}
}

/// <summary>
/// Provide a cached reusable instance of stringbuilder per thread.
/// Copied from https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/System/Text/StringBuilderCache.cs
/// </summary>
private static class StringBuilderCache
{
private const int MaxCachedStringBuilderSize = 360;
private const int DefaultStringBuilderCapacity = 16; // == StringBuilder.DefaultCapacity

[ThreadStatic]
private static StringBuilder cachedInstance;

/// <summary>Get a StringBuilder for the specified capacity.</summary>
/// <remarks>If a StringBuilder of an appropriate size is cached, it will be returned and the cache emptied.</remarks>
public static StringBuilder Acquire(int capacity = DefaultStringBuilderCapacity)
{
if (capacity <= MaxCachedStringBuilderSize)
{
StringBuilder sb = cachedInstance;
if (sb != null)
{
// Avoid stringbuilder block fragmentation by getting a new StringBuilder
// when the requested size is larger than the current capacity
if (capacity <= sb.Capacity)
{
cachedInstance = null;
sb.Clear();
return sb;
}
}
}

return new StringBuilder(capacity);
}

/// <summary>Place the specified builder in the cache if it is not too big.</summary>
private static void Release(StringBuilder sb)
{
if (sb.Capacity <= MaxCachedStringBuilderSize)
{
cachedInstance = sb;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the bigger of the two here? There's a chance the cached one has bigger capacity than the one being released and we might want to keep the bigger one to reduce the chances of line 758 evaluating to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to not keep too big StringBuilder instances as they are static per thread and can increase total memory consumption of the app. At least this are considerations that people in .net runtime had. As written above the class - this is a copy of internal class used there.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm not saying we should keep anything that's above MaxCachedStringBuilderSize. I'm saying we should do something like the following code, to make sure that we keep cache the biggest string builder possible of the ones we have already allocated, within the MaxCachedStringBuilderSize boundaries.

cachedInstance = cachedInstance?.Capacity >= sb.Capacity ? cachedInstance : sb;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you check the code in Acquire - it clears cachedInstance and set it to null in happy path.
So when releasing cachedInstance will be null and there is no need to check it's capacity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm refering to this scenario:

  • Acquiere sb of 50 capacity
  • Release sb of 50 capacity => sb of 50 is now cached
  • Acquire sb of 100 capacity => new sb of 100 is created, sb of 50 continues to be cached
  • Release sb of 100 capacity => sb of 100 is discarded, even though it's better to discard the 50 capacity and keep the 100 capacity
  • Acquire sb of 70 capacity => new sb of 70 capacity is created, sb of 50 continues to be cached, we could have reused the sb of 100 capacity

It's probably easy to write a test that demontrates this, unless I'm missunderstanding something. This seems like a simple change that would maximize the benefits of caching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand your concerns and pushed suggested changes. Really nice catch. Maybe you should propose it also in runtime repo.

}
}

/// <summary>ToString() the stringbuilder, Release it to the cache, and return the resulting string.</summary>
public static string GetStringAndRelease(StringBuilder sb)
{
string result = sb.ToString();
Release(sb);
return result;
}
}
}
}
}
Loading