Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to json serialization without adding support for precomputed caches #5983

Closed
wants to merge 43 commits into from

Conversation

Forgind
Copy link
Member

@Forgind Forgind commented Dec 21, 2020

We need to move off BinaryFormatter.

Test of basic functionality (creates simple cache in json and reads it) passed.

Forgind added 30 commits July 14, 2020 16:03
This gives RAR the ability to save off a modified state file and load it for later use.

Still to do:
1) Verify that the precomputed cache read in is real.
2) Change logging such that it will log messages instead of errors or warnings if there are problems with deserializing the precomputed cache
3) Comments at relevant points
4) Switch serialization mode to another form (json?)
5) Validation + performance tests
6) Have SDK opt in
It was previously serializable via BinaryFormatter. This makes it serializable via System.Text.Json's JsonConverter.
Add support for serializing SystemState using System.Text.Json rather than the older BinaryFormatter.
… just when serializing and deserializing the precomputed cache
I have not yet measured how long it takes to read a MVID vs. reading and parsing its references, nor have I made unit tests.
...although they don't work yet
The version of Arcade MSBuild currently uses uses Roslyn version 3.3.1. This updates that, but it should be removed when we've updated to a more modern version of arcade.
@Forgind Forgind marked this pull request as draft December 21, 2020 18:24
@Forgind Forgind marked this pull request as ready for review December 21, 2020 19:31
Comment on lines 3 to 8
<PropertyGroup>
<NuGetPackageVersion>5.9.0-preview.1.6870</NuGetPackageVersion>
<NuGetBuildTasksVersion Condition="'$(NuGetBuildTasksVersion)' == ''">$(NuGetPackageVersion)</NuGetBuildTasksVersion>
<NuGetCommandsVersion Condition="'$(NuGetCommandsVersion)' == ''">$(NuGetPackageVersion)</NuGetCommandsVersion>
<NuGetProtocolVersion Condition="'$(NuGetProtocolVersion)' == ''">$(NuGetPackageVersion)</NuGetProtocolVersion>
</PropertyGroup>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a bad merge that's overwriting NuGet updates.

@@ -986,13 +986,15 @@
</ItemGroup>
<ItemGroup>
<PackageReference Include="System.Collections.Immutable" />
<PackageReference Include="System.Reflection.Metadata" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using this in full framework now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Among the things that (presumably) was useful at some stage in serialization, though possibly just for BinaryFormatter-serialized caches.

<PackageReference Include="System.Resources.Extensions" />
</ItemGroup>

<!-- Tasks need to mimic redistributing the compilers, so add references to both full framework and .net core -->
<ItemGroup>
<!-- Reference this package to get binaries at runtime even when Arcade is not adding compiler references -->
<PackageReference Include="Microsoft.Net.Compilers.Toolset" ExcludeAssets="all" Condition="'$(UsingToolMicrosoftNetCompilers)' == 'false'" />
<PackageReference Include="System.Text.Json" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a normal reference it should go in the PackageReference group above. The comments don't apply to normal references.

{
if (!string.IsNullOrEmpty(_stateFile) && _cache.IsDirty)
{
_cache.SerializeCache(_stateFile, Log);
var deserializeOptions = new JsonSerializerOptions() { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So . . . the Unsafe part of this name sure stands out to me. Is it necessary? Is it safe? If so, comment an explanation of why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, and I think so. If I remember correctly, this has to do with whether characters are escaped, and this specifies they don't need escaping, but almost everything in the cache is ultimately a number, a string, a bool, or a version, so other random characters should be flagged as "wrong type" outside the string, which should accept them without running them as code or immediately reject them depending on the circumstance. I don't remember which characters were being serialized incorrectly, but something was.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to know this for sure, and with comments in the source explaining.

_cache.SerializeCache(_stateFile, Log);
var deserializeOptions = new JsonSerializerOptions() { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
deserializeOptions.Converters.Add(new SystemState.Converter());
File.WriteAllText(_stateFile, JsonSerializer.Serialize<SystemState>(_cache, deserializeOptions));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call the overload that takes a stream so we don't have to materialize the possibly-LOH full string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no. Not available in .NET Framework. If you want to switch to .net 5...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not available in .NET Framework.

What specifically isn't? I see

public static ValueTask<TValue?> DeserializeAsync<TValue>(Stream utf8Json, JsonSerializerOptions? options = null, CancellationToken cancellationToken = default(CancellationToken))

in the net461 decompilation.

If you want to switch to .net 5

We are doing so in #5515.

Copy link
Member Author

@Forgind Forgind Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's really just that the documentation doesn't accurately reflect the frameworks it's available for:

image

image

@@ -64,9 +64,10 @@ internal virtual void SerializeCache(string stateFile, TaskLoggingHelper log)
/// <summary>
/// Reads the specified file from disk into a StateFileBase derived object.
/// </summary>
internal static StateFileBase DeserializeCache(string stateFile, TaskLoggingHelper log, Type requiredReturnType)
internal static T DeserializeCache<T>(string stateFile, TaskLoggingHelper log, bool logWarnings = true) where T : StateFileBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this method look good but I don't understand why they're being made. Aren't we getting rid of BinaryFormatter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, yes, but StateFileBase is used for other classes, too. I did some cleanup while I was looking at it.

src/Shared/AssemblyNameExtension.cs Show resolved Hide resolved
/// Construct.
/// </summary>
internal SystemState()
internal sealed class Converter : JsonConverter<SystemState>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you go into a fair amount of detail on why this fairly-custom converter is here? Is it required for functionality? Is it a perf improvement? Will it be necessary after dotnet/runtime#1568 lands? Why choose the explicit-converter approach over the JsonConverter attribute approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: haven't looked at this part because waiting on 👆🏻

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started out with a not-customized implementation, had some difficulty getting everything to be serialized, and asked for help. You told me to write a customer serializer. For the first part of that, see 8b3b613.

I think you suggested this is more secure, serializes everything, and hopefully runs faster than using a default converter.

/// <summary>
/// Flag that indicates
/// </summary>
/// <value></value>
internal bool IsDirty
{
get { return isDirty; }
set { isDirty = value; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a setter here feels wrong. Generally dirtiness is a result of mutating something and not directly manipulable. Is that not true here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true, but this is very useful in the test I deleted. (It's among the "not actually necessary but present" changes I mentioned.) I'd prefer not to bother with deleting and readding them.

@@ -540,7 +669,7 @@ out fileState.frameworkName

dependencies = fileState.dependencies;
scatterFiles = fileState.scatterFiles;
frameworkName = fileState.frameworkName;
frameworkName = fileState.FrameworkNameAttribute;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why go through the getter when we have direct access?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter? The getter just returns it.

I think I tried to eliminate the field but found it was necessary and put it back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put it all the way back. Unnecessary churn like this makes reviews harder and generally hurts understandability of the codebase.

{
if (!string.IsNullOrEmpty(_stateFile) && _cache.IsDirty)
{
_cache.SerializeCache(_stateFile, Log);
var deserializeOptions = new JsonSerializerOptions() { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to know this for sure, and with comments in the source explaining.

Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

{
log.LogMessageFromResources("General.CouldNotReadStateFileMessage", stateFile, log.FormatResourceString("General.IncompatibleStateFileType"));
retVal = null;
log.LogMessageFromResources("General.CouldNotReadStateFile", stateFile, log.FormatResourceString("General.IncompatibleStateFileType"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a slightly different codepath now? The older version checked retVal._serializedVersion != CurrentSerializationVersion before logging this, should we check the same here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what you mean? It looks to me like the retVal._serializedVersion != CurrentSerializationVersion check is after logging a warning (always) in the old code. The changes here are:

  1. The conditions were mutually exclusive --> switch to else if
  2. Make function parameterized instead of explicitly taking a type as an argument
  3. Give the option of not logging warnings

throw new JsonException();
}
string parameter = reader.GetString();
reader.Read();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this capturing exactly? Could you comment the first use of reader.Read() in this context?

Copy link
Member Author

@Forgind Forgind Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably does deserve a comment, since it's kinda overloaded.

The Json reader is set up such that every thing (technical term) of interest has its own type. In particular, when specifying that a particular property has a particular value, it has the "PropertyName" token followed by the token(s) representing the property. If that property is not null, it will start with a "StartObject" token if it's an object or a "StartArray" token if it's an array, or just have a value we read below. If it's null, it just starts with null, and there isn't anything else.

In this case, I read the first token. If it's null, that means that the json has something like "propertyName"=null, which is valid though avoided unless you modify it by hand. Having the null check you commented on means that we don't keep reading the object if it happens to be null. GetString() would have returned a valid property name on the ANE that wasn't set.

If the property isn't null, it skips over the "StartObject", etc. and lets us start reading the interesting stuff immediately or parses the token as appropriate. Does that make sense?

}
string parameter = reader.GetString();
reader.Read();
if (reader.TokenType == JsonTokenType.Null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean when we see null here? What would we have just read from GetString() if we fall into this case?

@Forgind Forgind closed this Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants