Serializer support for easier object and collection converters #1562

Closed
steveharter opened this issue Apr 5, 2019 · 11 comments
Labels: api-needs-work (API needs work before it is approved, it is NOT ready for implementation), area-System.Text.Json
steveharter commented Apr 5, 2019

Extend the existing converter model for collections and objects. The current converter model (with base class JsonConverter) was intended primarily to support value types, not collections and objects.

Extending the converter model to objects and collections will have the following benefits:

  • Better performance, especially for collection serialization.
    • This comes from less boxing of the common struct-based enumerator types such as List<T>.Enumerator, and less re-processing of logic such as determining what type of collection is being (de)serialized.
  • Making the converter model public allows others to quickly and easily handle several scenarios that are not possible today, including:
    • Better performance for a converter's Read() method, which will no longer require "read-ahead" for the async+stream cases (explained later).
    • Similarly, future AOT (Ahead-Of-Time compilation) deserialization will not require "read-ahead"; each POCO object would have its own converter.
    • The ability for a custom converter's Write() to support an async flush of the underlying stream when a threshold is met.
    • Converter support for the new reference handling feature; the underlying dictionary<$id, object> will be available to the converter.
    • Converter support for obtaining the JSON path (for deserialization) and the object path (for serialization). Currently this is internal and used to supplement JsonException properties.
    • The ability to have before- and after-callbacks for objects and collections, so the serializer does the bulk of the work while the custom code serializes additional members, performs defaulting or validation, or adds an element to a collection.
  • A consistent public GetConverter API that allows one converter to forward to another.
    • Every serializable property will have a non-null converter obtainable from JsonSerializerOptions.GetConverter(). Currently the built-in handling of objects and collections does not return any converters from GetConverter() since that logic lives in the "main loop"; a converter that wants to forward to it must manually re-enter the main (De)Serialize methods, which is slow and has issues, including losing "JSON path" semantics for exceptions.
  • A loosely-typed (non-generic) mechanism to call a converter. This is important for converters that can't use generics (or that would have to call MakeGenericType in order to call a converter), such as a converter that implements System.Runtime.Serialization.ISerializable. It is also used internally for the root object returned from the non-generic deserialize\serialize methods.
  • Better understandability and maintainability of the code, leading to higher quality and easier feature implementation.
    • The existing "main loop" (explained later) for objects and collections has been difficult to extend for dictionaries and specialized\immutable collections, which required a secondary internal converter model (one that would go away with the new converter model). There have been several validation and consistency issues, and in general this area needs to be refactored.

Background

Currently all internal value-type converters (e.g. String, DateTime, etc) use the same converter model (base class and infrastructure) as custom converters. This means the runtime and extensibility are co-dependent which allows for shared logic and essentially no limit on what can be done within a converter. The existing converter model has proven itself very flexible and performant.
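For reference, a minimal custom value converter under this existing model looks like the following (illustrative only: the converter name and the Unix-timestamp format are assumptions for the example, not a built-in converter; the built-in DateTime converter uses ISO 8601):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// A typical value converter under the existing model: a single Read and
// Write pair that assumes all data for the value is already buffered.
public class UnixDateTimeConverter : JsonConverter<DateTime>
{
    public override DateTime Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => DateTimeOffset.FromUnixTimeSeconds(reader.GetInt64()).UtcDateTime;

    public override void Write(Utf8JsonWriter writer, DateTime value, JsonSerializerOptions options)
        // Assumes UTC values for simplicity of the sketch.
        => writer.WriteNumberValue(new DateTimeOffset(value, TimeSpan.Zero).ToUnixTimeSeconds());
}
```

The same shape serves both the runtime's built-in value converters and custom ones, which is the shared-infrastructure property the proposal wants to extend to objects and collections.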

However, currently collections and objects do not use the converter model. Instead they are implemented within a "main loop" consisting of static methods along with state classes.

The state classes (ReadStack, ReadStackFrame, WriteStack, WriteStackFrame) exist to support async-based streams where the call stack may need to unwind in order to read async or flush async from the underlying stream, and then continue once that is done. This keeps memory requirements low and increases throughput for large streams -- the serializer does not "drain" the stream ahead of time and instead has first-class support for streams and async.

The state classes, together with the converter design, allow the same code to serve both the sync and async paths. This avoids having to write separate sync and async converters, and avoids the overhead of making everything async and passing the stream down to almost every method. This shared-code benefit applies to both the runtime and custom converters.

With a new converter model, the state classes will remain for unwind\continuation support, but will also work alongside the normal CLR call stack, with ~1 call frame for each level in the JSON (for each nested JSON array or object). This makes the code more performant.

A limitation of the existing converter model is that it must "read-ahead" during deserialization to fully populate the buffer up to the end of the current JSON level. This read-ahead only occurs when the async+stream JsonSerializer deserialize methods are called, and only when the current JSON for that converter starts with a StartArray or StartObject token. Read-ahead is required because the existing converter design has no mechanism to "unwind" (when data is exhausted) and "continue" (when more data is read), so the converter expects all data to be present and expects that reader.Read() will never return false due to running out of data in the current buffer. If the JSON does not start with StartArray or StartObject, it is assumed the converter will not call reader.Read().

Similarly, a limitation of the existing converter model is that it does not support async flush of the underlying stream for the converter's write methods. Again, this only applies to the async+stream case and only when the converter performs multiple write operations that may hit a threshold. Note that the built-in implementation for object and collections (which do not use converters) do support async flush (and async read) when thresholds are hit, but converters do not.

Proposed API

namespace System.Text.Json.Serialization
{
    // Existing type:
    public abstract class JsonConverter
    {
        // Existing:
        public abstract bool CanConvert(Type typeToConvert);

        // New object-based APIs not requiring generics:
        public virtual bool TryRead(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options, ref ReadStack state, ref object value);
        public virtual bool TryWrite(Utf8JsonWriter writer, object value, JsonSerializerOptions options, ref WriteStack state);
    }

    // Existing type:
    public abstract class JsonConverter<T> : JsonConverter
    {
        // Existing:
        protected internal JsonConverter() { }
        public override bool CanConvert(Type typeToConvert);
        public abstract T Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options);
        public abstract void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options);

        // New:
        public override sealed bool TryRead(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options, ref ReadStack state, ref object value);
        public override sealed bool TryWrite(Utf8JsonWriter writer, object value, JsonSerializerOptions options, ref WriteStack state);
        public virtual bool TryRead(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options, ref ReadStack state, ref T value);
        public virtual bool TryWrite(Utf8JsonWriter writer, T value, JsonSerializerOptions options, ref WriteStack state);
    }

    // New type:
    public abstract class JsonObjectConverter<T> : JsonConverter<T>
    {
        public override sealed T Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options);
        public override sealed void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options);
    }

    // New type:
    public abstract class JsonArrayConverter<TCollection, TElement> : JsonConverter<TCollection>
    {
        public override sealed TCollection Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options);
        public override sealed void Write(Utf8JsonWriter writer, TCollection value, JsonSerializerOptions options);
    }
}

The ReadStack* and WriteStack* structs will likely be renamed and will expose a few members, such as the JSON path, the dictionary for reference handling, and the state used for continuation after an async read\flush.

In addition to the above base classes, there will likely be:

  • Additional derived class for an object with callbacks for before-read, before-write, after-read, after-write and "GetMember" to lookup a CLR member based on JSON property name.
  • Additional derived class for a collection with callbacks for before-read, before-write, after-read, after-write and for collection an "Add" method to add an element to the collection.
  • Exposing existing converters and converter factories primarily to support AOT compilation where the initialization phase requires compile-time knowledge of the converter for each property.
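As a rough illustration of how a converter against the proposed TryRead surface might differ from the existing Read, here is a hypothetical sketch (not compilable today: the proposed APIs are not final, and the List<int> converter and its control flow are assumptions):

```csharp
// Hypothetical sketch against the proposed API. The key difference from the
// existing Read(): TryRead() may return false when the buffered JSON is
// exhausted, and the serializer calls it again (with the same 'state' and
// the partially built 'value') after more data has been read asynchronously,
// so no read-ahead is needed for the async+stream case.
public class Int32ListConverter : JsonConverter<List<int>>
{
    public override bool TryRead(ref Utf8JsonReader reader, Type typeToConvert,
        JsonSerializerOptions options, ref ReadStack state, ref List<int> value)
    {
        value ??= new List<int>();

        while (reader.Read()) // returns false when the current buffer runs dry
        {
            if (reader.TokenType == JsonTokenType.EndArray)
                return true; // the collection is complete

            value.Add(reader.GetInt32());
        }

        return false; // unwind; the serializer resumes here with more data
    }
}
```

The ref parameters are what make the unwind\continue pattern work: the partially built collection and the continuation state survive across calls instead of living only on the CLR call stack.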
@steveharter (Member Author) commented:
Moving the status of this extensibility-model feature to future, per discussion with @rynowak. Instead we will implement built-in converters for the types in System.Collections.Immutable and System.Collections.Generic.

@steveharter (Member Author) commented:
This issue needs to be re-worked to add the latest thoughts and extended to support custom deserialization of objects (in addition to enumerables).

In addition, a custom converter for an enumerable or object using a new extensibility model would avoid the "read-ahead" performance hit we currently have, where we need to pre-read the JSON to ensure Stream-based scenarios do not run out of data\JSON during the custom converter.

@steveharter steveharter changed the title Json serializer support for IEnumerable deserialization extensibilty Serializer support for easier object and collection converters Sep 24, 2019
@ericstj ericstj transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@nahk-ivanov commented:
It would also be nice to get something similar to the old JSON.NET JsonPropertyAttribute.ItemConverterType, where we could specify a converter for just the items in the collection rather than the collection as a whole. This would allow us to reuse converters between single-object and collection properties, while relying on the standard infrastructure to serialize/deserialize the collection itself.

@steveharter (Member Author) commented:
It would also be nice to get something similar to old JSON.NET's JsonPropertyAttribute.ItemConverterType

I assume this is because different collections may need different converters for their elements. Otherwise, you'd just specify a converter for the element type which would be used for both single object and collection elements.

@nahk-ivanov commented:
@steveharter, sorry, I don't fully understand what you are trying to say.

There is a difference between

[JsonConverter(typeof(MyObjectsCollectionJsonConverter))]
public IEnumerable<MyObject> CollectionOfObjects { get; set; }

and

[JsonProperty(ItemConverterType = typeof(MyObjectJsonConverter))]
public IEnumerable<MyObject> CollectionOfObjects { get; set; }

The first converter is expected to return an IEnumerable<MyObject>, while the second one creates instances of MyObject directly.

If I'm not missing any APIs, the first is possible today and the second is not. What I was trying to say is that the second scenario is very convenient, as it allows you to reuse one converter for both an object and a collection of objects. You can typically do that by registering the converter for the type externally with the serializer, but if MyObject is a string, that's not possible.

steveharter commented Jan 28, 2020

Today the converter for MyObject can be specified by placing [JsonConverter] on that type, or by calling JsonSerializerOptions.Converters.Add(myConverter). The converter will then be called for every MyObject, whether it appears in a collection, in another property, or at the root.
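For illustration, that registration looks like this (a minimal sketch; MyObjectJsonConverter stands in for any user-defined JsonConverter<MyObject>):

```csharp
var options = new JsonSerializerOptions();
options.Converters.Add(new MyObjectJsonConverter());

// The same converter now runs for MyObject everywhere it appears:
string atRoot = JsonSerializer.Serialize(new MyObject(), options);
string inCollection = JsonSerializer.Serialize(new List<MyObject> { new MyObject() }, options);
```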

I'm assuming this wouldn't work for your scenario?

@nahk-ivanov commented:
You can typically do it by registering the converter for type externally with the serializer, but if MyObject is a string, then it's not possible.

@Tragetaschen (Contributor) commented:
Will the introduction of internal collection converters help against the boxing of array elements?

I am sending new int[1024] to the Blazor client which calls JsonSerializer.Serialize internally

var data = new int[1024];
for (var i = 0; i < 30; ++i)
    JsonSerializer.Serialize(data);

and was quite shocked to see that in 3.1, this allocates 24kiB alone by boxing all 1024 elements individually for each invocation. This easily dwarfs all other allocations including those for the string holding the result.

Luckily it is rather simple in my code base to wrap the arrays in a type with one property annotated with [JsonConverter(…)].
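Such a wrapper might look like this (a sketch; IntArrayConverter is a hypothetical custom converter type, i.e. any JsonConverter<int[]> that writes the elements directly and so avoids the per-element boxing in 3.1):

```csharp
public class IntArrayPayload
{
    // Hypothetical converter type name; the property-level attribute routes
    // this array through the custom converter instead of the built-in path.
    [JsonConverter(typeof(IntArrayConverter))]
    public int[] Data { get; set; }
}
```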

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Feb 20, 2020
@layomia layomia added this to the 5.0 milestone Feb 20, 2020
@layomia layomia added the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Feb 20, 2020
@steveharter (Member Author) commented:
@Tragetaschen

Will the introduction of internal collection converters help against the boxing of array elements?

Yes, array perf was one of the main drivers for this refactoring. 5.0 (master) no longer boxes array elements or the array enumerator.

Using your example but wrapping in a loop of 100 in 3.1 generates:

Type	Allocations
 - System.Int32	3,072,000
 - System.String	3,185
 - System.Text.Json.Utf8JsonWriter	3,000
 - System.Text.Json.PooledByteBufferWriter	3,000
 - System.SZArrayEnumerator	3,000

GC	Collected
 - 1	217688
 - 2	222985
 - 3	223326
 - 4	223326
 - 5	223326
 - 6	223322
 - 7	223326
 - 8	223326
 - 9	223326
 - 10	223322
 - 11	223326
 - 12	223362
 - 13	223326

5.0 generates:

Type	Allocations
 - System.String	3,141
 - System.Text.Json.Utf8JsonWriter	3,000
 - System.Text.Json.PooledByteBufferWriter	3,000

GC	Collected
 - 1	4757
 - 2	4392

Also CPU perf is improved with large collections. Serializing a single 10,000 element Int32 array:

  • 13.1 seconds in 3.1
  • 3.3 seconds in 5.0

steveharter commented May 15, 2020

@nahk-ivanov

What I was trying to say is that second scenario is very convenient, as it allows you to reuse one converter for an object and collection of objects. You can typically do it by registering the converter for type externally with the serializer, but if MyObject is a string, then it's not possible.

I see. You would like the ability to specify the converter for a given element type in a collection.

The serializer design supports this in a way, since a collection converter knows about the converter for its element type, but it always obtains that converter from the global set of converters. Currently, however, there is no attribute or run-time mechanism to specify the converter type for the elements of a particular collection type or of a specific property.

Without a feature to address this, there are two ways to work around the limitation:

  1. In your example you'd need to create a converter for
    public IEnumerable<MyObject> CollectionOfObjects { get; set; }
    and another one for
    public IEnumerable<string> CollectionOfObjects { get; set; }

  2. Create a factory converter for your collection type (or just IEnumerable<T>) that derives from JsonConverterFactory and overrides CreateConverter to create a new converter instance for each T, either using Type.MakeGenericType() or by returning a "known converter" based on the element type. This is what the framework does internally, and an example is provided here for Dictionary<string,TValue>. So there are 3 converters involved:

    • The factory converter that creates the collection converter and closes the T.
    • The collection converter that forwards to the element converter, either by using the converter obtained through options.GetConverter() or by re-entering the serializer via the (de)serialize methods that take a reader\writer.
    • The already-existing converter for the element type.
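As a rough sketch of option 2 (the type and member names here are illustrative, not the framework's internal converters):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Factory: closes ListConverter<T> over the element type at run time.
public class ListConverterFactory : JsonConverterFactory
{
    public override bool CanConvert(Type typeToConvert)
        => typeToConvert.IsGenericType
           && typeToConvert.GetGenericTypeDefinition() == typeof(List<>);

    public override JsonConverter CreateConverter(Type typeToConvert, JsonSerializerOptions options)
    {
        Type elementType = typeToConvert.GetGenericArguments()[0];
        Type converterType = typeof(ListConverter<>).MakeGenericType(elementType);
        return (JsonConverter)Activator.CreateInstance(converterType);
    }
}

// Collection converter: forwards each element to the element converter by
// re-entering the serializer with the reader\writer overloads.
public class ListConverter<T> : JsonConverter<List<T>>
{
    public override List<T> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        var list = new List<T>();
        while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
            list.Add(JsonSerializer.Deserialize<T>(ref reader, options));
        return list;
    }

    public override void Write(Utf8JsonWriter writer, List<T> value, JsonSerializerOptions options)
    {
        writer.WriteStartArray();
        foreach (T item in value)
            JsonSerializer.Serialize(writer, item, options);
        writer.WriteEndArray();
    }
}
```

Note the forwarding cost mentioned above: re-entering the serializer this way works, but it is exactly the "manual re-entry" pattern whose overhead and lost JSON-path semantics this issue's proposal was meant to eliminate.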

@steveharter (Member Author) commented:
Closing as the refactoring work was performed in #2259. As mentioned in that PR, a new issue will be created for any new APIs or features that layer on this refactoring.
