Serializer support for easier object and collection converters #1562
Moving the status of this extensibility-model feature to Future per discussion with @rynowak. Instead we will implement built-in converters for those.
This issue needs to be reworked to add the latest thoughts and extended to support custom deserialization of objects (in addition to enumerables). In addition, a custom converter for an enumerable or object using a new extensibility model would avoid the "read-ahead" performance hit that we currently have, where we need to pre-read the JSON in order to ensure Stream-based scenarios do not run out of data/JSON during the custom converter.
It would also be nice to get something similar to old Json.NET's `ItemConverterType`.
I assume this is because different collections may need different converters for their elements. Otherwise, you'd just specify a converter for the element type, which would be used both for a single object and for collection elements.
@steveharter, sorry, I don't fully understand what you are trying to say. There is a difference between

```csharp
[JsonConverter(typeof(MyObjectsCollectionJsonConverter))]
public IEnumerable<MyObject> CollectionOfObjects { get; set; }
```

and

```csharp
[JsonProperty(ItemConverterType = typeof(MyObjectJsonConverter))]
public IEnumerable<MyObject> CollectionOfObjects { get; set; }
```

The first converter is expected to return an `IEnumerable<MyObject>`, while the second is applied to each `MyObject` element. If I'm not missing any APIs, the first one is possible today and the second one is not. What I was trying to say is that the second scenario is very convenient, as it allows you to reuse one converter for both an object and a collection of objects. You can typically do it by registering the converter for the type externally with the serializer.
Today the converter for `MyObject` would be obtained from the global set of converters registered with the serializer options. I'm assuming this wouldn't work for your scenario?
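For reference, a minimal sketch of that global-registration approach, assuming illustrative `MyObject` and `MyObjectJsonConverter` types (the converter body here is a placeholder; a real one would do whatever custom reading/writing the scenario requires):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public class MyObject
{
    public string Name { get; set; }
}

// Illustrative converter: serializes MyObject as a bare JSON string.
public class MyObjectJsonConverter : JsonConverter<MyObject>
{
    public override MyObject Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => new MyObject { Name = reader.GetString() };

    public override void Write(Utf8JsonWriter writer, MyObject value, JsonSerializerOptions options)
        => writer.WriteStringValue(value.Name);
}

public static class Example
{
    public static void Run()
    {
        var options = new JsonSerializerOptions();
        options.Converters.Add(new MyObjectJsonConverter());

        // The registered converter applies to MyObject wherever it appears:
        // as a standalone value and as an element of a collection.
        string json = JsonSerializer.Serialize(
            new[] { new MyObject { Name = "a" }, new MyObject { Name = "b" } },
            options); // ["a","b"]
    }
}
```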
Will the introduction of internal collection converters help against the boxing of array elements? I am sending

```csharp
var data = new int[1024];
for (var i = 0; i < 30; ++i)
    JsonSerializer.Serialize(data);
```

and was quite shocked to see that in 3.1 this allocates 24 KiB alone by boxing all 1024 elements individually for each invocation. This easily dwarfs all other allocations, including those for the string holding the result. Luckily it is rather simple in my code base to wrap the arrays in a type with one property annotated with a custom converter.
Yes, array perf was one of the main drivers for this refactoring. Currently 5.0 (master) will no longer box array elements or the array enumerator. Using your example but wrapping in a loop of 100, 3.1 generates:

[3.1 allocation results omitted]

5.0 generates:

[5.0 allocation results omitted]

Also, CPU perf is improved with large collections. Serializing a single 10,000-element Int32 array:

[benchmark results omitted]
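For anyone wanting to reproduce the allocation difference locally, here is a rough measurement sketch (not the exact benchmark used above) using `GC.GetAllocatedBytesForCurrentThread`:

```csharp
using System;
using System.Text.Json;

public static class AllocationCheck
{
    public static void Main()
    {
        var data = new int[1024];

        // Warm up so one-time costs (converter caching, JIT) don't skew the number.
        JsonSerializer.Serialize(data);

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < 30; ++i)
        {
            JsonSerializer.Serialize(data);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        Console.WriteLine($"Allocated: {after - before} bytes over 30 iterations");
    }
}
```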
I see. You would like the ability to specify the converter for a given element type in a collection. The serializer design supports this in a way, since a collection converter knows about the converter for its underlying element type, but it always obtains it from the global set of converters. Currently, however, there is no attribute or run-time mechanism to specify the converter type for an element of a particular collection type or of a specific property. Without a feature to address this, there are two ways to work around the limitation:
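The specifics of those two workarounds are not spelled out above. As a hedged sketch of one plausible approach (reusing the illustrative `MyObject` and `MyObjectJsonConverter` from the earlier sketch), a converter for the collection type can hard-code the element converter it applies, approximating Json.NET's `ItemConverterType`:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Sketch: a collection converter that applies a fixed element converter
// instead of consulting the options-level converter set.
public class MyObjectListConverter : JsonConverter<List<MyObject>>
{
    private readonly MyObjectJsonConverter _elementConverter = new MyObjectJsonConverter();

    public override List<MyObject> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        if (reader.TokenType != JsonTokenType.StartArray)
            throw new JsonException();

        var list = new List<MyObject>();
        while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
        {
            list.Add(_elementConverter.Read(ref reader, typeof(MyObject), options));
        }
        return list;
    }

    public override void Write(Utf8JsonWriter writer, List<MyObject> value, JsonSerializerOptions options)
    {
        writer.WriteStartArray();
        foreach (MyObject item in value)
        {
            _elementConverter.Write(writer, item, options);
        }
        writer.WriteEndArray();
    }
}
```

It would then be applied per property via `[JsonConverter(typeof(MyObjectListConverter))]`, at the cost of writing one such wrapper per collection shape.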
Closing as the refactoring work was performed in #2259. As mentioned in that PR, a new issue will be created for any new APIs or features that layer on this refactoring.
Extend the existing converter model for collections and objects. The current converter model (with base class `JsonConverter`) was intended primarily to support value types, not collections and objects.

Extending the converter model to objects and collections will have the following benefits:

- Better performance, by avoiding boxing of struct enumerators such as `List<T>.Enumerator` and less re-processing of logic, including determining what type of collection is being (de)serialized.
- Better error information through `JsonException` properties.
- A `GetConverter` API that allows one converter to forward to another converter via `JsonSerializerOptions.GetConverter()` (see the sketch after this list). Currently, the built-in support for objects and collections does not have any converters that are returned from `GetConverter()`, since that logic exists in the "main loop"; a converter that wants to forward to those must manually re-enter the main (De)Serialize methods, which is slow and has issues including losing "JSON path" semantics for exceptions.
- Support for a non-generic converter model (one that does not require `MakeGenericType` in order to call a converter), for example a converter that implements `System.Runtime.Serialization.ISerializable`. It is also used internally for the root object being returned by the non-generic deserialize/serialize methods.
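As a rough illustration of the forwarding pattern from the `GetConverter` item above (hypothetical `Inner`, `Wrapper`, and `WrapperConverter` types; this assumes the inner type's converter is actually retrievable from `GetConverter`, which is exactly the gap described for objects and collections today):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Inner
{
    public int Value { get; set; }
}

public class Wrapper
{
    public Inner Inner { get; set; }
}

// Hypothetical converter that forwards to whatever converter the options
// hold for Inner, rather than re-entering JsonSerializer.(De)Serialize.
public class WrapperConverter : JsonConverter<Wrapper>
{
    public override Wrapper Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        var inner = (JsonConverter<Inner>)options.GetConverter(typeof(Inner));
        return new Wrapper { Inner = inner.Read(ref reader, typeof(Inner), options) };
    }

    public override void Write(Utf8JsonWriter writer, Wrapper value, JsonSerializerOptions options)
    {
        var inner = (JsonConverter<Inner>)options.GetConverter(typeof(Inner));
        inner.Write(writer, value.Inner, options);
    }
}
```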
Background

Currently all internal value-type converters (e.g. String, DateTime, etc.) use the same converter model (base class and infrastructure) as custom converters. This means the runtime and extensibility are co-dependent, which allows for shared logic and essentially no limit on what can be done within a converter. The existing converter model has proven itself very flexible and performant.
However, currently collections and objects do not use the converter model. Instead they are implemented within a "main loop" consisting of static methods along with state classes.
The state classes (`ReadStack`, `ReadStackFrame`, `WriteStack`, `WriteStackFrame`) exist to support async-based streams, where the call stack may need to unwind in order to read or flush asynchronously from the underlying stream and then continue once that is done. This keeps memory requirements low and increases throughput for large streams: the serializer does not "drain" the stream ahead of time and instead has first-class support for streams and async.

The state classes, along with the converter design, support shared code for both sync and async. This prevents having to write both an async and a sync converter, for example, and avoids the overhead of using `async` and passing the stream down to almost every method. This shared-code benefit applies to both the runtime and custom converters (see the usage sketch below).

With the new converter model, the state classes will remain for unwind/continuation support, but will also work alongside the normal CLR call stack, where there will be ~1 call frame for each level in the JSON (for each nested JSON array or object). This makes the code more performant.
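To make the sync/async sharing concrete, both entry points below drive the same converter code; only the outer loop differs. This is a usage sketch with the standard public APIs:

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

public static class StreamExample
{
    public static async Task Run()
    {
        int[] payload = { 1, 2, 3 };

        // Sync: the whole payload is serialized in one pass.
        string json = JsonSerializer.Serialize(payload);

        // Async + stream: the serializer flushes to the stream as internal
        // buffers fill, unwinding via the state classes and resuming
        // afterward -- the converters themselves stay synchronous.
        using var stream = new MemoryStream();
        await JsonSerializer.SerializeAsync(stream, payload);
    }
}
```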
A limitation of the existing converter model is that it must "read-ahead" during deserialization to fully populate the buffer up to the end of the current JSON level. This read-ahead only occurs when the async+stream `JsonSerializer` deserialize methods are called, and only when the current JSON for that converter starts with a StartArray or StartObject token. Read-ahead is required because the existing converter design does not support a mechanism to "unwind" (when data is exhausted) and "continue" (when more data is read), so the converter expects all data to be present and expects that `reader.Read()` will never return false due to running out of data in the current buffer. If the JSON does not start with StartArray or StartObject, then it is assumed the converter will not call `reader.Read()`. A converter like the one sketched below illustrates why this guarantee is needed.

Similarly, a limitation of the existing converter model is that it does not support async flushing of the underlying stream from a converter's write methods. Again, this only applies to the async+stream case, and only when the converter performs multiple write operations that may hit a flush threshold. Note that the built-in implementations for objects and collections (which do not use converters) do support async flush (and async read) when thresholds are hit, but converters do not.
Proposed API
The `ReadStack*` and `WriteStack*` structs will likely be renamed and will gain a few members, such as one for obtaining the JSON path, a dictionary for reference handling, and state used for continuation after an async read/flush.
In addition to the above base classes, there will likely be: