-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Developers can use System.Text.Json to serialize type hierarchies securely #63747
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json Issue DetailsBackground and MotivationSerializing type hierarchies (aka polymorphic serialization) has been a long requested feature by the community (cf. #29937, #30083). We took a stab at producing a polymorphism feature during the .NET 6 development cycle, ultimately though the feature was cut since we eventually reached the conclusion that unconstrained polymorphism does not meet our security bar, under any circumstance. For context, polymorphic serialization has long been associated with security vulnerabilities. More specifically, unconstrained polymorphic serialization can result in accidental data disclosure and unconstrained polymorphic deserialization can result in remote code execution when used with untrusted input. See the BinaryFormatter security guide for an explanation of such vulnerabilities. We acknowledge that serializing type hierarchies is an important feature, and we are committed at delivering a secure implementation in a future release of System.Text.Json. This implies that we will be releasing a brand of polymorphism that is restricted by design, requiring users to explicitly opt-in to supported subtypes. ExamplesAt the core of the design is the introduction of [JsonTypeDiscriminator("$type")]
[JsonKnownType(typeof(Derived1), "derived1")]
[JsonKnownType(typeof(Derived2), "derived2")]
public class Base
{
public int X { get; set; }
}
public class Derived1 : Base
{
public int Y { get; set; }
}
public class Derived2 : Base
{
public int Z { get; set; }
} This allows roundtrippable polymorphic serialization using the following schema: var json1 = JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1", "X" : 0, "Y" : 0 }
var json2 = JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2", "X" : 0, "Z" : 0 }
JsonSerializer.Deserialize<Base>(json1) is Derived1; // true
JsonSerializer.Deserialize<Base>(json2) is Derived2; // true Type discriminators can optionally be omitted from the contract like so: [JsonKnownType(typeof(Derived1))]
[JsonKnownType(typeof(Derived2))]
public class Base
{
public int X { get; set; }
} resulting in the following serializations: var json1 = JsonSerializer.Serialize<Base>(new Derived1()); // { "X" : 0, "Y" : 0 }
var json2 = JsonSerializer.Serialize<Base>(new Derived2()); // { "X" : 0, "Z" : 0 } Note however that the particular contract does not support deserialization. See also #30083 (comment) for details on the most recent API proposal. Goals
Anti-Goals
Open Questions
Progress
|
I'm curious for more explanation by the open question "Should we support polymorphic serialization for collection types?". If it simply means "Should we allow users to serialize a collection of objects of various types that all derive from IType?" (e.g. List) , then I can't possibly imagine a scenario where the answer would be no. |
That's right. The issue primarily is that collections serialize as JSON arrays. As such introducing a type discriminator would serve to complicate the contract, i.e. it would likely require nesting the array payload inside an object with metadata: { "$type" : "derivedList", "$values" : [1,2,3,4,5] } There is precedent for such encodings in the reference preservation feature. |
I completely understand these arguments and decisions, but I think it's a shame that this option is taken away from the developer. An opt-in to serializing open hierarchies would have quickly bridged a large gap from Newtonsoft.Json (from what I've seen in other comments) and could prove useful in some diagnostic situations. For the above examples, would it be too much to also support an inverse of the attribute: [JsonTypeDiscriminator("$type")]
public class Base
{
public int X { get; set; }
}
[JsonKnownDerivedType("derived2")]
public class Derived1 : Base
{
public int Y { get; set; }
} This provides a clean solution for when you don't own the |
It does feel a bit insulting to have a much-requested feature completely disallowed simply because people might misuse it. Even if we had to toss it in an unsafe block, it would be nice to have simple does-what-it-says-on-the-box JSON serialization/deserialization for the (very common) case where security is not a concern. I've used JSON as a quick, simple data interchange format (most recently to store hierarchical scene data for rendering software) more times than I can count, and not having to rely on Newtonsoft.JSON would be great for those cases. |
Such an attribute would likely need to explicitly specify the base type, since otherwise that would be ambiguous: was it meant for In any case, it should be possible to work around not owning the base type by specifying the hierarchy via |
We acknowledge and understand the frustration that folks have with open hierarchies being disallowed. Our intent with publishing these goals and anti-goals is to help bring clarity to expectations and engage in these conversations. This should help us find a viable path forward together.
The challenge with open hierarchies isn't that people might misuse it, it's that any use of it is insecure. There are scenarios in which that vulnerability is masked by defenses elsewhere in a system, including the "I wrote this code and it's only ever going to run on my machine with my own data" scenario, but we cannot design this feature set with reliance on outside defenses. We must approach this design knowing that foundational security is paramount and then enable as many scenarios as we can with features that are still usable. We want to be clear about how we're approaching the problem space. By designing with security first, we can keep building new capabilities outward to incrementally shrink the set of scenarios that cannot be accomplished. But if we start with features that are vulnerable by design, then we'll inevitably reach a point where it's difficult to use the features securely. That was the fate of We want to hear more about scenarios where open hierarchies have been used, and explore approaches for addressing those scenarios without relying on open hierarchies as an implementation detail. Thank you for helping us keep this moving forward! |
And, for what it's worth, it should still be possible to implement unconstrained polymorphism using custom converters. I've seen a few examples like that on the internet (including a couple of threads in this repo), although I wouldn't want to share links to them here. |
Yes, I originally wrote
Can you show an example? This sounds like (unless I've misunderstood) requiring the types to all be registered when the serializer is configured (e.g., in an OWIN startup). Secondarily related, would this structure work to expose the necessary in the derived type? [JsonTypeDiscriminator("$type")]
[JsonKnownType(typeof(IExposeY), "derived1")] // Serialize based on interface; acknowledging that deserialization is impossible
public class Base
{
public int X { get; set; }
}
public class Derived1 : Base, IExposeY
{
public int Y { get; set; } // From IExposeY
} |
But we don't want hand crafted solutions. If we would, the request wouldn't have so many up votes where people are telling you (and everyone posting his custom solution) this over and over again. The problem is rather simple: If the JSON .NET team make any change in the future because they are not aware of everyone's custom solution, and this change breaks this custom solution. But this custom solution has been used for so long that so many JSON (files for example) already exist. Then we have the worst case scenario ever. As a developer we want reliability, we want something where we can be sure it also works in 5 years or more. And we want it simply, something to opt-in. Something that works everywhere the same. No matter which developer wrote the code for it. Newtonsoft JSON provides it. System.Text.JSON does not. And since .NET 3.1 we are asking for it and now that it has been delayed for so many times telling us to use custom converters is a bit odd. If that was ever an option, we would have used it already. But we are not, we are waiting for an official solution. Please keep this in mind, thank you. |
Just to be absolutely clear, there will not be an officially supported way to do unconstrained polymorphic serialization in the foreseeable future. We are always open to offering better extensibility points that make such features easier to build as extensions, but the onus for doing this will always be on the developer/ecosystem. |
This is an especially annoying situation for those of us on the desktop app side of the equation who use JSON as a data storage or round-trip save format. The only security concern to us, at least as far as I understand it, is that someone could alter an on-disk JSON file that our software has generated that could do something malicious when loaded back into the system. If this is indeed the case (and please correct me if it's not), then there are already myriad data integrity validation tools at our disposal, including cryptographic hashing and singing, that make this concern a non issue. In every scenario where I want to serialize and deserialize JSON, I am in control of the data from end to end. Ignoring this use case in favor of only addressing the concerns of anonymous web transfers is ignoring a large chunk of the equation. No one is suggesting that security not be the prime concern by default, but if every library design approached its domain problem with the idea that developers should be absolutely disallowed from getting ourselves into potential trouble, even if it's opt-in, nothing would ever get done in the field of software development. Please reconsider this stance. Many of us don't require this level of hand holding. |
I agree with the fact that unconstrained polymorphism is a security issue, and as far as I understand, the only real counter-arguments here are that it worked before and that it's easy to dump process memory and load it back. If it can be done with the proper extensions method, and that it's as useful as you say, I'm sure there will be an open-source package that will fill the gap, without tempting new devs that don't understand the security implications of this. What do you think? |
This seems logical. As S.T.J necessarily evolves slowly, an open source extensions package it is a natural idea. S.T.J should provide good enough extension points to let this package be built. Because the package is not by MS, MS cannot rightly be blamed for the vulns it introduces. |
Am I wrong in my assumtion that the issue being spoken about arises not from uncostrained polymorphism per se, but from the fact that traditionally multiple popular libraries provide the serialization facilities in ways that do not enforce class invariants and de-facto do binary deserialization, and also from the common issue (as I subjectively see it) of providing loose invariants throughout the ecosystem? On topic tl;dr: as an optional mode, open hierarchies may be allowed if the library is set to use constructors. |
Can you clarify a bit more what you mean by class invariants? |
There are already open source packages which provide polymorphism support. But I found they don't work in all cases or aren't updated anymore. This is what can(!) happen to all third party packages. If you rely on them and got thousands of JSON files and it doesn't work anymore and you have to use another package perhaps and all of your previous JSON becomes invalid.. good luck. Not to say that any further updates to System.Text.Json can break it as well. I don't think this an appropriated solution for a serious developer/company but that's just my opinion. |
@eiriktsarpalis I can understand that the class that provides public setters with no validation for properties of non specific types (for example, the class uses signed integers to model natural numbers) can be insecurely deserialized even without polymorphism, but I fail to imagine an example of a class that 1) would be insecure to recreate on deserialization through constructor, and 2) would not have other obvious design issues. |
A question how configurating polymorphism for non-owned types work in case of metadata source gen? AFAIK it was suggested that custom contract resolver will NOT be picked up by source gen path rendering |
Good question. There currently is no way make the source generator opt-in non-owned types into polymorphism, but it should still be possible to define a custom contract resolver deriving from JsonSerializerContext: [JsonSerializable(typeof(NonOwnedBaseType))]
[JsonSerializable(typeof(NonOwnedDerivedType))]
public MyContext : JsonSerializerContext
{
}
public class MyContractResolver : IJsonTypeInfoResolver
{
public JsonTypeInfo GetTypeInfo(Type type, JsonSerializerOptions options)
{
JsonTypeInfo sourceGenTypeInfo = ((IJsonTypeInfoResolver)MyContext.Default).GetTypeInfo(type, options);
if (type == typeof(NonOwnedBaseType))
{
sourceGenTypeInfo.PolymorphismOptions = /* add polymorphism metadata as usual */
}
return sourceGenTypeInfo;
}
}
var options = new JsonSerializerOptions { TypeInfoResolver = new MyContractResolver() };
JsonSerializer.Serialize(new NonOwnedDerivedType(), options); In the future we might consider exposing polymorphism configuration options via the |
@eiriktsarpalis hmm if sourcegen can actually pick up contracts and afaik public contract design with polymorphism is still in flight maybe there should be public static (or singleton) Filling of static/singleton options could be done in generated module initializer or require custom contracts to be partial for source gen and generate population there (maybe in constructor or The idea is to ensure theres no two separate ways for reflection and source gen serialization to enable non-owned types in polymorphism, which would ideally be achieved by fully sharing contract's interface between reflection and sourcegen mode However if u dont want to invest into filling this gap for NET 7 then you should document somewhere this discrepancy and maybe even edit blog post about NET 7 preview 5 to reflect actual support https://devblogs.microsoft.com/dotnet/announcing-dotnet-7-preview-5/ EDIT: thinking about it further i might misunderstood you (english isnt my main language sorry). Did you mean only scenario where we want to enable non-owned types without custom contract isnt supported? But we can still add them manually via custom contract and avoiding custom contract would be enabled by future |
One limitation I noticed with the current implementation, it seems that If you're using For example: var databaseState = @"{
""BaseDictionary"": {
""6ed6e524-2ca4-4fea-8e21-7245ccb61863"": {
""Id"": ""6ed6e524-2ca4-4fea-8e21-7245ccb61863"",
""Name"": ""Something"",
""$type"": ""Derived"",
""OtherGuid"": ""d471c77d-5412-4e7a-a98d-8304e87792ed""
}
}
}";
JsonSerializer.Deserialize<WrapperType>(databaseState);
public record WrapperType(Dictionary<Guid, WrapperType.Base> BaseDictionary)
{
[JsonDerivedType(typeof(Derived), nameof(Derived))]
[JsonDerivedType(typeof(AlsoDerived), nameof(AlsoDerived))]
public abstract record Base(Guid Id, string Name);
public record Derived(Guid Id, string Name, Guid OtherGuid): Base(Id, Name);
public record AlsoDerived(Guid Id, string Name) : Base(Id, Name);
} Results in:
Newtonsoft had a |
@Hawxy ugh, is that the STJ behaviour? That seems broken. |
@Hawxy yes, this is a known limitation of the JSON metadata reader implementation (also used in features like reference preservation). Without getting into too much detail, it boils down to the fact that the internal converters have been written with async streaming in mind, as such the type discriminator might not even have been loaded in memory when the first properties are being read. In the future we might consider exposing an optional flag that works around this, at the expense of reading the same JSON twice and buffering excessive data in memory. In the meantime, I think it might be reasonable to throw a better error message in cases where the base type is abstract and no type discriminators have been detected at the start of the object. |
@eiriktsarpalis Could you answer my question i made in edit? Need to understand actual limitations since im thinking of moving away from Newtonsoft to STJ and last obstacles are polymorphism and publicly exposed contract (well and support for private members at least just fields but i think i will be able to workaround it)
|
The only difference between source gen and reflection is that the former generates metadata at compile-time while the latter uses reflection. The core serialization logic is agnostic as to the source of metadata so if |
@eiriktsarpalis in that case i was indeed under wrong impression and everything is okay with |
One thing that my library appears to have over this is compile time build errors for enum valued type discriminators.. not sure how this proposal handles enums here, the samples rely on ints and magic strings |
I like the idea, but after reading this I have a few questions, because Im not sure if / what will be implemented of the following things:
The JsonTypeInfo examples posted above seem like a good option for that case, because then I can collect all "valid" types myself, even at runtime, and just add the info, but Im not sure if this is / will actually be added or was that just an example? Ok now for the dumb question; I still don't fully understand the security issue, if I add a type property to the json like "Type" : "MyDerivedType" and the property is of type "MyBaseType", how is this a issue? It will find MyDerivedType and deserialize it, but only if it is an actual derived type of "MyBaseType" (it's easy to check). So it shouldn't be a issue, because MyDerivedType will only exist and be derived from MyBaseType if I added it to my code, it would only be an issue if I allow additional dlls to be loaded at runtime (but this has many other security issues anyways if not done with care) and someone adds a "bad MyEvilDerivedType" and then adds the type name in the json file, but unless I allow loading of additional dlls at runtime I don't see the issue? Wouldn't it be a much simpler solution to just add a property to the (de)serializer settings with a list of "safe derived type" assembilies and allow the system to use all derived types (and interface implementations) of the defined assemblies without hard coding boilerplate code with every possible dervied type to the base type? |
Attribute annotations will not work for generic derived types, such configuration will need to happen using the contract resolver model. See the "Configuring Polymorphism via the Contract model" section in the OP.
Likewise, authoring a custom contract resolver should let you define polymorphic hierarchies programmatically.
It becomes an issue when "MyBaseType" is high enough in the type hierarchy (e.g.
That's precisely what contract customization would be making, only configuration would be scoped to polymorphic types rather than the assembly level. |
Thank you for the answers @eiriktsarpalis, I must have skipped the contract model section before. I checked the example and it seems to be exactly what I need. I currently have a custom json serializer which collects all "valid" derived types from a specific "safe" assembly, which contains all the checks to find interface implementations and generic types etc. So it should be possible to simply change my implementaton to this the new feature and feed the custom contract model with my already existing code to collect the types. |
Thank you for the awesome work! I just copied one of the code snippets: `[JsonPolymorhic(UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToNearestAncestor)] public class TestClass : MyDerivedClass { } JsonSerializer.Serialize(new TestClass()); // serializes using the contract for `MyDerivedClass`` At the first line there is a missing 'p', you might want to fix it. The typo is actually in more of those snippets. Thanks again for this update, will use it heavily. |
The new model that you have introduced seems like a big improvement and is a very welcome addition! One thing I was wondering and cannot find an answer to is how a type discriminator would behave if we will have the exact same field that we want to use in a runtime: Example: [JsonPolymorphic(CustomTypeDiscriminatorPropertyName = "Type")]
[JsonDerivedType(typeof(Base), typeDiscriminatorId: "base")]
[JsonDerivedType(typeof(Derived), typeDiscriminatorId: "derived")]
public class Base
{
public string Type {get; }
public int X { get; set; }
}
public class Derived : Base
{
public int Y { get; set; }
} What will happen if the Maybe this is really an anti-pattern to do it this way, and it is best to just check the C# type, but I have seen many places where code is using some additional Type field for various checks. |
@ilya-scale that's a good question actually. It would seem like the metadata writer does not check for property name conflicts even though the metadata reader does and will throw an exception. I think it might make sense to introduce a fix so that some form of validation takes place on the serialization side as well. I'll fork this into a separate issue. |
I have also found what seems to be a bug in the implementation and not sure if I should open a new github issue or not (since this is a preview package), I can just post it here perhaps: It seems that the deserialization of the types is dependent that the descriminator type is the first field in Json. If I create json myself (i.e. not using System.Text.Json to serialize it first) and put the property not on top, then I get an exception:
While it is probably not a big issue, the json is order agnostic as far as I know and it should work anyway. |
@ilya-scale this is a known limitation. See #63747 (comment) |
@eiriktsarpalis Is there an issue that tracks the above $type ordering limitation? |
There isn't one. Would you like to create it? |
Background and Motivation
Serializing type hierarchies (aka polymorphic serialization) has been a feature long requested by the community (cf. #29937, #30083). We took a stab at producing a polymorphism feature during the .NET 6 development cycle, ultimately though the feature was cut since we eventually reached the conclusion that unconstrained polymorphism does not meet our security bar, under any circumstance.
For context, polymorphic serialization has long been associated with security vulnerabilities. More specifically, unconstrained polymorphic serialization can result in accidental data disclosure and unconstrained polymorphic deserialization can result in remote code execution when used with untrusted input. See the BinaryFormatter security guide for an explanation of such vulnerabilities.
We acknowledge that serializing type hierarchies is an important feature, and we are committed to delivering a secure implementation in a future release of System.Text.Json. This implies that we will be releasing a brand of polymorphism that is restricted by design, requiring users to explicitly opt-in to supported subtypes.
Basic Polymorphism
At the core of the design is the introduction of the
JsonDerivedType
attribute:This configuration enables polymorphic serialization for
Base
, specifically when the runtime type isDerived
:Note that this does not enable polymorphic deserialization since the payload would roundtripped as
Base
:Polymorphism using Type Discriminators
To enable polymorphic deserialization, users need to specify a type discriminator for the derived class:
Which will now emit JSON along with type discriminator metadata:
which can be used to deserialize the value polymorphically:
Type discriminator identifiers can also be integers, so the following form is valid:
Mixing and matching configuration
It is possible to mix and match type discriminator configuration for a given type hierarchy
resulting in the following serializations:
Customizing Type Discriminators
It is possible to customize the property name of the type discriminator metadata like so:
resulting in the following JSON:
Unknown Derived Type Handling
Consider the following type hierarchy:
Since the configuration does not explicitly opt-in support for
DerivedType2
, attempting to serialize instances ofDerivedType2
asBase
will result in a runtime exception:The default behavior can be tweaked using the
JsonUnknownDerivedTypeHandling
enum, which can be specified like so:The
FallBackToNearestAncestor
setting can be used to fall back to the contract of the nearest declared derived type:It should be noted that falling back to the nearest ancestor admits the possibility of diamond ambiguity:
Configuring Polymorphism via the Contract model
For use cases where attribute annotations are impractical or impossible (large domain models, cross-assembly hierarchies, hierarchies in third-party dependencies, etc.), it should still be possible to configure polymorphism using the JSON contract model:
Additional details
JsonDerivedType
attribute. Undeclared runtime types will result in a runtime exception. The behavior can be changed by configuring theJsonPolymorphicAttribute.UnknownDerivedTypeHandling
property.API Proposal
Anti-Goals
Progress
The text was updated successfully, but these errors were encountered: