[release/9.0] Update the JSON schema exporter to reuse schemas more aggressively. #108800
Backport of #108764 to release/9.0
/cc @eiriktsarpalis
Customer Impact
A customer reported that generating a JSON schema for `typeof(XElement)` results in explosive memory growth due to the immense complexity of its underlying type graph. This is caused by the fact that `JsonSchemaExporter` uses a very conservative schema reuse policy via the `$ref` keyword, reserving it only for cases where a cycle has been detected in the type graph. This in turn results in a high degree of duplication in clique-like type graphs, contributing to exponential memory consumption.
This PR addresses the issue by applying a more aggressive schema reuse policy: if a schema for a node in the graph has been generated, subsequent occurrences will always produce schemas pointing to the original occurrence regardless of its location. This effectively bounds the size of the generated document to $\mathcal{O}(n)$, where $n$ denotes the number of .NET members occurring in the type graph.
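The difference between the two policies can be illustrated with a small toy model. This is a sketch under stated assumptions, not the actual `JsonSchemaExporter` implementation: the graph representation and the names `export`, `size`, `clique` and `reuse_all` are hypothetical, and every type is modeled as a plain object whose properties point at other types.

```python
def export(graph, root, reuse_all):
    """Build a toy schema for `root`.

    `graph` maps a type name to {property name: property type name}.
    reuse_all=False models the cycle-only policy: emit $ref only when the
    type is already on the current traversal path.
    reuse_all=True models the aggressive policy: emit $ref for every
    occurrence after the first, wherever the first one was generated.
    """
    seen = {}     # type name -> JSON pointer of its first occurrence
    on_path = {}  # same, but only for types currently being expanded

    def visit(type_name, pointer):
        table = seen if reuse_all else on_path
        if type_name in table:
            return {"$ref": table[type_name]}
        seen[type_name] = pointer
        on_path[type_name] = pointer
        props = {
            name: visit(child, f"{pointer}/properties/{name}")
            for name, child in graph[type_name].items()
        }
        del on_path[type_name]  # leaving this type: it is no longer an ancestor
        return {"type": "object", "properties": props}

    return visit(root, "#")


def size(schema):
    """Count schema nodes (object schemas plus $ref stubs)."""
    return 1 + sum(size(p) for p in schema.get("properties", {}).values())


def clique(n):
    """n types, each with one property for every other type."""
    names = [f"T{i}" for i in range(n)]
    return {a: {f"to_{b}": b for b in names if b != a} for a in names}


if __name__ == "__main__":
    for n in range(2, 8):
        g = clique(n)
        print(n, size(export(g, "T0", reuse_all=False)),
              size(export(g, "T0", reuse_all=True)))
```

In this model the aggressive policy expands each type once, so the document contains one node per member plus the root, while the cycle-only policy re-expands a type on every path that reaches it and grows much faster on clique-like graphs.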
Regression
Testing
Added unit tests that validate the impacted use case.
Risk
Moderate. Makes a last-minute change to the schema generation algorithm which carries risk. Changes what JSON schema is being produced under certain circumstances, even though the new outputs are semantically equivalent to the original outputs.