-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for enhanced dataflow analysis #988
Changes from 1 commit
cf009d8
69c55f6
0f9d0e9
6e7e143
e897582
417ad0e
acd3597
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,189 @@ | ||
# Reflection and IL Linker in .NET 5 | ||
|
||
Unconstrained reflection in .NET presents a challenge for discovering parts of .NET apps that are (or are not) used. We had multiple attempts to solve the problem – probably the most complex one was in .NET Native. .NET Native was trying to make arbitrary reflection "just work" by using a combination of: | ||
|
||
* cross-method dataflow analysis framework (that could e.g. model the `dynamic` keyword in C#, or reason about cascading reflection patterns like `Type.GetType("Foo").GetMethod("Bar").ReturnType.GetMethod("Baz")`) | ||
* expressive annotation format (RD.XML) that could express things like "keep all properties and constructors on types passed as a parameter to this method, recursively". The idea was that library developers would annotate their libraries and things would "just work" for the user. | ||
|
||
Even with the level of investment in .NET Native it wasn't possible to make arbitrary reflection "just work" – before shipping, we had to make a decision to not do any tree-shaking on user assemblies by default because the reflection patterns were often (~20% of the Store app catalog) arbitrary enough that it wasn't possible to describe them in generic ways or detect them. The .NET Native compiler did not warn the user about presence of unrecognized patterns either because we expected there would be too much noise due to the disconnect between RD.XML (that simply described what to keep) and the dataflow analysis (that focused on reflection API usage). | ||
|
||
## Linker-safe reflection | ||
|
||
Note: While this document mostly talks about reflection, this includes reflection-like APIs with impact on linker's analysis such as `RuntimeHelpers.GetUninitializedObject`, `Marshal.PtrToStructure`, or `RuntimeHelpers.RunClassConstructor`. | ||
|
||
In .NET 5, we would like to carve out a _subset_ of reflection patterns that can be made safe in the presence of IL linker. Since IL linking is optional, users do not have to adhere to this subset. They can still use IL Linker if they do not adhere to this subset, but we will not provide guarantees that linking won't change the semantics of their app. Linker will warn if a reflection pattern within the app is not safe. | ||
|
||
To achieve safety, we'll logically classify methods into following categories: | ||
|
||
* Linker-safe: most code will fall into this category. Linking can be done safely based on information in the static callgraph. | ||
* Potentially unsafe: call to the method is unsafe if the linker cannot reason about a parameter value (e.g. `Type.GetType` with a type name string that could be unknown) | ||
* Always unsafe: calls to these methods are never safe (e.g. `Assembly.ExportedTypes`). | ||
|
||
Explicit non-goals of this proposal: | ||
|
||
* Provide a mechanism to solve reflection based serializer (and similar patterns) | ||
* Provide a mechanism to solve dependency injection patterns | ||
|
||
It is our belief that _linker-friendly_ serialization and dependency injection would be better solved by source generators. | ||
|
||
## Analyzing calls to potentially unsafe methods | ||
|
||
The most interesting category to discuss are the "potentially unsafe" methods: reasoning about a parameter to a method call requires being able to trace the value of the parameter through the method body. IL linker is currently capable of doing this in a limited way. We'll be expanding this functionality further so that it can cover patterns like: | ||
|
||
```csharp | ||
Type t; | ||
if (isNullable) | ||
{ | ||
t = typeof(NullableAccessor); | ||
} | ||
else | ||
{ | ||
t = typeof(ObjectAccessor); | ||
} | ||
|
||
// Linker should infer we need the default constructor for the 2 types above | ||
Activator.CreateInstance(t); | ||
// Linker should infer we need method Foo on the 2 types above | ||
var mi = t.GetMethod("Foo"); | ||
// Linker should infer we need field Bar on the 2 types above | ||
var fi = t.GetMethod("Bar"); | ||
MichalStrehovsky marked this conversation as resolved.
Show resolved
Hide resolved
|
||
``` | ||
|
||
In an ideal world, this would be the extent of the reflection that can be made safe by the linker. Such small subset would not be practical because reflection often happens across method bodies. Instead of introducing cross-method dataflow analysis, we'll help linker reason about reflection happening across method bodies with annotations. | ||
|
||
## Cross-method annotations | ||
|
||
To document reflection use across methods, we'll introduce a new attribute `DynamicallyAccessedMembersAttribute` that can be attached to method parameters, the method return parameter, fields, and properties (whose type is `System.Type`, or `System.String`). The attribute will provide additional metadata related to linker-safety of the parameter or field. | ||
|
||
(When the attribute is applied to a location of type `System.String`, the assumption is that the string represents a fully qualified type name.) | ||
|
||
```csharp | ||
public sealed class DynamicallyAccessedMembersAttribute : Attribute | ||
{ | ||
public DynamicallyAccessedMembersAttribute(MemberKinds memberKinds) | ||
{ | ||
MemberKinds = memberKinds; | ||
} | ||
|
||
public MemberKinds MemberKinds { get; } | ||
} | ||
|
||
[Flags] | ||
public enum MemberKinds | ||
{ | ||
DefaultConstructor = 1, | ||
PublicConstructors = 3, // Maps to BindingFlags.Public | ||
Constructors = 7, // Maps to BindingFlags.Public | BindingFlags.NonPublic | ||
PublicProperties = 0x10, | ||
Properties = 0x30, | ||
PublicMethods = 0x40, | ||
Methods = 0x70, | ||
NestedTypes = 0x80, | ||
// ... | ||
} | ||
``` | ||
|
||
When a method or field is annotated with this attribute, two things will happen: | ||
* The method/field becomes potentially linker-unsafe. Linker will ensure that the values logically written to the annotated location (i.e. passed as a parameter, returned from the method, written to the field) can be statically reasoned about, or a warning will be generated. | ||
* The analysis of the value read from the annotated location will have richer information available and the linker can assume that linker-unsafe operations with an otherwise unknown value could still be safe. | ||
|
||
Example: | ||
|
||
```csharp | ||
class Program | ||
{ | ||
[DynamicallyAccessedMembers(MemberKinds.Methods)] | ||
private static readonly Type _otherType; | ||
|
||
static Program() | ||
{ | ||
// _otherType is marked DynamicallyAccessedMembers - the linker | ||
// ensures that the value written to this can be statically reasoned about | ||
// and meets the MemberKinds restriction. So it will keep all methods on Foo. | ||
_otherType = Type.GetType("Foo, FooAssembly"); | ||
} | ||
|
||
static void Main() | ||
{ | ||
// SAFE: _otherType was annotated as "some type that has all methods kept" | ||
_otherType.GetMethod("RandomMethod"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you elaborate on this example more or point to where the pattern is used in CoreLib so I can understand the need a little better? On the surface this pattern could become always safe by putting the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure which pattern in CoreLib had @MichalStrehovsky in mind, but Now the way the code is currently written can't be directly annotated with the proposed attributes (since it actually doesn't store the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I put Of course they could cache the But the building block (being able to annotate a field) is necessary to get custom attributes that have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Would this refactoring make There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Our linker would detect and correctly mark all of the reflection calls in You are right that Thinking out loud here. If we detected the type that is assigned to the field, then we could store that knowledge. Then when we come across the I guess what I'm dancing around is I'm trying to figure out is the pros/cons of this proposal vs attributes such as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In that case we can store just the Type - it will seem redundant since the T is already there, but if we call it "typeToCreate" or something along those lines it should be OK. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Storing Type would be better, but it still not clear how it would be done without introducing regression. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think Mark the Or, We can implement the extension Vitek added as a TODO to the doc: class Lazy<[DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] T>
{
// ...
} This would ensure the The drawback of the second approach is that now every use of void Frob<T>()
{
// This is now unsafe because the T is not marked [DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)]
// To fix it, we need to mark the T in Frob too
Lazy<T> lazy = ...
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that is a down side. We have users doing things that would fall into the "you should use source generators" category. Many of them want something that is better than nothing which is where I like the double checking that the connection allows, but I'm wondering if the fact that the connection can be made means that the linker can just auto detect these things. @MichalStrehovsky Is there a use case where the connection is maintained but the linker cannot auto detect the usage? As I mentioned in my previous comment, the example you pointed to could be automatically detected I believe. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To some degree it's a tradeoff between complexity in the linker and explicit behavior in the API. For example, in some cases linker may be able to "propagate" these annotations to callers (over several jumps) - but then if one of the callers doesn't satisfy the necessary requirement (for example it passes in a Type which can't be figured out) the warning message would be relatively confusing as there would be no notion of it in the source code. The warning would say something like "The parameter As for cases which we can't hope the linker to figure out: Casts (to and from |
||
} | ||
} | ||
``` | ||
|
||
(The above pattern exists in CoreLib and is currently unsafe.) | ||
|
||
|
||
TODO: Creating a delegate to a potentially linker unsafe method could be solvable. Reflection invoking a potentially linker unsafe method is hard. Both out of scope? | ||
TODO: It might be possible to apply similar pattern to generic parameters. The DynamizallAccessedMembers could be added to the generic parameter declaration and linker could make sure that when it's "assigned to" the requirements are met. | ||
|
||
## Linker unsafe annotations | ||
|
||
Another annotation will be used to mark methods that are never linker safe: | ||
|
||
```csharp | ||
public sealed class LinkerUnsafeAttribute : Attribute | ||
{ | ||
public LinkerUnsafeAttribute(string message) | ||
{ | ||
Message = message; | ||
} | ||
|
||
public string Message { get; } | ||
} | ||
``` | ||
|
||
All calls to methods annotated with LinkerUnsafe will result in a warning and the linker will not analyze linker-safety within the method body or the parts of the static callgraph that are only reachable through this method. | ||
|
||
TODO: Do we care about localization issues connected with the message string? Do we need an enum with possible messages ("this is never safe", "use different overload", "use different method", etc.) This is probably the same bucket as `ObsoleteAttribute`. | ||
|
||
## Escape hatch: DynamicDependencyAttribute annotation | ||
|
||
This is an existing custom attribute (known as `PreserveDependencyAttribute`) understood by the linker. This attribute allows the user to declare the type name, method/field name, and signature (all as a string) of a method or field that the method dynamically depends on. | ||
|
||
When the linker sees a method/constructor/field annotated with this attribute as necessary, it also marks the referenced member as necessary. It also suppresses all analysis within the method. | ||
See issue https://github.com/dotnet/runtime/issues/30902 for details. | ||
|
||
The attribute can also be used as an escape to indicate "shut up, this is safe" to IL linker in edge cases that cannot be captured by the analysis by having the `DynamicDependencyAttribute` refer to something that is always needed (e.g. the decorated method itself). | ||
|
||
## Case study: Custom attributes | ||
|
||
Custom attributes are going to be analyzed same way as method calls – the act of applying a custom attribute is a method call to the attribute constructor. The act of setting a property is a call to the property setter. | ||
|
||
Linker currently special cases the `TypeConverterAttribute` to make sure we always keep the default constructor of the type referenced by the attribute. With the proposed annotations, it's possible to express what's necessary without the special casing. It will also make it possible for the consuming code to safely pass the value of `TypeConverterAttribute.ConverterTypeName` to `Type.GetType`/`Activator.CreateInstance`, since the location is annotated as known to keep the type and the default constructor. | ||
|
||
```csharp | ||
public sealed class TypeConverterAttribute : Attribute | ||
{ | ||
public TypeConverterAttribute() | ||
{ | ||
ConverterTypeName = string.Empty; | ||
} | ||
|
||
public TypeConverterAttribute([DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] Type type) | ||
{ | ||
if (type == null) | ||
{ | ||
throw new ArgumentNullException(nameof(type)); | ||
} | ||
|
||
// SAFE: assigning AssemblyQualifiedName of a type that is known to have default ctor available | ||
ConverterTypeName = type.AssemblyQualifiedName!; | ||
} | ||
|
||
public TypeConverterAttribute([DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] string typeName) | ||
{ | ||
if (typeName == null) | ||
{ | ||
throw new ArgumentNullException(nameof(typeName)); | ||
} | ||
|
||
// SAFE: assigning a known type name string | ||
ConverterTypeName = typeName; | ||
} | ||
|
||
[DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] | ||
public string ConverterTypeName { get; } | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this example to be an interesting one.
Consider
typeof(int)
is easily seen by the linker.Assembly
is easy. and so isExportedTypes
. The linker could mark all of the exported types. The behavior of the linked assemblies would match the original program.So the linker could correctly mark all of the dependencies of the reflection API, however, doing so would result in over preserving in most real world use cases since I imagine inside that loop there would normally be some sort of filtering happening on the types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's confusing to call anything just linker-unsafe - linker could keep the entire app and everything would be safe. So maybe a better way is to say that we choose not to support the pattern in the linker as we're not able to achieve good trimming results with it.
Code patterns like this example are typical in Dependency Injection (or similar) scenarios - I think that we can potentially provide different ways of solving that problem. But that's outside of this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to talk about this linker-compatible/incompatible or linker-friendly/unfriendly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
safe/unsafe meaning typically relates to security. This has not much to do with security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been using linker-friendly in some places already - so I'd be happy with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, but it negates a lot of benefits of the linking (one unfortunate API call and suddenly none of the public types get linked away). It doesn't seem to be worth the effort to make this work.
The problem is also that once we have the type back, people are going to do other things with it (not just count them). So we either need to understand
IEnumerable<Type>
or just keep all members to be sure.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"safe/unsafe meaning typically relates to security" I agree that it's good to not pollute the use of this convention, even though that's all it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, we wouldn't want to support the case of assembly getting types.
However, what about
typeof(Foo).GetFields()
? in our linker we will mark all of the fields ofFoo
if we see this. Our rational was, if you are getting the fields of the specific type you probably care about those fields and the odds of an undesired ripple effect are much lower. I will say I was/am on the fence about it. I'm curious what stance you guys take.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping all fields on
Foo
in this case would align with how member access is treated in the proposedMemberKinds
enum - type-level granularity feels more acceptable than assembly level granularity.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current proposal is to mark this with
AllFields
which would get us basically the same behavior as you already have. I personally think that we should be very careful about including more types than "necessary", but I'm much more comfortable including more members.