Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal for enhanced dataflow analysis #988

Merged
merged 7 commits into from
Apr 30, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
223 changes: 223 additions & 0 deletions docs/design/reflection-flow.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,223 @@
# Reflection and IL Linker in .NET 5

Unconstrained reflection in .NET presents a challenge for discovering parts of .NET apps that are (or are not) used. We had multiple attempts to solve the problem – probably the most complex one was in .NET Native. .NET Native was trying to make arbitrary reflection "just work" by using a combination of:

* cross-method dataflow analysis framework (that could e.g. model the `dynamic` keyword in C#, or reason about cascading reflection patterns like `Type.GetType("Foo").GetMethod("Bar").ReturnType.GetMethod("Baz")`)
* expressive annotation format (RD.XML) that could express things like "keep all properties and constructors on types passed as a parameter to this method, recursively". The idea was that library developers would annotate their libraries and things would "just work" for the user.

Even with the level of investment in .NET Native it wasn't possible to make arbitrary reflection "just work" – before shipping, we had to make a decision to not do any tree-shaking on user assemblies by default because the reflection patterns were often (~20% of the Store app catalog) arbitrary enough that it wasn't possible to describe them in generic ways or detect them. The .NET Native compiler did not warn the user about presence of unrecognized patterns either because we expected there would be too much noise due to the disconnect between RD.XML (that simply described what to keep) and the dataflow analysis (that focused on reflection API usage).

## Linker-friendly reflection

Note: While this document mostly talks about reflection, this includes reflection-like APIs with impact on linker's analysis such as `RuntimeHelpers.GetUninitializedObject`, `Marshal.PtrToStructure`, or `RuntimeHelpers.RunClassConstructor`.

In .NET 5, we would like to carve out a _subset_ of reflection patterns that can be made compatible in the presence of IL linker. Since IL linking is optional, users do not have to adhere to this subset. They can still use IL Linker if they do not adhere to this subset, but we will not provide guarantees that linking won't change the semantics of their app. Linker will warn if a reflection pattern within the app is not compatible.

To achieve compatibility, we'll logically classify methods into following categories:

* Linker-friendly: most code will fall into this category. Linking can be done safely based on information in the static callgraph.
* Potentially unfriendly: call to the method is unsafe if the linker cannot reason about a parameter value (e.g. `Type.GetType` with a type name string that could be unknown)
* Always unfriendly: calls to these methods are never safe in the presence of linker (e.g. `Assembly.ExportedTypes`).

Explicit non-goals of this proposal:

* Provide a mechanism to solve reflection based serializer (and similar patterns)
* Provide a mechanism to solve dependency injection patterns

It is our belief that _linker-friendly_ serialization and dependency injection would be better solved by source generators.

## Analyzing calls to potentially unfriendly methods

The most interesting category to discuss are the "potentially unfriendly" methods: reasoning about a parameter to a method call requires being able to trace the value of the parameter through the method body. IL linker is currently capable of doing this in a limited way. We'll be expanding this functionality further so that it can cover patterns like:

```csharp
Type t;
if (isNullable)
{
t = typeof(NullableAccessor);
}
else
{
t = typeof(ObjectAccessor);
}

// Linker should infer we need the default constructor for the 2 types above
Activator.CreateInstance(t);
// Linker should infer we need method Foo on the 2 types above
var mi = t.GetMethod("Foo");
// Linker should infer we need field Bar on the 2 types above
var fi = t.GetField("Bar");
```

In an ideal world, this would be the extent of the reflection that can be made safe by the linker. Such small subset would not be practical because reflection often happens across method bodies. Instead of introducing cross-method dataflow analysis, we'll help linker reason about reflection happening across method bodies with annotations.

## Cross-method annotations

To document reflection use across methods, we'll introduce a new attribute `DynamicallyAccessedMembersAttribute` that can be attached to method parameters, the method return parameter, fields, and properties (whose type is `System.Type`, or `System.String`). The attribute will provide additional metadata related to linker-friendliness of the parameter or field.

(When the attribute is applied to a location of type `System.String`, the assumption is that the string represents a fully qualified type name.)

```csharp
public sealed class DynamicallyAccessedMembersAttribute : Attribute
{
public DynamicallyAccessedMembersAttribute(DynamicallyAccessedMemberTypes memberKinds)
{
MemberTypes = memberTypes;
}

public DynamicallyAccessedMemberTypes MemberTypes { get; }
}

[Flags]
public enum DynamicallyAccessedMemberTypes
{
DefaultConstructor = 0x0001,
PublicConstructors = 0x0002 | DefaultConstructor,
NonPublicConstructors = 0x0004,
// ...
}
```

When a method or field is annotated with this attribute, two things will happen:
* The method/field becomes potentially linker-friendly. Linker will ensure that the values logically written to the annotated location (i.e. passed as a parameter, returned from the method, written to the field) can be statically reasoned about, or a warning will be generated.
* The analysis of the value read from the annotated location will have richer information available and the linker can assume that linker-unfriendly operations with an otherwise unknown value could still be safe.

Example:

```csharp
class Program
{
[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicMethods)]
private static readonly Type _otherType;

static Program()
{
// _otherType is marked DynamicallyAccessedMembers - the linker
// ensures that the value written to this can be statically reasoned about
// and meets the MemberKinds restriction. So it will keep all methods on Foo.
_otherType = Type.GetType("Foo, FooAssembly");
}

static void Main()
{
// SAFE: _otherType was annotated as "some type that has all methods kept"
_otherType.GetMethod("RandomMethod");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this example more or point to where the pattern is used in CoreLib so I can understand the need a little better?

On the surface this pattern could become always safe by putting the GetType and GetMethod together.

Type.GetType("Foo, FooAssembly").GetMethod("RandomMethod");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which pattern in CoreLib had @MichalStrehovsky in mind, but Lazy<T> almost fits this as well. Some of the .ctors which don't take a valueCreator effectively need the T to have a default .ctor (and be instantiatable). Internally it's implemented by eventually calling the Activator.CreateInstance on it.

Now the way the code is currently written can't be directly annotated with the proposed attributes (since it actually doesn't store the Type instead it stores a boolean and relies on the T), but it could be refactored relatively easily to fit the pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put GetMethod, this this one is more interesting with Activator.CreateInstance. E.g. here:

https://github.com/dotnet/runtime/blob/0a2f0e6da98a784601b37d19417b930a33b2c478/src/coreclr/src/System.Private.CoreLib/src/Internal/Runtime/InteropServices/ComActivator.cs#L610

Of course they could cache the ConstructorInfo instead, eliminating the need for the annotation.

But the building block (being able to annotate a field) is necessary to get custom attributes that have System.Type properties or constructor arguments working. I would like to make those work because they're pretty common.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be refactored relatively easily to fit the pattern.

Would this refactoring make Lazy<T> more expensive?

Copy link
Contributor

@mrvoorhe mrvoorhe Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our linker would detect and correctly mark all of the reflection calls in LicenseInteropProxy() without the help of any annotations. So those are OK.

You are right that object licContext = Activator.CreateInstance(_licInfoHelper)!; is the tricky part.

Thinking out loud here. If we detected the type that is assigned to the field, then we could store that knowledge. Then when we come across the Activator.CreateInstance having the field passed to it we could mark the ctor of all types that had been assigned to that field. So I think we could actually automatically handle this scenario.

I guess what I'm dancing around is I'm trying to figure out is the pros/cons of this proposal vs attributes such as [RequiredMember] that I've discussed here #797 (comment). We are accumulating support for a variety of these attributes to start trying them out. I don't want to introduce things that would conflict with the wider .NET ecosystem but at the same time our use cases are slightly different which is why I'm trying to understand exactly what scenarios you are trying to solve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we can store just the Type - it will seem redundant since the T is already there, but if we call it "typeToCreate" or something along those lines it should be OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing Type would be better, but it still not clear how it would be done without introducing regression. Lazy<T> is perf critical e.g. for FSharp, and great deal of work went into making it as lean as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Lazy<T> would be solvable by one of two ways:

Mark the Lazy<T> constructor overload that makes it call into CreateInstance<T> as LinkerUnsafe & mark the CreateViaDefaultConstructor() as "linker shut up, this is safe".

Or,

We can implement the extension Vitek added as a TODO to the doc:

class Lazy<[DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] T>
{
    // ...
}

This would ensure the Activator.CreateInstance<T> within lazy is not marked linker unsafe.

The drawback of the second approach is that now every use of Lazy<T> from generic code needs to be annotated because the annotations are contagious, even though the default constructor might not be needed if a factory delegate was provided:

void Frob<T>()
{
    // This is now unsafe because the T is not marked [DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)]
    // To fix it, we need to mark the T in Frob too
    Lazy<T> lazy = ...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with annotations that just say "keep this" is that they don't have a connection to the specific reflection API usage within the method - they cannot be used in validating that the actual usage is safe. The linker either has to trust that the annotation on the method itself

Yes, I agree that is a down side. We have users doing things that would fall into the "you should use source generators" category. Many of them want something that is better than nothing which is where [RequireMember] style attributes come in.

I like the double checking that the connection allows, but I'm wondering if the fact that the connection can be made means that the linker can just auto detect these things. @MichalStrehovsky Is there a use case where the connection is maintained but the linker cannot auto detect the usage? As I mentioned in my previous comment, the example you pointed to could be automatically detected I believe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To some degree it's a tradeoff between complexity in the linker and explicit behavior in the API. For example, in some cases linker may be able to "propagate" these annotations to callers (over several jumps) - but then if one of the callers doesn't satisfy the necessary requirement (for example it passes in a Type which can't be figured out) the warning message would be relatively confusing as there would be no notion of it in the source code. The warning would say something like "The parameter inputType of GetSomeData requires a known type with all its fields" but the method GetSomeData would have nothing on it which would hint at that fact. Making the annotations explicit means that the behavior is to some degree self-documenting.

As for cases which we can't hope the linker to figure out: Casts (to and from object specifically), collections, and so on. In these cases the annotations could be used to hint the linker what's going on, but I don't think the linker could figure it out on its own.

}
}
```

(The above pattern exists in CoreLib and is currently unfriendly.)

More details discussed in https://github.com/dotnet/runtime/issues/33861.

TODO: Creating a delegate to a potentially linker unfriendly method could be solvable. Reflection invoking a potentially linker unfriendly method is hard. Both out of scope?
TODO: It might be possible to apply similar pattern to generic parameters. The DynamizallAccessedMembers could be added to the generic parameter declaration and linker could make sure that when it's "assigned to" the requirements are met.

## Intrinsic recognition of reflection APIs

While it would be possible to annotate reflection primitives with the proposed DynamicallyAccessedMembers attribute, linker is going to intrinsically recongnize some of the reflection primitives so that it can do a better job at preserving just the pieces that are really needed. For example:

* `Type.GetMethod`
* `Type.GetProperty`
* `Type.GetField`
* `Type.GetMember`
* `Type.GetNestedType`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are GetDeclaredMethod, etc. intentionally missing in this list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I added this as a response to Vitek's comment - it's not trying to be an exhaustive list. We already e.g. special case methods on System.Linq.Expressions, so this will be a pretty long list. I'll add "for example" because trying to keep this up-to-date would be a vain effort.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Are going ..." reads like a question, but it is not actually question. Maybe say "The following methods are going to be special cased by linker:" in front of the list instead.

are going to be special cased so that if the type and name is exactly known at linking time, only the specific member will be preserved. If the name is not known, all matching members are going to be preserved instead. Linker may look at other parameters to these methods, such as the binding flags and parameter counts to further restrict the set of members preserved.

The special casing will also help in situations such as when the type is not statically known and we only have an annotated value - e.g. calling `GetMethod(...BindingFlags.Public)` on a `System.Type` instance annotated as `DynamicallyAccessedMemberTypes.PublicMethods` should be considered valid.

## Linker unfriendly annotations

Another annotation will be used to mark methods that are never linker friendly:

```csharp
public sealed class LinkerUnfriendlyAttribute : Attribute
{
public LinkerUnfriendlyAttribute(string message)
{
Message = message;
}

public string Message { get; }
}
```

All calls to methods annotated with LinkerUnfriendly will result in a warning and the linker will not analyze linker-friendliness within the method body or the parts of the static callgraph that are only reachable through this method.

TODO: Do we care about localization issues connected with the message string? Do we need an enum with possible messages ("this is never friendly", "use different overload", "use different method", etc.) This is probably the same bucket as `ObsoleteAttribute`.

## Escape hatch: Warning supression

We will provide a way to supress reflection flow related warnings within linker. This is meant to be used in cases where we know that a pattern is safe, but the linker is not smart enough to reason about it.

A good example of such pattern could be caches and maps using System.Type. If the cache only stores values that have a certain annotation, all values read from the cache should be annotated the same way, but linker won't be able to see this though.

```csharp
private Dictionary<Type, Type> _interfaceToImpl;

public void RegisterInterface(Type intface, [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.DefaultConstructor)]Type impl)
{
_interfaceToImpl.Add(intface, impl);
}

// Microsoft.ILLinker and IL000:UnsafeReflectionUse are made up values. Real values TBD.
[UnconditionalSuppressMessage("Microsoft.ILLinker", "IL000:UnsafeReflectionUse", MessageId = "CreateInstance")]
public object Activate(Type intface)
{
// This would normally warn because value retrieved from the dictionary is not annotated, but since
// all values placed into the dictionary were annotated, this is safe.
return Activator.CreateInstance(_interfaceToImpl[intface]);
}

```

More details discussed in https://github.com/dotnet/runtime/issues/35339.

## Escape hatch: DynamicDependencyAttribute annotation

This is an existing custom attribute (known as `PreserveDependencyAttribute`) understood by the linker. This attribute allows the user to declare the type name, method/field name, and signature (all as a string) of a method or field that the method dynamically depends on.

When the linker sees a method/constructor/field annotated with this attribute as necessary, it also marks the referenced member as necessary. It also suppresses all analysis within the method.
See issue https://github.com/dotnet/runtime/issues/30902 for details.

## Case study: Custom attributes

Custom attributes are going to be analyzed same way as method calls – the act of applying a custom attribute is a method call to the attribute constructor. The act of setting a property is a call to the property setter.

Linker currently special cases the `TypeConverterAttribute` to make sure we always keep the default constructor of the type referenced by the attribute. With the proposed annotations, it's possible to express what's necessary without the special casing. It will also make it possible for the consuming code to safely pass the value of `TypeConverterAttribute.ConverterTypeName` to `Type.GetType`/`Activator.CreateInstance`, since the location is annotated as known to keep the type and the default constructor.

```csharp
public sealed class TypeConverterAttribute : Attribute
{
public TypeConverterAttribute()
{
ConverterTypeName = string.Empty;
}

public TypeConverterAttribute([DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.DefaultConstructor)] Type type)
{
if (type == null)
{
throw new ArgumentNullException(nameof(type));
}

// SAFE: assigning AssemblyQualifiedName of a type that is known to have default ctor available
ConverterTypeName = type.AssemblyQualifiedName!;
}

public TypeConverterAttribute([DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.DefaultConstructor)] string typeName)
{
if (typeName == null)
{
throw new ArgumentNullException(nameof(typeName));
}

// SAFE: assigning a known type name string
ConverterTypeName = typeName;
}

[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.DefaultConstructor)]
public string ConverterTypeName { get; }
}
```