Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add IL-level inlining for async methods #15491

Open
stephentoub opened this issue Nov 23, 2016 · 12 comments
Open

Proposal: Add IL-level inlining for async methods #15491

stephentoub opened this issue Nov 23, 2016 · 12 comments

Comments

@stephentoub
Copy link
Member

When an async method yields for the first time, we get several allocations:

  • The state machine boxed to the heap
  • An Action to be used for continuations
  • A “MoveNextRunner” in the infrastructure that’s allocated to help store additional state
  • The Task (or whatever the return type)

For performance-sensitive code bases, these costs can lead developers to avoid refactoring code into smaller async methods in order to avoid introducing additional allocations, or more commonly in my own experience to manually combine async methods as part of perf investigations, often making the code harder to maintain, often leading to duplication, etc. If async method A awaits B awaits C awaits D, and D ends up yielding, then each of A, B, C, and D incur all of the allocations mentioned above; if instead the developer puts all of the code into A, those allocations are only incurred for one frame rather than for each of four.

I propose we add the ability for the language compiler to inline an async method into another; this could be automatic based on some heuristic, it could be manual based on a new [ILInline] attribute or something like that, or some combination (I prefer the attribute for now). Then a developer can have their cake and eat it, too: they can refactor their code into smaller async methods that compose well and make the code more readable/maintainable, but they don’t have to pay the performance penalty for doing so.

This is also an optimization that would be more than the sum of its parts. It’s not just getting rid of Action/MoveNextRunner/Task allocations, but it also affords the opportunity to shrink the combined state machines:

  • The state machine type contains several fields that would not need to be duplicated, e.g. the builder, the integer state, etc.: when inlined and the state machines are combined, those fields would be entirely duplicative and could be removed.
  • There are fields often captured as part of the state machine that would also be duplicative, e.g. for an instance async method the implicit this reference.
  • The compiler already reuses fields for some state, e.g. multiple awaits with the same awaiter type end up reusing the same awaiter fields, so such fields could be collapsed across the inliner and inlinee as well.
  • Most of the time the arguments passed from the inliner to the inlinee are already fields on the inliner’s state machine and yet are duplicated in the inlinee’s, so that could be avoided (e.g. the CancellationToken).
  • Plus once Async state machine fields should be reused where possible for user-defined locals/parameters #15290 is implemented/fixed, since the locals in the inlinee are logically scoped, it becomes likely that locals/fields from the inlinee can be shared with those from the inliner.

All of that can result in significant savings. There are also other costs it would help avoid:

  • IL would be smaller, as a result of the collapsed MoveNext methods, fewer fields, not needing separate state machine types, not needing separate entry methods, etc.
  • We’d avoid the (small) overhead involved in calling the inlined async method.
  • Etc.

There are some potential corner-case side effects, which among other reasons leads me to want this to be opt-in with an attribute. For example, given this code:

static AsyncLocal<int> s_local = new AsyncLocal<int>();

static async Task A()
{
    s_local.Value = 1;
    await B();
    Console.Write(s_local.Value);
}

static async Task B()
{
    s_local.Value = 2;
    await Task.Delay(1);
    Console.Write(s_local.Value);
}

Today this prints out 21. If the code from B were to just be moved into A, it would instead print out 22. This could be addressed as part of the inlining with additional runtime support, but I’d be happy with it just being a side-effect of opt-in inlining that a developer has to be aware of to use the advanced feature.

@alrz
Copy link
Member

alrz commented Nov 23, 2016

Related: #10449

@vbcodec
Copy link

vbcodec commented Nov 23, 2016

Maybe better, faster and easier is to recycle these frames. Just create some smart cache for frames and implement methods for cleanup before subsequent usage.

@bbarry
Copy link

bbarry commented Nov 23, 2016

@vbcodec unfortunately those frames are not the same types or sizes...

I would like a general purpose (beyond the scope of async methods) [ILInline] attribute (not sure about the name though) as a build time enforcement of the CLR hint attribute.

Specifically for async machines though, I do question the validity of this "pro" you have stated:

  • IL would be smaller, as a result of the collapsed MoveNext methods, fewer fields, not needing separate state machine types, not needing separate entry methods, etc.
static async Task A()
{
    var a = 0;
    await B();
    Console.Write(a++);
    await B();
    Console.Write(a++);
    await B();
    Console.Write(a++);
    await B();
    Console.Write(a);
}

[ILInline]
static async Task B()
{
    var b = 2;
    await Task.Delay(1);
    Console.Write(b);
}

I'd expect this to compile effectively the same as if it was written:

static async Task A()
{
    var a = 0;
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a++);
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a++);
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a++);
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a);
}

[ILInline]
static async Task B()
{
    var b = 2;
    await Task.Delay(1);
    Console.Write(b);
}

that is:

  • the SM for B would still exist as a compilation artifact (the method B could for example be executed via reflection still)
  • the inlined machine may be repeated more than once in the resulting code

On the other hand, the code for the MoveNext method of the second state machine could be rewritten with this procedure:

  1. eliminate the TaskAwaiter field (remove statements that assign to or from <>u__1)

  2. raise the taskAwaiter local to a ref TaskAwaiter taskAwaiter parameter

  3. move the "state stack" to local ref parameters (in this case <>1__state)

  4. save this method in the state machine where it is being inlined and copy fields and inlined methods over and fix the type on the this.<>t__builder.AwaitUnsafeOnCompleted call

  5. in the calling machine, add an additional "state stack" field: <>2__state

  6. convert the call site for the inlined method from:

     taskAwaiter = method().GetAwaiter();
    

    to:

     this.<>2__state = -1;
     method(ref <>2__state, ref taskAwaiter);
    
  7. at the continuation site, call back into the method to internally continue, from:

     taskAwaiter = this.<>u__1;
     this.<>u__1 = default(TaskAwaiter);
     this.<>1__state = -1;
    

    to:

     taskAwaiter = this.<>u__1;
     method(ref <>2__state, ref taskAwaiter);
     this.<>u__1 = default(TaskAwaiter);
     this.<>1__state = -1;
    

full gist of such a machine: https://gist.github.com/bbarry/01690992031f27de542997a4682db391

@bbarry
Copy link

bbarry commented Nov 23, 2016

updated gist with an alternative implementation: https://gist.github.com/bbarry/01690992031f27de542997a4682db391

Alternatively, the [ILInline] attribute could modify the produced state machine for B as follows:

  1. make the MoveNext method public static and call it from a generated void IAsyncStateMachine.MoveNext()
  2. convert all fields in the method into ref parameters
  3. add a generic type parameter to represent this

When another state machine calls a method with this attribute, the callsite is modified by inlining and adding the necessary local fields:

from

 taskAwaiter = method().GetAwaiter();

to

 this.<>2__state = -1;
 <method>d__1.MoveNext<ThisType>(ref this.<>t__builder, ref this.<>2__state, .../* lifted fields */,  ref taskAwaiter, ref this);

and adding that last line to the continuation site in a similar manner

@svick
Copy link
Contributor

svick commented Nov 23, 2016

  1. Do I understand it correctly that this would affect only methods that are immediately awaited?
  2. Would the attribute be placed on the inliner (A in the example) or the inlinee (B)? If it was the inlinee, then I'm not sure that gives the user enough control, since it would mean that a framework method would either always be inlined or never be. (Also, I think I would prefer a name like [AsyncInline].)

@stephentoub
Copy link
Member Author

Maybe better, faster and easier is to recycle these frames. Just create some smart cache for frames and implement methods for cleanup before subsequent usage.

I don't believe that would be better/faster/easier. Pooling has its own costs and implications, and I would much rather eliminate costs entirely rather than try to workaround them via pooling. It's possible some form of pooling could be explored in addition, but I don't believe it replaces the desire for something along the lines of what I've suggested.

Do I understand it correctly that this would affect only methods that are immediately awaited?

That was my intention.

Would the attribute be placed on the inliner (A in the example) or the inlinee (B)? If it was the inlinee, then I'm not sure that gives the user enough control

My suggestion here (which was a rough outline, I'm sure with details that would change) was to place it on the inlinee. I also did not envision this being placed on publicly exposed methods, but rather for use within my own code, where within the same compilation unit the compiler could apply such optimizations. I would expect that the compiler would ignore such an attribute when consuming a method from one assembly into another.

@jamesqo
Copy link
Contributor

jamesqo commented Nov 25, 2016

Interesting. Would this be applicable to iterator state machines as well? Seems like many of the benefits mentioned here (combining implicit this / state fields, removing additional allocations) would be beneficial there too.

@svick
Copy link
Contributor

svick commented Nov 25, 2016

@jamesqo I'm not sure iterators have the same performance issues as async. I think long chains of nested iterators do (or could) happen in two cases:

  1. LINQ. It would be awesome if chains of LINQ operators could be collapsed. But this feature is working with your C# code and reference assemblies for standard libraries, so that wouldn't be possible.
  2. Recursion. Inlining is not really possible there (with or without yield foreach Feature Request: Recursive Iterators (non-quadratic) #15).

Or are there some cases that I'm missing that you think would be common enough to make that worth it?

@jamesqo
Copy link
Contributor

jamesqo commented Nov 25, 2016

LINQ. It would be awesome if chains of LINQ operators could be collapsed. But this feature is working with your C# code and reference assemblies for standard libraries, so that wouldn't be possible.

Good point. I had Linq in mind when I wrote that, but I had forgotten that doing any sort of inlining optimization would be invalid across library boundaries, since the code in the other library could change and the already-compiled library that contains inlined code would be stale. Besides, a lot of Linq is being converted to use custom iterator types which the compiler won't recognize. 👍

Still, it seems like it would be a nice optimization to have if it is not too much trouble after the work for async is done.

@RamType0
Copy link

This has the potential to significantly improve the performance of canceling nested tasks.
We don't need nested rethrowing and capturing of OperationCanceledException anymore with this.

@CyrusNajmabadi
Copy link
Member

@stephentoub any thoughts on this now? Do you see a future for this? Or is there a different path we'd prefer today to get the benefits?

@stephentoub
Copy link
Member Author

Or is there a different path we'd prefer today to get the benefits?

I don't know of any other path for this, other than continuing along the current path of copy/paste the code into each method that needs it.

@arunchndr arunchndr added this to the Backlog milestone Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants