Proposal: Add IL-level inlining for async methods #15491

stephentoub · 2016-11-23T12:08:14Z

When an async method yields for the first time, we get several allocations:

The state machine boxed to the heap
An Action to be used for continuations
A “MoveNextRunner” in the infrastructure that’s allocated to help store additional state
The Task (or whatever the return type)

For performance-sensitive code bases, these costs can lead developers to avoid refactoring code into smaller async methods in order to avoid introducing additional allocations, or more commonly in my own experience to manually combine async methods as part of perf investigations, often making the code harder to maintain, often leading to duplication, etc. If async method A awaits B awaits C awaits D, and D ends up yielding, then each of A, B, C, and D incur all of the allocations mentioned above; if instead the developer puts all of the code into A, those allocations are only incurred for one frame rather than for each of four.

I propose we add the ability for the language compiler to inline an async method into another; this could be automatic based on some heuristic, it could be manual based on a new [ILInline] attribute or something like that, or some combination (I prefer the attribute for now). Then a developer can have their cake and eat it, too: they can refactor their code into smaller async methods that compose well and make the code more readable/maintainable, but they don’t have to pay the performance penalty for doing so.

This is also an optimization that would be more than the sum of its parts. It’s not just getting rid of Action/MoveNextRunner/Task allocations, but it also affords the opportunity to shrink the combined state machines:

The state machine type contains several fields that would not need to be duplicated, e.g. the builder, the integer state, etc.: when inlined and the state machines are combined, those fields would be entirely duplicative and could be removed.
There are fields often captured as part of the state machine that would also be duplicative, e.g. for an instance async method the implicit this reference.
The compiler already reuses fields for some state, e.g. multiple awaits with the same awaiter type end up reusing the same awaiter fields, so such fields could be collapsed across the inliner and inlinee as well.
Most of the time the arguments passed from the inliner to the inlinee are already fields on the inliner’s state machine and yet are duplicated in the inlinee’s, so that could be avoided (e.g. the CancellationToken).
Plus once Async state machine fields should be reused where possible for user-defined locals/parameters #15290 is implemented/fixed, since the locals in the inlinee are logically scoped, it becomes likely that locals/fields from the inlinee can be shared with those from the inliner.

All of that can result in significant savings. There are also other costs it would help avoid:

IL would be smaller, as a result of the collapsed MoveNext methods, fewer fields, not needing separate state machine types, not needing separate entry methods, etc.
We’d avoid the (small) overhead involved in calling the inlined async method.
Etc.

There are some potential corner-case side effects, which among other reasons leads me to want this to be opt-in with an attribute. For example, given this code:

static AsyncLocal<int> s_local = new AsyncLocal<int>();

static async Task A()
{
    s_local.Value = 1;
    await B();
    Console.Write(s_local.Value);
}

static async Task B()
{
    s_local.Value = 2;
    await Task.Delay(1);
    Console.Write(s_local.Value);
}

Today this prints out 21. If the code from B were to just be moved into A, it would instead print out 22. This could be addressed as part of the inlining with additional runtime support, but I’d be happy with it just being a side-effect of opt-in inlining that a developer has to be aware of to use the advanced feature.

alrz · 2016-11-23T12:39:27Z

Related: #10449

vbcodec · 2016-11-23T14:02:30Z

Maybe better, faster and easier is to recycle these frames. Just create some smart cache for frames and implement methods for cleanup before subsequent usage.

bbarry · 2016-11-23T18:21:36Z

@vbcodec unfortunately those frames are not the same types or sizes...

I would like a general purpose (beyond the scope of async methods) [ILInline] attribute (not sure about the name though) as a build time enforcement of the CLR hint attribute.

Specifically for async machines though, I do question the validity of this "pro" you have stated:

IL would be smaller, as a result of the collapsed MoveNext methods, fewer fields, not needing separate state machine types, not needing separate entry methods, etc.

static async Task A()
{
    var a = 0;
    await B();
    Console.Write(a++);
    await B();
    Console.Write(a++);
    await B();
    Console.Write(a++);
    await B();
    Console.Write(a);
}

[ILInline]
static async Task B()
{
    var b = 2;
    await Task.Delay(1);
    Console.Write(b);
}

I'd expect this to compile effectively the same as if it was written:

static async Task A()
{
    var a = 0;
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a++);
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a++);
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a++);
    {
        var b = 2;
        await Task.Delay(1);
        Console.Write(b);
    }
    Console.Write(a);
}

[ILInline]
static async Task B()
{
    var b = 2;
    await Task.Delay(1);
    Console.Write(b);
}

that is:

the SM for B would still exist as a compilation artifact (the method B could for example be executed via reflection still)
the inlined machine may be repeated more than once in the resulting code

On the other hand, the code for the MoveNext method of the second state machine could be rewritten with this procedure:

eliminate the TaskAwaiter field (remove statements that assign to or from <>u__1)
raise the taskAwaiter local to a ref TaskAwaiter taskAwaiter parameter
move the "state stack" to local ref parameters (in this case <>1__state)
save this method in the state machine where it is being inlined and copy fields and inlined methods over and fix the type on the this.<>t__builder.AwaitUnsafeOnCompleted call
in the calling machine, add an additional "state stack" field: <>2__state

convert the call site for the inlined method from:

 taskAwaiter = method().GetAwaiter();

to:

 this.<>2__state = -1;
 method(ref <>2__state, ref taskAwaiter);

at the continuation site, call back into the method to internally continue, from:

 taskAwaiter = this.<>u__1;
 this.<>u__1 = default(TaskAwaiter);
 this.<>1__state = -1;

to:

 taskAwaiter = this.<>u__1;
 method(ref <>2__state, ref taskAwaiter);
 this.<>u__1 = default(TaskAwaiter);
 this.<>1__state = -1;

full gist of such a machine: https://gist.github.com/bbarry/01690992031f27de542997a4682db391

bbarry · 2016-11-23T19:38:37Z

updated gist with an alternative implementation: https://gist.github.com/bbarry/01690992031f27de542997a4682db391

Alternatively, the [ILInline] attribute could modify the produced state machine for B as follows:

make the MoveNext method public static and call it from a generated void IAsyncStateMachine.MoveNext()
convert all fields in the method into ref parameters
add a generic type parameter to represent this

When another state machine calls a method with this attribute, the callsite is modified by inlining and adding the necessary local fields:

from

 taskAwaiter = method().GetAwaiter();

to

 this.<>2__state = -1;
 <method>d__1.MoveNext<ThisType>(ref this.<>t__builder, ref this.<>2__state, .../* lifted fields */,  ref taskAwaiter, ref this);

and adding that last line to the continuation site in a similar manner

svick · 2016-11-23T20:16:12Z

Do I understand it correctly that this would affect only methods that are immediately awaited?
Would the attribute be placed on the inliner (A in the example) or the inlinee (B)? If it was the inlinee, then I'm not sure that gives the user enough control, since it would mean that a framework method would either always be inlined or never be. (Also, I think I would prefer a name like [AsyncInline].)

stephentoub · 2016-11-23T20:21:44Z

Maybe better, faster and easier is to recycle these frames. Just create some smart cache for frames and implement methods for cleanup before subsequent usage.

I don't believe that would be better/faster/easier. Pooling has its own costs and implications, and I would much rather eliminate costs entirely rather than try to workaround them via pooling. It's possible some form of pooling could be explored in addition, but I don't believe it replaces the desire for something along the lines of what I've suggested.

Do I understand it correctly that this would affect only methods that are immediately awaited?

That was my intention.

Would the attribute be placed on the inliner (A in the example) or the inlinee (B)? If it was the inlinee, then I'm not sure that gives the user enough control

My suggestion here (which was a rough outline, I'm sure with details that would change) was to place it on the inlinee. I also did not envision this being placed on publicly exposed methods, but rather for use within my own code, where within the same compilation unit the compiler could apply such optimizations. I would expect that the compiler would ignore such an attribute when consuming a method from one assembly into another.

jamesqo · 2016-11-25T21:31:30Z

Interesting. Would this be applicable to iterator state machines as well? Seems like many of the benefits mentioned here (combining implicit this / state fields, removing additional allocations) would be beneficial there too.

svick · 2016-11-25T21:57:05Z

@jamesqo I'm not sure iterators have the same performance issues as async. I think long chains of nested iterators do (or could) happen in two cases:

LINQ. It would be awesome if chains of LINQ operators could be collapsed. But this feature is working with your C# code and reference assemblies for standard libraries, so that wouldn't be possible.
Recursion. Inlining is not really possible there (with or without yield foreach Feature Request: Recursive Iterators (non-quadratic) #15).

Or are there some cases that I'm missing that you think would be common enough to make that worth it?

jamesqo · 2016-11-25T23:07:03Z

LINQ. It would be awesome if chains of LINQ operators could be collapsed. But this feature is working with your C# code and reference assemblies for standard libraries, so that wouldn't be possible.

Good point. I had Linq in mind when I wrote that, but I had forgotten that doing any sort of inlining optimization would be invalid across library boundaries, since the code in the other library could change and the already-compiled library that contains inlined code would be stale. Besides, a lot of Linq is being converted to use custom iterator types which the compiler won't recognize. 👍

Still, it seems like it would be a nice optimization to have if it is not too much trouble after the work for async is done.

RamType0 · 2020-11-18T14:31:58Z

This has the potential to significantly improve the performance of canceling nested tasks.
We don't need nested rethrowing and capturing of OperationCanceledException anymore with this.

CyrusNajmabadi · 2022-11-08T17:11:58Z

@stephentoub any thoughts on this now? Do you see a future for this? Or is there a different path we'd prefer today to get the benefits?

stephentoub · 2022-12-09T21:01:24Z

Or is there a different path we'd prefer today to get the benefits?

I don't know of any other path for this, other than continuing along the current path of copy/paste the code into each method that needs it.

Pilchie added the Area-Language Design label Nov 23, 2016

bbarry mentioned this issue Dec 31, 2016

Discussion: Code Generator Catalog #16160

Closed

stephentoub mentioned this issue Feb 11, 2018

Enable SocketsHttpHandler cancellation support dotnet/corefx#27029

Merged

alrz mentioned this issue May 3, 2020

Proposal: Inline async method #43908

Closed

arunchndr added this to the Backlog milestone Nov 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Add IL-level inlining for async methods #15491

Proposal: Add IL-level inlining for async methods #15491

stephentoub commented Nov 23, 2016

alrz commented Nov 23, 2016

vbcodec commented Nov 23, 2016

bbarry commented Nov 23, 2016 •

edited

Loading

bbarry commented Nov 23, 2016

svick commented Nov 23, 2016

stephentoub commented Nov 23, 2016

jamesqo commented Nov 25, 2016

svick commented Nov 25, 2016

jamesqo commented Nov 25, 2016

RamType0 commented Nov 18, 2020

CyrusNajmabadi commented Nov 8, 2022

stephentoub commented Dec 9, 2022

Proposal: Add IL-level inlining for async methods #15491

Proposal: Add IL-level inlining for async methods #15491

Comments

stephentoub commented Nov 23, 2016

alrz commented Nov 23, 2016

vbcodec commented Nov 23, 2016

bbarry commented Nov 23, 2016 • edited Loading

bbarry commented Nov 23, 2016

svick commented Nov 23, 2016

stephentoub commented Nov 23, 2016

jamesqo commented Nov 25, 2016

svick commented Nov 25, 2016

jamesqo commented Nov 25, 2016

RamType0 commented Nov 18, 2020

CyrusNajmabadi commented Nov 8, 2022

stephentoub commented Dec 9, 2022

bbarry commented Nov 23, 2016 •

edited

Loading