Using ConfigureAwait() 400% slower on completed task #26610
Comments
Why do you say that? ConfigureAwait returns a |
I rephrased it slightly just now - according to BenchmarkDotNet, the versions with It is strange that the difference is 12 bytes, which is the minimum object size on 32-bit, but I'm running 64-bit (which has a minimum object size of 24 bytes.) |
First, this is the corefx repo, so measuring .NET 4.7.1 isn't the right thing to measure. I suspect something is awry in your measurements on .NET Framework, as there shouldn't be such a difference in the allocation size when nothing is yielding. When I run this with .NET Core 2.1, I get results like this:
so no difference in allocation, and around 3.5x when using ConfigureAwait on an already completed task, but we're here talking about nanoseconds, and the difference between 3ns and 10ns... it's not hard to be 3x slower when we're measuring things at the level of a few instructions. Yes, ConfigureAwait adds a few more instructions. Calling it requires creating another struct on the stack and initializing it, and then using it requires passing the values of those fields into the same helper used without ConfigureAwait, but without ConfigureAwait, a const value is passed in rather than a field. So there's a bit more work to be done. I'm not sure what improvements you hope to see here, but if you have concrete ideas, you're welcome to submit a PR.
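The extra work described here can be made visible by writing out, by hand, roughly what `await` does when the task has already completed. This is a minimal sketch and not the compiler's actual generated state machine: the names `AwaitFastPath`, `ViaPlainAwaiter` and `ViaConfiguredAwaiter` are made up for illustration, and the not-yet-completed branch is deliberately left out.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class AwaitFastPath
{
    // Roughly what `await task` does when the task has already completed.
    public static int ViaPlainAwaiter(Task<int> task)
    {
        TaskAwaiter<int> awaiter = task.GetAwaiter();
        if (!awaiter.IsCompleted)
            throw new InvalidOperationException("Sketch covers only the completed path.");
        return awaiter.GetResult();
    }

    // Roughly what `await task.ConfigureAwait(false)` does on the same path:
    // an extra struct (the awaitable) is created before the awaiter is obtained.
    public static int ViaConfiguredAwaiter(Task<int> task)
    {
        ConfiguredTaskAwaitable<int> awaitable = task.ConfigureAwait(false);
        ConfiguredTaskAwaitable<int>.ConfiguredTaskAwaiter awaiter = awaitable.GetAwaiter();
        if (!awaiter.IsCompleted)
            throw new InvalidOperationException("Sketch covers only the completed path.");
        return awaiter.GetResult();
    }

    public static void Main()
    {
        Task<int> completed = Task.FromResult(1);
        Console.WriteLine(ViaPlainAwaiter(completed));
        Console.WriteLine(ViaConfiguredAwaiter(completed));
    }
}
```

Both methods return the same value. The only difference is the additional awaitable struct in the second one, which is the "few more instructions" at issue.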
I do not understand the suggestion. This is not how |
Aside you'd only want to use e.g. change

    if (!_completedTask.IsCompleted)
        await _completedTask.ConfigureAwait(false);
    x += _completedTask.ConfigureAwait(false).GetAwaiter().GetResult();

to

    if (!_completedTask.IsCompleted)
        await _completedTask.ConfigureAwait(false);
    x += _completedTask.GetAwaiter().GetResult();

Which changes the results to

Method | Mean | Error | Scaled | Allocated |
------------------------------- |----------:|----------:|-------:|----------:|
ConfigureAwaitFalse_GetAwaiter | 2.662 ns | 0.0094 ns | 0.99 | 72 B |
ConfigureAwaitTrue_GetAwaiter | 2.644 ns | 0.0149 ns | 0.99 | 72 B |
Await_ConfigureAwaitFalse | 11.123 ns | 0.0197 ns | 4.14 | 72 B |
Await_ConfigureAwaitTrue | 11.138 ns | 0.0306 ns | 4.15 | 72 B |
Await | 3.134 ns | 0.0118 ns | 1.17 | 72 B |
Result | 2.685 ns | 0.0506 ns | 1.00 | 72 B |
GetAwaiter | 2.744 ns | 0.0248 ns | 1.02 | 72 B |
Also if your methods generally return sync (completed path) rather than going async; changing the return type to ValueTask version

Method | Mean | Error | Scaled | Allocated |
------------------------------- |----------:|----------:|-------:|----------:|
ConfigureAwaitFalse_GetAwaiter | 2.649 ns | 0.0092 ns | 1.00 | 0 B |
ConfigureAwaitTrue_GetAwaiter | 2.644 ns | 0.0114 ns | 1.00 | 0 B |
Await_ConfigureAwaitFalse | 11.695 ns | 0.0138 ns | 4.43 | 0 B |
Await_ConfigureAwaitTrue | 11.139 ns | 0.0451 ns | 4.22 | 0 B |
Await | 3.099 ns | 0.0076 ns | 1.17 | 0 B |
Result | 2.642 ns | 0.0049 ns | 1.00 | 0 B |
GetAwaiter | 2.639 ns | 0.0022 ns | 1.00 | 0 B |
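A minimal sketch of the `ValueTask` suggestion, assuming a method that usually has its answer ready. The `CachedReader` class, its `_cache` dictionary and `LoadSlowAsync` are hypothetical names invented for this example. `ValueTask<T>` itself needs .NET Core 2.0 or later, or the System.Threading.Tasks.Extensions package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class CachedReader
{
    private readonly Dictionary<string, int> _cache = new Dictionary<string, int>();

    public ValueTask<int> ReadAsync(string key)
    {
        // Synchronous (completed) path: wrap the value directly, no Task object.
        if (_cache.TryGetValue(key, out int value))
            return new ValueTask<int>(value);

        // Asynchronous path: fall back to a real Task.
        return new ValueTask<int>(LoadSlowAsync(key));
    }

    // Hypothetical slow source; Task.Delay stands in for real I/O.
    private async Task<int> LoadSlowAsync(string key)
    {
        await Task.Delay(10).ConfigureAwait(false);
        int value = key.Length;
        _cache[key] = value;
        return value;
    }

    public static async Task Main()
    {
        var reader = new CachedReader();
        int first = await reader.ReadAsync("abc").ConfigureAwait(false); // slow path
        ValueTask<int> second = reader.ReadAsync("abc");                 // cached path
        Console.WriteLine(second.IsCompletedSuccessfully);
        Console.WriteLine(first == await second.ConfigureAwait(false));
    }
}
```

The cached branch never creates a `Task<int>`, which is consistent with the `Allocated` column dropping from 72 B to 0 B in the table above. The sketch is not thread-safe; a real cache would need synchronization.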
@benaadams Yes, I included In these performance sensitive areas I'm using cached tasks where possible, otherwise |
@stephentoub OK, that makes sense regarding allocations, thanks. As @benaadams intimated, this Cutting away the

BenchmarkDotNet=v0.10.14, OS=Windows 7 SP1 (6.1.7601.0)
Intel Core i7-4770K CPU 3.50GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
Frequency=3417031 Hz, Resolution=292.6517 ns, Timer=TSC
[Host] : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2650.0
DefaultJob : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2650.0

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Attributes.Jobs;
using BenchmarkDotNet.Running;
using System;
using System.Threading.Tasks;

namespace ConfigureAwait_VSThreading301
{
    //[ShortRunJob]
    [BenchmarkDotNet.Attributes.DisassemblyDiagnoser]
    public class BenchNestedStruct
    {
        private volatile Task<int> _completedTask = Task.FromResult(1);
        const int Operations = 100_000;

        public struct FlatReference
        {
            private Task _task;
            private bool _bool;

            public FlatReference(Task task, bool b)
            {
                _task = task;
                _bool = b;
            }
        }

        [Benchmark(OperationsPerInvoke = Operations)]
        public int New_FlatReference()
        {
            int x = 0;
            for (int i = 0; i < Operations; i++)
            {
                new FlatReference(_completedTask, false);
            }
            return x;
        }

        public FlatReference GetFlatReference(bool b)
        {
            return new FlatReference(_completedTask, false);
        }

        [Benchmark(OperationsPerInvoke = Operations, Baseline = true)]
        public int Get_FlatReference()
        {
            int x = 0;
            for (int i = 0; i < Operations; i++)
            {
                GetFlatReference(false);
            }
            return x;
        }

        // **************************************

        private long _long = 1;

        public struct FlatPrimitive
        {
            private long _long;
            private bool _bool;

            public FlatPrimitive(long l, bool b)
            {
                _long = l;
                _bool = b;
            }
        }

        public struct NestedPrimitive
        {
            private FlatPrimitive _FlatPrimitive;

            public NestedPrimitive(long l, bool b)
            {
                _FlatPrimitive = new FlatPrimitive(l, b);
            }
        }

        [Benchmark(OperationsPerInvoke = Operations)]
        public int New_NestedPrimitive()
        {
            int x = 0;
            for (int i = 0; i < Operations; i++)
            {
                new NestedPrimitive(_long, false);
            }
            return x;
        }

        public NestedPrimitive GetNestedPrimitive(bool b)
        {
            return new NestedPrimitive(_long, false);
        }

        [Benchmark(OperationsPerInvoke = Operations)]
        public int Get_NestedPrimitive()
        {
            int x = 0;
            for (int i = 0; i < Operations; i++)
            {
                GetNestedPrimitive(false);
            }
            return x;
        }

        // **************************************

        public struct NestedReference
        {
            private FlatReference _flatReference;

            public NestedReference(Task task, bool b)
            {
                _flatReference = new FlatReference(task, b);
            }
        }

        [Benchmark(OperationsPerInvoke = Operations)]
        public int New_NestedReference()
        {
            int x = 0;
            for (int i = 0; i < Operations; i++)
            {
                new NestedReference(_completedTask, false);
            }
            return x;
        }

        public NestedReference GetNestedReference(bool b)
        {
            return new NestedReference(_completedTask, false);
        }

        [Benchmark(OperationsPerInvoke = Operations)]
        public int Get_NestedReference()
        {
            int x = 0;
            for (int i = 0; i < Operations; i++)
            {
                GetNestedReference(false);
            }
            return x;
        }

        // **************************************

        [Benchmark(OperationsPerInvoke = Operations)]
        public int Call_ConfigureAwait()
        {
            int x = 0;
            for (int i = 0; i < Operations; i++)
            {
                _completedTask.ConfigureAwait(false);
            }
            return x;
        }
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<BenchNestedStruct>();
        }
    }
}
Right. Thanks. So, I'm going to close this issue. To my knowledge there's nothing that's specific to ConfigureAwait here. If I'm wrong, please feel free to re-open and clarify.
Issue
Calling ConfigureAwait() on a task takes a significant amount of time (over 400% slower than the alternatives), and also performs an allocation, even if the task is already completed. For scenarios with high volume and with the task usually completed, this necessitates verbose workarounds that check the task before calling ConfigureAwait(), e.g.:

GetAwaiter() is documented 'for compiler use', which is also a downside.

We can use Task<T>.Result:

But this wraps exceptions in an AggregateException, which then differs from the await path, which is a further downside.

Expected Behavior
ConfigureAwait() should immediately check if the task is completed, and if so immediately return (and without any allocation). This path should be at least as fast as reading Task.IsCompleted.

This would allow replacing the four lines from the workarounds with an equally fast one-liner:

Repro
The tests measure the time it takes to get the result from a successfully completed task, using different methods. With Task<T>.Result as the baseline, using ConfigureAwait() takes over 400% longer, irrespective of its parameter value, and whether await-ing or not.

BenchmarkDotNet (StdErr columns removed)
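The exception-shape difference mentioned in the issue text (Task<T>.Result wrapping in an AggregateException, unlike the await path) can be reproduced with a short sketch. The class and method names below (`ExceptionShapes`, `ViaResult`, `ViaGetAwaiter`, `ViaAwait`) are hypothetical and exist only for this illustration; they are not part of the original benchmark.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionShapes
{
    // Reports the type of exception observed when reading Task<T>.Result.
    public static string ViaResult(Task<int> task)
    {
        try { int unused = task.Result; return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    // Reports the type observed via GetAwaiter().GetResult().
    public static string ViaGetAwaiter(Task<int> task)
    {
        try { task.GetAwaiter().GetResult(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    // Reports the type observed via a regular await.
    public static async Task<string> ViaAwait(Task<int> task)
    {
        try { await task.ConfigureAwait(false); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    public static void Main()
    {
        Task<int> faulted = Task.FromException<int>(new InvalidOperationException("boom"));
        Console.WriteLine(ViaResult(faulted));                         // AggregateException
        Console.WriteLine(ViaGetAwaiter(faulted));                     // InvalidOperationException
        Console.WriteLine(ViaAwait(faulted).GetAwaiter().GetResult()); // InvalidOperationException
    }
}
```

This is why GetAwaiter().GetResult() is the closer match to await on the completed path, despite being documented as intended for compiler use.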