WIP queue benchmarks #6127

Merged


@Aaronontheweb Aaronontheweb commented Oct 3, 2022


@Aaronontheweb
Member Author

Notes about the benchmarks:

  • All runs were conducted on a Gen 1 Ryzen machine.
  • Reads are more involved than writes: the Mailbox.Run tests completely drain a fully enqueued mailbox into an actor.

.NET Core 3.1

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2006 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET Core 3.1.23 (CoreCLR 4.700.22.11601, CoreFX 4.700.22.12208), X64 RyuJIT
  Job-GCFXAQ : .NET Core 3.1.23 (CoreCLR 4.700.22.11601, CoreFX 4.700.22.12208), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
| Method | MsgCount | UseCallingThreadDispatcher | Mean | Error | StdDev | Median | Allocated |
|---|---:|---|---:|---:|---:|---:|---:|
| EnqueuePerformance | 10000 | False | 283.6 μs | 21.68 μs | 63.57 μs | 308.3 μs | 384 KB |
| RunPerformance | 10000 | False | 2,481.4 μs | 408.72 μs | 1,205.13 μs | 1,911.2 μs | 30 KB |
| EnqueuePerformance | 10000 | True | 310.4 μs | 6.08 μs | 10.32 μs | 309.3 μs | 384 KB |
| RunPerformance | 10000 | True | 2,814.8 μs | 27.05 μs | 23.98 μs | 2,812.3 μs | 20 KB |
| EnqueuePerformance | 100000 | False | 1,679.5 μs | 27.03 μs | 45.91 μs | 1,666.8 μs | 3,073 KB |
| RunPerformance | 100000 | False | 13,274.6 μs | 421.91 μs | 1,244.00 μs | 13,249.0 μs | 303 KB |
| EnqueuePerformance | 100000 | True | 1,684.3 μs | 33.47 μs | 60.35 μs | 1,667.9 μs | 3,073 KB |
| RunPerformance | 100000 | True | 9,378.1 μs | 162.36 μs | 151.87 μs | 9,396.4 μs | 202 KB |

@Aaronontheweb
Member Author

.NET 6

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2006 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  Job-LYWECL : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
| Method | MsgCount | UseCallingThreadDispatcher | Mean | Error | StdDev | Median | Allocated |
|---|---:|---|---:|---:|---:|---:|---:|
| EnqueuePerformance | 10000 | False | 199.1 μs | 11.55 μs | 33.15 μs | 210.8 μs | 385 KB |
| RunPerformance | 10000 | False | 1,386.7 μs | 105.10 μs | 278.71 μs | 1,311.8 μs | 31 KB |
| EnqueuePerformance | 10000 | True | 215.9 μs | 4.13 μs | 4.24 μs | 215.9 μs | 385 KB |
| RunPerformance | 10000 | True | 1,635.2 μs | 210.63 μs | 621.05 μs | 1,177.0 μs | 21 KB |
| EnqueuePerformance | 100000 | False | 2,204.1 μs | 41.64 μs | 47.95 μs | 2,198.2 μs | 3,074 KB |
| RunPerformance | 100000 | False | 11,459.3 μs | 362.18 μs | 1,062.22 μs | 11,318.6 μs | 304 KB |
| EnqueuePerformance | 100000 | True | 2,170.6 μs | 21.06 μs | 18.67 μs | 2,168.3 μs | 3,074 KB |
| RunPerformance | 100000 | True | 9,481.6 μs | 152.60 μs | 127.43 μs | 9,442.9 μs | 203 KB |

@Aaronontheweb
Member Author

Performance is higher across the board for .NET 6, which is what I'd expect, but not what's being reported....

@Aaronontheweb
Member Author

Dropped the CallingThreadDispatcher and ran some larger sample sizes.

.NET Core 3.1

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2006 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET Core 3.1.23 (CoreCLR 4.700.22.11601, CoreFX 4.700.22.12208), X64 RyuJIT
  Job-OQAWPR : .NET Core 3.1.23 (CoreCLR 4.700.22.11601, CoreFX 4.700.22.12208), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
| Method | MsgCount | Mean | Error | StdDev | Median | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| EnqueuePerformance | 10000 | 266.4 μs | 23.74 μs | 70.00 μs | 302.4 μs | - | 384 KB |
| RunPerformance | 10000 | 2,523.6 μs | 412.76 μs | 1,217.04 μs | 1,938.1 μs | - | 30 KB |
| EnqueuePerformance | 100000 | 1,688.9 μs | 29.03 μs | 57.30 μs | 1,669.5 μs | - | 3,073 KB |
| RunPerformance | 100000 | 12,316.3 μs | 361.19 μs | 1,053.61 μs | 12,313.4 μs | - | 303 KB |
| EnqueuePerformance | 1000000 | 17,332.3 μs | 165.68 μs | 146.87 μs | 17,275.8 μs | - | 24,578 KB |
| RunPerformance | 1000000 | 113,875.7 μs | 2,269.29 μs | 5,523.77 μs | 113,916.6 μs | - | 3,024 KB |
| EnqueuePerformance | 10000000 | 183,510.1 μs | 1,445.60 μs | 1,352.22 μs | 183,315.6 μs | - | 245,764 KB |
| RunPerformance | 10000000 | 1,158,091.1 μs | 19,494.45 μs | 18,235.12 μs | 1,159,073.9 μs | 7000.0000 | 30,243 KB |

.NET 6

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2006 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  Job-OIYZGM : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
| Method | MsgCount | Mean | Error | StdDev | Median | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| EnqueuePerformance | 10000 | 204.8 μs | 11.15 μs | 32.88 μs | 212.7 μs | - | 385 KB |
| RunPerformance | 10000 | 1,401.4 μs | 100.58 μs | 264.96 μs | 1,445.4 μs | - | 32 KB |
| EnqueuePerformance | 100000 | 2,212.6 μs | 42.56 μs | 37.73 μs | 2,205.2 μs | - | 3,074 KB |
| RunPerformance | 100000 | 10,864.3 μs | 410.06 μs | 1,209.06 μs | 10,438.8 μs | - | 303 KB |
| EnqueuePerformance | 1000000 | 16,545.5 μs | 314.92 μs | 323.40 μs | 16,492.6 μs | - | 24,579 KB |
| RunPerformance | 1000000 | 108,755.9 μs | 2,167.08 μs | 2,892.98 μs | 108,989.9 μs | - | 3,026 KB |
| EnqueuePerformance | 10000000 | 164,145.8 μs | 2,003.10 μs | 1,775.69 μs | 164,316.1 μs | - | 245,765 KB |
| RunPerformance | 10000000 | 1,040,662.8 μs | 15,129.41 μs | 14,152.06 μs | 1,038,976.2 μs | 7000.0000 | 30,243 KB |

Again, .NET 6 is faster across the board. Whatever latency increase users are seeing, I can't easily reproduce it here.

@Aaronontheweb
Member Author

Aaronontheweb commented Oct 3, 2022

One thing that is interesting, though: EnqueuePerformance at 100,000 messages (the second MsgCount setting) is consistently slower on .NET 6 than on .NET Core 3.1 in both sets of readings when the default dispatcher (ThreadPool) is used.
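
As a rough check of this observation, the mean times from the two tables without the CallingThreadDispatcher can be compared directly. The sketch below is not part of the PR: it is plain Python, the names `MEANS_US`, `net6_ratio`, and `slower_on_net6` are made up for illustration, and it looks at the Mean column only, ignoring Error and StdDev.

```python
# Mean times in microseconds, copied from the .NET Core 3.1 and .NET 6 tables
# of the run without the CallingThreadDispatcher.
# Key: (method, message count). Value: (.NET Core 3.1 mean, .NET 6 mean).
MEANS_US = {
    ("EnqueuePerformance", 10_000): (266.4, 204.8),
    ("RunPerformance", 10_000): (2_523.6, 1_401.4),
    ("EnqueuePerformance", 100_000): (1_688.9, 2_212.6),
    ("RunPerformance", 100_000): (12_316.3, 10_864.3),
    ("EnqueuePerformance", 1_000_000): (17_332.3, 16_545.5),
    ("RunPerformance", 1_000_000): (113_875.7, 108_755.9),
    ("EnqueuePerformance", 10_000_000): (183_510.1, 164_145.8),
    ("RunPerformance", 10_000_000): (1_158_091.1, 1_040_662.8),
}


def net6_ratio(core31_us, net6_us):
    """Return the .NET 6 mean divided by the .NET Core 3.1 mean (>1 means slower on .NET 6)."""
    return net6_us / core31_us


def slower_on_net6(means):
    """List the (method, message count) cases whose mean is higher on .NET 6."""
    return [key for key, (core31, net6) in means.items()
            if net6_ratio(core31, net6) > 1.0]


if __name__ == "__main__":
    for (method, count), (core31, net6) in MEANS_US.items():
        print(f"{method:<20} {count:>12,}  ratio={net6_ratio(core31, net6):.2f}")
    print("Slower on .NET 6:", slower_on_net6(MEANS_US))
```

On these figures, EnqueuePerformance at 100,000 messages is the only case with a higher mean on .NET 6, at a ratio of roughly 1.31.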

@Aaronontheweb
Member Author

Adding an IStash to the Test Actor

.NET Core 3.1

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2006 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET Core 3.1.23 (CoreCLR 4.700.22.11601, CoreFX 4.700.22.12208), X64 RyuJIT
  Job-FCVKMQ : .NET Core 3.1.23 (CoreCLR 4.700.22.11601, CoreFX 4.700.22.12208), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
| Method | MsgCount | Mean | Error | StdDev | Median | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| EnqueuePerformance | 10000 | 275.0 μs | 23.66 μs | 69.39 μs | 311.2 μs | - | 384 KB |
| RunPerformance | 10000 | 2,380.3 μs | 449.23 μs | 1,324.58 μs | 1,521.8 μs | - | 30 KB |
| EnqueuePerformance | 100000 | 1,732.5 μs | 32.15 μs | 77.64 μs | 1,717.3 μs | - | 3,073 KB |
| RunPerformance | 100000 | 11,895.4 μs | 391.42 μs | 1,147.98 μs | 11,964.4 μs | - | 303 KB |
| EnqueuePerformance | 1000000 | 17,848.0 μs | 282.91 μs | 264.63 μs | 17,842.8 μs | - | 24,578 KB |
| RunPerformance | 1000000 | 122,506.7 μs | 2,427.81 μs | 4,560.01 μs | 123,052.4 μs | - | 3,024 KB |
| EnqueuePerformance | 10000000 | 184,835.0 μs | 1,333.19 μs | 1,113.28 μs | 184,686.7 μs | - | 245,764 KB |
| RunPerformance | 10000000 | 1,193,947.7 μs | 22,128.39 μs | 20,698.91 μs | 1,193,306.7 μs | 7000.0000 | 30,243 KB |

.NET 6

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2006 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  Job-ICYPTK : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
| Method | MsgCount | Mean | Error | StdDev | Median | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| EnqueuePerformance | 10000 | 237.9 μs | 5.13 μs | 15.13 μs | 235.3 μs | - | 385 KB |
| RunPerformance | 10000 | 2,671.1 μs | 330.58 μs | 974.74 μs | 3,163.6 μs | - | 31 KB |
| EnqueuePerformance | 100000 | 2,199.2 μs | 39.01 μs | 34.58 μs | 2,199.2 μs | - | 3,074 KB |
| RunPerformance | 100000 | 11,007.1 μs | 314.61 μs | 922.70 μs | 10,923.7 μs | - | 304 KB |
| EnqueuePerformance | 1000000 | 16,574.7 μs | 319.58 μs | 426.63 μs | 16,468.3 μs | - | 24,578 KB |
| RunPerformance | 1000000 | 120,354.4 μs | 2,369.00 μs | 2,728.15 μs | 121,167.5 μs | - | 3,028 KB |
| EnqueuePerformance | 10000000 | 166,294.2 μs | 2,433.99 μs | 2,276.75 μs | 166,628.0 μs | - | 245,765 KB |
| RunPerformance | 10000000 | 1,082,541.1 μs | 14,659.78 μs | 12,995.51 μs | 1,081,883.4 μs | 7000.0000 | 30,243 KB |

@Aaronontheweb
Member Author

Looks like the .NET 6 numbers suffered a lot more with an IStash, which changes the actor's mailbox to a double-ended queue...
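
One way to put a number on this is to compare the RunPerformance means with and without the IStash on each runtime. The sketch below is an illustration only, not something from the PR: it is plain Python, the names `RUN_MEANS_US`, `pct_change`, and `stash_impact` are hypothetical, and it uses the Mean column alone.

```python
# RunPerformance mean times in microseconds, copied from the tables in this thread.
# Key: message count. Value: (mean without IStash, mean with IStash).
RUN_MEANS_US = {
    ".NET Core 3.1": {
        10_000: (2_523.6, 2_380.3),
        100_000: (12_316.3, 11_895.4),
        1_000_000: (113_875.7, 122_506.7),
        10_000_000: (1_158_091.1, 1_193_947.7),
    },
    ".NET 6": {
        10_000: (1_401.4, 2_671.1),
        100_000: (10_864.3, 11_007.1),
        1_000_000: (108_755.9, 120_354.4),
        10_000_000: (1_040_662.8, 1_082_541.1),
    },
}


def pct_change(before_us, after_us):
    """Percentage change from the no-stash mean to the stash mean."""
    return (after_us - before_us) / before_us * 100.0


def stash_impact(runtime):
    """Map each message count to the percentage change caused by adding the stash."""
    return {count: pct_change(before, after)
            for count, (before, after) in RUN_MEANS_US[runtime].items()}


if __name__ == "__main__":
    for runtime in RUN_MEANS_US:
        for count, change in stash_impact(runtime).items():
            print(f"{runtime:<14} {count:>12,}  {change:+.1f}%")
```

On these means, the change on .NET 6 is larger than the change on .NET Core 3.1 at every message count. The 10,000-message rows have very large StdDev values relative to their means, so that row in particular should be read with caution.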

@iress-ljm

@Aaronontheweb If IStash is contributing to performance issues in .NET 6, would this affect ReceivePersistentActors? UntypedPersistentActor inherits from Eventsourced, which implements a stash.

@Aaronontheweb
Member Author

@iress-ljm indeed it would - the numbers are slightly, but consistently worse with stashing. I'm still leaning towards this being a dispatcher issue though, rather than a data structure problem.

@Aaronontheweb Aaronontheweb marked this pull request as ready for review October 4, 2022 20:43
@Aaronontheweb Aaronontheweb merged commit 7441faa into akkadotnet:v1.4 Oct 5, 2022
@Aaronontheweb Aaronontheweb deleted the dotnet6-queuing-benchmarks branch October 5, 2022 01:21
@iress-ljm

Thanks @Aaronontheweb, have you opened a new PR to track the dispatcher testing? It'd be good for the team to be able to track this issue.

@Aaronontheweb
Member Author

We have some old NBench benchmarks for tracking dispatcher overhead: https://github.com/akkadotnet/akka.net/tree/dev/src/core/Akka.Tests.Performance/Dispatch - we should port some of those to BenchmarkDotNet. I'll see about doing that today, since I'm already doing some performance work on improving speeds in this area (see #6134).

@Aaronontheweb
Member Author

@iress-ljm Dispatcher benchmarks have been added here: #6140

So far it looks like .NET 6 performance is better than .NET Core 3.1, but again - I can still reliably demonstrate the .NET 6-specific drop with RemotePingPong.

Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Oct 8, 2022
* WIP queue benchmarks

* completed MailboxThroughputBenchmarks

* disable `CallingThreadDispatcher`