Regression in connections creation performance #61233
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @dotnet/ncl Issue Details: I detected a regression due to this PR #53340 on "connection close" scenarios where we measure the speed of creating connections. It was discovered while investigating SSL handshake performance comparisons with HttpSys; we found the difference was not due to the handshake itself. On a 12 v-core Windows machine (aspnet-perf-win) it reduces RPS from 14K to 5K, and latency goes from 2ms to 6ms. The load generator sends a Connection: close header to force a new connection on every request. It can also be measured on Linux. I can gather traces for the runs before and after the change. Let me know if you want ThreadTime/NetworkTCPIP in them, since that impacts the trace size and the perf differences. These benchmarks are not CPU bound and show a lot of BLOCKED time. The NetworkTCPIP stack might be unnecessary at this level. cc @karelz @halter73 @geoffkizer
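For illustration only (this is not the actual crank load generator; the server stub, port, and request count are all hypothetical), a minimal Python sketch of the scenario under test: a client that opens a brand-new TCP connection per request, which is what sending a Connection: close header forces:

```python
import socket
import threading
import time

# Hypothetical stand-in for the benchmark server: answer one request,
# then close the connection, mimicking "Connection: close" semantics.
def serve(listener, stop):
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener closed; shut down
        with conn:
            conn.recv(4096)
            conn.sendall(
                b"HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 0\r\n\r\n"
            )

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(128)
port = listener.getsockname()[1]
stop = threading.Event()
threading.Thread(target=serve, args=(listener, stop), daemon=True).start()

# Client side: one brand-new TCP connection per request, so connection
# setup cost dominates the measured requests-per-second number.
requests = 200
start = time.perf_counter()
for _ in range(requests):
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n")
        s.recv(4096)
elapsed = time.perf_counter() - start
rps = requests / elapsed
stop.set()
listener.close()
```

Because every request pays full connection setup and teardown, any regression in the connection-creation path shows up directly in the RPS figure.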
|
Wow, that's a big regression. The linked PR did shuffle around some of the internal handling here, so perhaps that introduced an issue. Since we are seeing a lot of BLOCKED time, it sounds like we have some sort of contention issue. Can you tell from the traces where the contention is occurring? I looked at the code, but don't see anything obvious here; there are a bunch of interlocked ops but no explicit blocking that I see. |
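As a toy illustration of the kind of contention being suspected here (this is not the runtime's code; the lock, sleep durations, and iteration counts are made up), a sketch showing how a single contended lock turns parallel work into serialized work that a profiler reports as BLOCKED time:

```python
import threading
import time

def run(workers, iterations):
    # `workers` threads each take one shared lock `iterations` times,
    # holding it briefly. Returns (elapsed seconds, final counter).
    lock = threading.Lock()
    counter = [0]

    def work():
        for _ in range(iterations):
            with lock:              # every worker contends on this lock
                counter[0] += 1
                time.sleep(0.0001)  # simulate work done while holding it

    threads = [threading.Thread(target=work) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, counter[0]

elapsed_1, count_1 = run(workers=1, iterations=50)
elapsed_8, count_8 = run(workers=8, iterations=50)
# With 8 workers the total lock-held work is 8x larger and the lock
# serializes it, so wall-clock time grows instead of staying flat;
# the waiting threads would show up as BLOCKED time in a trace.
print(count_1, count_8)
```

Interlocked operations, by contrast, do not park threads, which is why explicit blocking is usually what one looks for first in traces like these.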
Given that this is a non-trivial regression, we should fix it soon in 7.0 and then consider backporting it to 6.0 based on fix risk and the assessed and observed impact on customers. |
I did notice one issue that could cause increased memory usage. Fix is here: #61258 @sebastienros Can you try that out and see if it helps? This seems like a small issue so I doubt it would be the root cause here, but it's worth a shot. |
Crank command lines to repro and get traces: Before (build
After (build
To remove ThreadTime from traces as it makes them much bigger, use this argument and remove
To test local changes add this argument:
Note that collecting a trace impacts the RPS significantly, so to compare the impact of local changes I recommend disabling it. |
I did a run with #61258 and it does not seem to be the root cause here. Further investigation is needed. |
I did some runs on Linux (aspnet-perf-lin) and I see bimodal behavior there:
This is weird and should be investigated further, but it's not a regression. Windows results seem to be consistent, with Before around 17k and After around 6-7k. However, these results are similar enough to the two modes above that I wonder if something inadvertently switched the Windows behavior from being consistently in the good mode to consistently in the bad mode. More investigation is needed here. |
Triage: The PR shouldn't have caused this type of regression. Next step: create a micro-benchmark (e.g. remove ASP.NET from the repro). |
I am able to reproduce this with pure socket code (on crank); the code is here: aspnet/Benchmarks@main...CarnaViire:socket-benchmark
It repros only on Windows, as was confirmed before; but I don't see any "bimodal" behavior on Linux, where the performance is stable and doesn't change much between the versions. I will investigate further. |
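The linked branch contains the actual crank benchmark; as a rough, language-agnostic sketch of what such a pure-socket micro-benchmark measures (hypothetical code, not taken from that branch), here is a loop that times connection setup and teardown with no protocol work at all:

```python
import socket
import threading
import time

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(512)
port = listener.getsockname()[1]

def acceptor():
    # Accept and immediately close: no reads, no writes, so the loop
    # below measures only connection setup/teardown, not protocol work.
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener closed; stop accepting
        conn.close()

threading.Thread(target=acceptor, daemon=True).start()

n = 500
start = time.perf_counter()
for _ in range(n):
    s = socket.create_connection(("127.0.0.1", port))
    s.close()
connections_per_sec = n / (time.perf_counter() - start)
listener.close()
```

Stripping the benchmark down to this level is what isolates the regression to the socket layer rather than ASP.NET or TLS.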