Regression in connections creation performance #61233
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @dotnet/ncl Issue Details: I detected a regression due to this PR #53340 on "connection close" scenarios where we measure the speed of creating connections. It was discovered while investigating SSL handshake performance comparisons with HttpSys; we found the difference was not due to the handshake itself. On a 12 v-core Windows machine (aspnet-perf-win) it reduces RPS from 14K to 5K, and latency goes from 2ms to 6ms. The load generator sends a Connection: close header to force a new connection on every request. It can also be measured on Linux. I can gather traces for the runs before and after the change. Let me know if you want ThreadTime/NetworkTCPIP in them, since that impacts the trace size and the perf differences. These benchmarks are not CPU bound and show a lot of BLOCKED time. The NetworkTCPIP stack might be unnecessary at this level. cc @karelz @halter73 @geoffkizer
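For illustration only (this is not the actual crank load generator; the server stub, port, and request count are all hypothetical), a minimal Python sketch of the scenario under test: a client that opens a brand-new TCP connection per request, which is what sending a Connection: close header forces:

```python
import socket
import threading
import time

# Hypothetical stand-in for the benchmark server: answer one request,
# then close the connection, mimicking "Connection: close" semantics.
def serve(listener, stop):
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener closed; shut down
        with conn:
            conn.recv(4096)
            conn.sendall(
                b"HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 0\r\n\r\n"
            )

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(128)
port = listener.getsockname()[1]
stop = threading.Event()
threading.Thread(target=serve, args=(listener, stop), daemon=True).start()

# Client side: one brand-new TCP connection per request, so connection
# setup cost dominates the measured requests-per-second number.
requests = 200
start = time.perf_counter()
for _ in range(requests):
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n")
        s.recv(4096)
elapsed = time.perf_counter() - start
rps = requests / elapsed
stop.set()
listener.close()
```

Because every request pays full connection setup and teardown, any regression in the connection-creation path shows up directly in the RPS figure.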
|
Wow, that's a big regression. The linked PR did shuffle around some of the internal handling here, so perhaps that introduced an issue. Since we are seeing a lot of BLOCKED time, it sounds like we have some sort of contention issue. Can you tell from the traces where the contention is occurring? I looked at the code, but don't see anything obvious here; there are a bunch of interlocked ops but no explicit blocking that I see. |
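As a toy illustration of the kind of contention being suspected here (this is not the runtime's code; the lock, sleep durations, and iteration counts are made up), a sketch showing how a single contended lock turns parallel work into serialized work that a profiler reports as BLOCKED time:

```python
import threading
import time

def run(workers, iterations):
    # `workers` threads each take one shared lock `iterations` times,
    # holding it briefly. Returns (elapsed seconds, final counter).
    lock = threading.Lock()
    counter = [0]

    def work():
        for _ in range(iterations):
            with lock:              # every worker contends on this lock
                counter[0] += 1
                time.sleep(0.0001)  # simulate work done while holding it

    threads = [threading.Thread(target=work) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, counter[0]

elapsed_1, count_1 = run(workers=1, iterations=50)
elapsed_8, count_8 = run(workers=8, iterations=50)
# With 8 workers the total lock-held work is 8x larger and the lock
# serializes it, so wall-clock time grows instead of staying flat;
# the waiting threads would show up as BLOCKED time in a trace.
print(count_1, count_8)
```

Interlocked operations, by contrast, do not park threads, which is why explicit blocking is usually what one looks for first in traces like these.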
Given that this is a non-trivial regression, we should fix it soon in 7.0 and then consider backporting it to 6.0 based on fix risk and the assessed and observed impact on customers. |
I did notice one issue that could cause increased memory usage. Fix is here: #61258 @sebastienros Can you try that out and see if it helps? This seems like a small issue so I doubt it would be the root cause here, but it's worth a shot. |
Crank command lines to repro and get traces: Before (build
After (build
To remove ThreadTime from traces as it makes them much bigger, use this argument and remove
To test local changes add this argument:
Note that collecting a trace impacts the RPS significantly, so to compare the impact of local changes I recommend disabling it. |
I did a run with #61258 and it does not seem to be the root cause here. Further investigation is needed. |
I did some runs on Linux (aspnet-perf-lin) and I see bimodal behavior there:
This is weird and should be investigated further, but it's not a regression. Windows results seem to be consistent, with Before around 17k and After around 6-7k. However, these results are similar enough to the two modes above that I wonder if something inadvertently switched the Windows behavior from being consistently in the good mode to consistently in the bad mode. More investigation is needed here. |
Triage: The PR shouldn't have caused this type of regression. Next step: create a micro-benchmark (e.g. remove ASP.NET from the repro). |
I am able to reproduce this with pure socket code (on crank); the code is here: aspnet/Benchmarks@main...CarnaViire:socket-benchmark
It repros only on Windows, as was confirmed before; but I don't see any "bimodal" behavior on Linux, where the performance is stable and doesn't change much between the versions. I will investigate further. |
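The linked branch contains the actual crank benchmark; as a rough, language-agnostic sketch of what such a pure-socket micro-benchmark measures (hypothetical code, not taken from that branch), here is a loop that times connection setup and teardown with no protocol work at all:

```python
import socket
import threading
import time

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(512)
port = listener.getsockname()[1]

def acceptor():
    # Accept and immediately close: no reads, no writes, so the loop
    # below measures only connection setup/teardown, not protocol work.
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener closed; stop accepting
        conn.close()

threading.Thread(target=acceptor, daemon=True).start()

n = 500
start = time.perf_counter()
for _ in range(n):
    s = socket.create_connection(("127.0.0.1", port))
    s.close()
connections_per_sec = n / (time.perf_counter() - start)
listener.close()
```

Stripping the benchmark down to this level is what isolates the regression to the socket layer rather than ASP.NET or TLS.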