Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid allocating SocketAddress GCHandle for every ReceiveFromAsync #89808

Merged
merged 4 commits into from
Aug 2, 2023

Conversation

wfurt
Copy link
Member

@wfurt wfurt commented Aug 1, 2023

fixes #86513
contributes to #30797

Because we always serialize EndPoint to SocketAddress this check is never true:

// Check if already pinned.
if (_pinnedSocketAddress == _socketAddress)

So we and up creating GCHandle for every packet making all the logic useless.
The fix I discussed with @stephentoub is to use NativeMemory that is quite fast and same memory chunk can persist across many recipes regardless of the actually SocketAddress.
As downside we need to copy the actual socket address bytes - but it is small so it should be fast and it does not got memory with movable blocks.

Perftests do not really show much because they do not track the GCHandles.

Method Job Toolchain Mean Error StdDev Median Min Max Ratio RatioSD Allocated Alloc Ratio
ConnectAcceptAsync Job-SWLAAA \8.0.0\corerun.exe 4.238 ms 1.2878 ms 1.4830 ms 3.928 ms 1.816 ms 7.023 ms 1.00 0.00 1343 B 1.00
ConnectAcceptAsync Job-DYCDTR \main\corerun.exe 3.341 ms 0.7778 ms 0.8646 ms 3.351 ms 1.827 ms 5.361 ms 0.88 0.35 1368 B 1.02
SendAsyncThenReceiveAsync_Task Job-SWLAAA \8.0.0\corerun.exe 255.208 ms 15.2474 ms 17.5590 ms 256.833 ms 215.582 ms 280.909 ms 1.00 0.00 - NA
SendAsyncThenReceiveAsync_Task Job-DYCDTR \main\corerun.exe 250.302 ms 10.1277 ms 10.8365 ms 251.113 ms 214.067 ms 268.417 ms 0.98 0.08 - NA
ReceiveAsyncThenSendAsync_Task Job-SWLAAA \8.0.0\corerun.exe 2,540.331 ms 471.9609 ms 543.5108 ms 2,570.356 ms 1,353.512 ms 3,550.914 ms 1.00 0.00 - NA
ReceiveAsyncThenSendAsync_Task Job-DYCDTR \main\corerun.exe 2,179.767 ms 330.8853 ms 381.0480 ms 2,151.268 ms 1,419.066 ms 2,880.350 ms 0.91 0.31 - NA
SendAsyncThenReceiveAsync_SocketAsyncEventArgs Job-SWLAAA \8.0.0\corerun.exe 204.784 ms 41.5151 ms 47.8089 ms 220.779 ms 142.865 ms 279.819 ms 1.00 0.00 - NA
SendAsyncThenReceiveAsync_SocketAsyncEventArgs Job-DYCDTR \main\corerun.exe 177.627 ms 29.7090 ms 33.0214 ms 191.337 ms 96.820 ms 208.411 ms 0.87 0.17 - NA
ReceiveAsyncThenSendAsync_SocketAsyncEventArgs Job-SWLAAA \8.0.0\corerun.exe 1,899.842 ms 463.4884 ms 533.7539 ms 1,925.979 ms 636.953 ms 2,709.015 ms 1.00 0.00 - NA
ReceiveAsyncThenSendAsync_SocketAsyncEventArgs Job-DYCDTR \main\corerun.exe 2,335.415 ms 340.3239 ms 391.9175 ms 2,393.339 ms 1,767.759 ms 3,198.932 ms 1.43 0.85 - NA
ReceiveFromAsyncThenSendToAsync_Task Job-SWLAAA \8.0.0\corerun.exe 3,382.587 ms 868.8274 ms 1,000.5430 ms 3,373.508 ms 1,219.412 ms 4,980.546 ms 1.00 0.00 5761064 B 1.00
ReceiveFromAsyncThenSendToAsync_Task Job-DYCDTR \main\corerun.exe 5,021.543 ms 272.1956 ms 313.4609 ms 5,057.454 ms 3,903.799 ms 5,358.622 ms 1.69 0.83 6241688 B 1.08
SendToThenReceiveFrom Job-SWLAAA \8.0.0\corerun.exe 368.134 ms 15.5442 ms 17.9007 ms 367.633 ms 319.479 ms 390.570 ms 1.00 0.00 5760784 B 1.00
SendToThenReceiveFrom Job-DYCDTR \main\corerun.exe 341.450 ms 34.0893 ms 39.2573 ms 352.988 ms 231.940 ms 386.746 ms 0.93 0.10 6240784 B 1.08

@wfurt wfurt added this to the 8.0.0 milestone Aug 1, 2023
@wfurt wfurt self-assigned this Aug 1, 2023
@ghost
Copy link

ghost commented Aug 1, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

fixes #86513
contributes to #30797

Because we always serialize EndPoint to SocketAddress this check is never true:

// Check if already pinned.
if (_pinnedSocketAddress == _socketAddress)

So we and up creating GCHandle for every packet making all the logic useless.
The fix I discussed with @stephentoub is to use NativeMemory that is quite fast and same memory chunk can persist across many recipes regardless of the actually SocketAddress.
As downside we need to copy the actual socket address bytes - but it is small so it should be fast and it does not got memory with movable blocks.

Perftests do not really show much because they do not track the GCHandles.

Method Job Toolchain Mean Error StdDev Median Min Max Ratio RatioSD Allocated Alloc Ratio
ConnectAcceptAsync Job-SWLAAA \8.0.0\corerun.exe 4.238 ms 1.2878 ms 1.4830 ms 3.928 ms 1.816 ms 7.023 ms 1.00 0.00 1343 B 1.00
ConnectAcceptAsync Job-DYCDTR \main\corerun.exe 3.341 ms 0.7778 ms 0.8646 ms 3.351 ms 1.827 ms 5.361 ms 0.88 0.35 1368 B 1.02
SendAsyncThenReceiveAsync_Task Job-SWLAAA \8.0.0\corerun.exe 255.208 ms 15.2474 ms 17.5590 ms 256.833 ms 215.582 ms 280.909 ms 1.00 0.00 - NA
SendAsyncThenReceiveAsync_Task Job-DYCDTR \main\corerun.exe 250.302 ms 10.1277 ms 10.8365 ms 251.113 ms 214.067 ms 268.417 ms 0.98 0.08 - NA
ReceiveAsyncThenSendAsync_Task Job-SWLAAA \8.0.0\corerun.exe 2,540.331 ms 471.9609 ms 543.5108 ms 2,570.356 ms 1,353.512 ms 3,550.914 ms 1.00 0.00 - NA
ReceiveAsyncThenSendAsync_Task Job-DYCDTR \main\corerun.exe 2,179.767 ms 330.8853 ms 381.0480 ms 2,151.268 ms 1,419.066 ms 2,880.350 ms 0.91 0.31 - NA
SendAsyncThenReceiveAsync_SocketAsyncEventArgs Job-SWLAAA \8.0.0\corerun.exe 204.784 ms 41.5151 ms 47.8089 ms 220.779 ms 142.865 ms 279.819 ms 1.00 0.00 - NA
SendAsyncThenReceiveAsync_SocketAsyncEventArgs Job-DYCDTR \main\corerun.exe 177.627 ms 29.7090 ms 33.0214 ms 191.337 ms 96.820 ms 208.411 ms 0.87 0.17 - NA
ReceiveAsyncThenSendAsync_SocketAsyncEventArgs Job-SWLAAA \8.0.0\corerun.exe 1,899.842 ms 463.4884 ms 533.7539 ms 1,925.979 ms 636.953 ms 2,709.015 ms 1.00 0.00 - NA
ReceiveAsyncThenSendAsync_SocketAsyncEventArgs Job-DYCDTR \main\corerun.exe 2,335.415 ms 340.3239 ms 391.9175 ms 2,393.339 ms 1,767.759 ms 3,198.932 ms 1.43 0.85 - NA
ReceiveFromAsyncThenSendToAsync_Task Job-SWLAAA \8.0.0\corerun.exe 3,382.587 ms 868.8274 ms 1,000.5430 ms 3,373.508 ms 1,219.412 ms 4,980.546 ms 1.00 0.00 5761064 B 1.00
ReceiveFromAsyncThenSendToAsync_Task Job-DYCDTR \main\corerun.exe 5,021.543 ms 272.1956 ms 313.4609 ms 5,057.454 ms 3,903.799 ms 5,358.622 ms 1.69 0.83 6241688 B 1.08
SendToThenReceiveFrom Job-SWLAAA \8.0.0\corerun.exe 368.134 ms 15.5442 ms 17.9007 ms 367.633 ms 319.479 ms 390.570 ms 1.00 0.00 5760784 B 1.00
SendToThenReceiveFrom Job-DYCDTR \main\corerun.exe 341.450 ms 34.0893 ms 39.2573 ms 352.988 ms 231.940 ms 386.746 ms 0.93 0.10 6240784 B 1.08
Author: wfurt
Assignees: wfurt
Labels:

area-System.Net.Sockets, os-windows

Milestone: 8.0.0

@wfurt
Copy link
Member Author

wfurt commented Aug 1, 2023

I updated names and comments @stephentoub

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Did you happen to measure any impact to just allocating and freeing on each operation? Would that clean up any code / make the SAEA smaller?

@wfurt
Copy link
Member Author

wfurt commented Aug 1, 2023

Did you happen to measure any impact to just allocating and freeing on each operation? Would that clean up any code / make the SAEA smaller?

no, but it would be easy to try. My guess is that the impact would be small - in perf tests even voiding GCHandle was small -> most benefit come from memory management IMHO

@wfurt wfurt merged commit 42c4ed5 into dotnet:main Aug 2, 2023
@wfurt wfurt deleted the gchandle3 branch August 2, 2023 03:14
@ghost ghost locked as resolved and limited conversation to collaborators Sep 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnecesary use of GCHandle in Windows Socket code
2 participants