-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve linux performance when listening on many ports #600
Conversation
@@ -49,6 +49,8 @@ public HalibutTimeoutsAndLimits Build() | |||
limits.MaximumActiveTcpConnectionsPerPollingSubscription = maximumActiveTcpConnectionsPerPollingSubscription.Value; | |||
} | |||
|
|||
limits.UseAsyncListener = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature is optional, but we are turning it on by default for our tests.
I think it makes more sense to run our tests the way they will be used by Octopus (i.e., we this off).
In saying that, it would be nice to see all the tests work with this on as well. Would it be overkill to double the number of tests and create different test cases with this on and off? 🤔
Or do we just make a few choice tests with this on for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the direction would be to have this on so if anything we would want this on now, not off.
We will need to roll this out, and step one is get signal here.
Do we want a one off run with it off to check all that behaviour remains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the plan is to move it towards being the default behaviour, then this is fine.
@@ -75,7 +75,12 @@ public static string CurrentTestHash() | |||
{ | |||
using (SHA256 mySHA256 = SHA256.Create()) | |||
{ | |||
return Convert.ToBase64String(mySHA256.ComputeHash(TestContext.CurrentContext.Test.FullName.GetUTF8Bytes())) | |||
string s = TestContext.CurrentContext.Test.FullName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this running locally, or on TeamCity?
I'm guessing it won't cause any issues running them in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both. The advantage here is net48 and net6 can run in parallel on your laptop without log file collisions.
@@ -55,7 +55,7 @@ async Task WriteDataToFile() | |||
// So what we can write it down as one chunk. | |||
while (queue.TryDequeue(out var log)) list.Add(log); | |||
|
|||
using(var fileWriter = new FileStream(logFilePath, FileMode.Append, FileAccess.Write, FileShare.Delete | FileShare.ReadWrite)) | |||
using(var fileWriter = await OpenLogFile()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, was this an issue locally or on TeamCity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
locally, my guess is crowd strike with slightly different timings (I had turbo on my laptop turned off, so I guess that increased the time files were opened for which I guess resulted in the conflict)
@@ -25,7 +25,7 @@ public class HalibutRuntime : IHalibutRuntime | |||
readonly ConcurrentDictionary<Uri, IPendingRequestQueue> queues = new(); | |||
readonly IPendingRequestQueueFactory queueFactory; | |||
readonly X509Certificate2 serverCertificate; | |||
readonly List<IDisposable> listeners = new(); | |||
readonly MutexedItem<List<IAsyncDisposable>> listeners = new MutexedItem<List<IAsyncDisposable>>(new List<IAsyncDisposable>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to verify, the motivation behind this change was to make disposal async, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We needed to make accept be async (with a cancellation token)
Which meant that to finish the accept loop task we needed to await it which meant a IAsyncDisposable.
To do that meant we needed this to be a list of IAsyncDisposable
To do that and to preserve the only one thing can access this list at a time behaviour we needed some way of locking in both async and sync context (lock
does not work with async)
To ensure that is done properly this type ensures, assuming you don't try really hard, that you can't have access to the list unless it is mutually exclusive access.
try | ||
{ | ||
#if !NETFRAMEWORK | ||
client = await listener.AcceptTcpClientAsync(this.cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the subtlety here.
In the old one, we would wait until cts.IsCancellationRequested || listener.Pending()
before calling accept.
Using the cancellation token removes the need for cts.IsCancellationRequested
. But is waiting for listener.Pending()
important? Or does this happen for free somehow with using the async versions of AcceptTcpClientAsync
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole wait until we know something is pending was done to avoid sitting in Accept at the same time as listener.stop() being called, because a bug did (or still does honestly I can't tell) where the call to listener.stop() would hang if AcceptTcpClient was in the middle of accepting a new connection.
The big change is to remove the SpinWait that polls listener.pending()
because that is very expensive :(.
Instead we use the new (relative to when the old code was added) method that is AcceptTcpClientAsync
with a cancellation token. When we shut down on linux the order is always:
- Cancel the CTS
- Wait for the loop to exit (which it does)
- then and only then stop the listener (at this point we know we are NOT in accept and so can not hang).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good.
I'd like to understand the comment in SecureListener
regarding listener.Pending()
before approval though.
Background
While trying to get 1000s of listening simulated tentacles working, I was running into issues in which Octopus was unable to communicate with the Tentacle. Eventually I discovered that the CPU usage of Tentacle Simulator was high despite it not doing anything. Turns out it was spending all its time in
SpinWait
.This was done because
TcpListener.Stop hangs if Accept is in progress
dotnet/runtime#24513 . Which is claimed to be fixed, although I see an absolute disaster of reverts so it is not clear if it is fixed.Working on the assumption
TcpListener.Stop
still hangs if Accept is in progress, the new optional path is to:SecureListener
is disposed first the CancellationToken is cancelled, then we wait for the accept task to complete (which will be some sort cancellation exception), after that when we know we are not in the middle of Accept we then callTcpListener.Stop
.Note that the new behaviour must be opted in by setting:
on
HalibutTimeoutsAndLimits
.Results
Before
CPU usage was half my cores when listening on 1000 sockets.
After
CPU usage is near nothing.
Tentacle Simulator can actually be used with 1000s of listening tentacles.
How to review this PR
Quality ✔️
Pre-requisites