-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding TCP Keep Alive to guarantee master-slave communication #740
Conversation
Codecov Report
@@ Coverage Diff @@
## master #740 +/- ##
=========================================
- Coverage 65.93% 65.74% -0.2%
=========================================
Files 14 14
Lines 1374 1378 +4
Branches 214 214
=========================================
Hits 906 906
- Misses 421 425 +4
Partials 47 47
Continue to review full report at Codecov.
|
Sounds good, though I'm not 100% sure if this can have any side-effects? How does it affect Locust running on Windows? |
@@ -17,9 +17,13 @@ class Server(BaseSocket): | |||
def __init__(self, host, port): | |||
context = zmq.Context() | |||
self.receiver = context.socket(zmq.PULL) | |||
self.receiver.setsockopt(zmq.TCP_KEEPALIVE, 1) | |||
self.receiver.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set TCP_KEEPALIVE_IDLE
explicitly here, or could we remove it and let the OS default be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use a bit of both.
In order for Locust to function properly, you need to enable TCP Keep Alive even if it is disabled as an OS default setting.
However, the Keep Alive Idle time is a different story. In my case, my environment firewall Idle Time value was shorter than OS default, hence I needed to make it explicitly shorter.
Alternatively, we could add another optional command line parameter to set TCP_KEEPALIVE_IDLE when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! I don't see how it could have any impact on the performance so it's probably fine to set it to 30.
Unless there are cases where turning on keep-alives could have any negative effect, I'd prefer to not expose an additional command line option.
Sorry for the wait! I haven't seen any anomalies when running it on Windows. Is there any other blocker for this PR to be merged @heyman? |
@albertowar I think that a recent PR I created would also solve the issue you were tackling without having to enable TCP keep-alive #927. If you could test the changes in that PR and if they solve the issue maybe we can close this PR in favor of the newer one. According to the ZMQ guide using only one socket and sending heartbeats on it should be sufficient. "Do heartbeating on the same socket you use for messages, so your heartbeats also act as a keep-alive to stop the network connection from going stale (some firewalls can be unkind to silent connections)." |
Hey @Jonnymcc, I will this week and see if it does the trick. If it works, I am happy to take whatever gets merged first. I have been waiting for a while for this so my only concern is that the maintainers might take even longer to review your PR (given that it contains more changes). I will keep an eye on it! |
I am running distributed Locust behind a firewall that closes idle connections.
When I run tests for a long time, eventually the firewall detects that master is not sending new commands to slaves and decides to close the connection on that port.
Unfortunately, that makes me effectively lose control of the slaves and the only way to be able to gain control back is by restarting (not ideal)
I have solved this by adding TCP Keep Alive to ZMQ communications.
I recognized that it might sound a bit too specific for my case (specially if you look at the idle time value that I used). However, it is also true that Locust should do its best on preventing communication between Master and Slave from breaking.
Please share your thoughts on this!