Adding TCP Keep Alive to guarantee master-slave communication after idle periods #1020
Conversation
…ests work without adding a dependency on having a working C compiler environment set up when installing Locust (since geventhttpclient package doesn't have wheels).
…. The text property function tries to decode the text in the same way as python-requests. This makes the API more similar to HttpSession. Added more tests.
… a trailing slash
… requests with FastHttpSession, in order to manually control whether a request should result in a success or a failure (see the usage sketch after this commit list)
Improved documentation of arguments for FastHttpSession.request() method.
… reported as failures
# Conflicts: # docs/writing-a-locustfile.rst
…the response was very quick (not a failure) and that 0.0 was a valid response time for a local server. Because of this I changed it to assertGreaterEqual.
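For context on the manual success/failure control mentioned in the commit list above, this is roughly the documented `catch_response` pattern for `HttpSession`, which these commits extend to `FastHttpSession`. The endpoint, response check, and class names below are illustrative assumptions, not code from this PR:

```python
from locust import HttpLocust, TaskSet, task

class UserBehaviour(TaskSet):
    @task
    def check_status(self):
        # catch_response lets the task decide whether this request is
        # recorded as a success or a failure, regardless of status code.
        with self.client.get("/status", catch_response=True) as response:
            if "ready" in response.text:   # hypothetical success criterion
                response.success()
            else:
                response.failure("service reported it was not ready")

class WebsiteUser(HttpLocust):
    task_set = UserBehaviour
    min_wait = 1000
    max_wait = 3000
```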
@albertowar Is this statement from the ZMQ guide not accurate then?
I'd say the recommendation from the guide (http://zguide.zeromq.org/php:chapter4) is good. However, the current implementation of heartbeats in Locust is not cutting it for this particular use case (private clouds).
Adding TCP Keep Alive seems trivial enough and I personally have been operating on my fork (which includes this change) for over a year without any issues.
@Jonnymcc, after reading the code a bit more, I think we can confirm what you suspected: https://github.com/locustio/locust/blob/master/locust/runners.py#L420 The slaves are reporting heartbeats to the master, but the master is not reporting heartbeats to the slaves. Given that the communication occurs on different ports, one of the connections is kept alive by the heartbeats while the other one gets closed.
So add a pong to the slave's ping and all is well?
That's definitely an approach that could work (I would have to test it to be 100% sure), but I am not sure it is the optimal way to do it. IIRC, the current heartbeats only flow from the slaves to the master. This scenario is a bit different because I only want to ensure the connection stays open across idle periods. Implementing this at the application level (with heartbeats) adds CPU overhead (another thread doing something in the background) that is not yet required. It could be handy if we ever wanted the slaves to react to a dead master, but as far as I know that is not a feature people are interested in. If we use TCP Keep Alive, it is handled at the transport layer, which should be more efficient CPU-wise. On top of that, I have been using this change for a long time and at scale (80+ slaves), so I am confident it solves the problem I highlighted.
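For reference, enabling TCP keepalive on a ZMQ socket only takes a few socket options. Here is a minimal sketch using pyzmq; the socket type, address, and timing values are illustrative assumptions, not the exact ones used in this PR:

```python
import zmq

context = zmq.Context()
socket = context.socket(zmq.DEALER)

# Enable TCP keepalive probes on the underlying connection so that an
# idle-connection firewall still sees traffic between Locust messages.
socket.setsockopt(zmq.TCP_KEEPALIVE, 1)
# Illustrative timings: start probing after 60s of idle, then every 60s.
socket.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 60)
socket.setsockopt(zmq.TCP_KEEPALIVE_INTVL, 60)

socket.connect("tcp://master-host:5557")  # hypothetical master address
```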
Discovered via: `flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics`. Legacy `print` statements are syntax errors in Python 3, but the `print()` function works as expected in both Python 2 and Python 3.
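A minimal illustration of what that check catches (not code from this repository):

```python
# Python 2-only statement syntax; flake8's E9 class of checks flags this
# as a SyntaxError when run under Python 3:
#   print "hello locust"

# Function-call syntax works on both Python 2 and Python 3:
print("hello locust")
```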
`sudo: required` is no longer needed. [Travis is now recommending removing the `sudo` tag](https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration): "_If you currently specify `sudo:` in your `.travis.yml`, we recommend removing that configuration_"
…need to remove it explicitly.
The fail percentage was calculated incorrectly. Something that failed all of the time would be reported as failing only 50% of the time.
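A sketch of how that kind of 50% result can arise (hypothetical counter names, not necessarily the exact Locust code):

```python
num_requests = 10   # total requests made, failed ones included
num_failures = 10   # every single request failed

# Buggy ratio: failures end up counted twice in the denominator,
# so an always-failing endpoint reports only 50%.
buggy_fail_ratio = num_failures / float(num_requests + num_failures)   # 0.5

# Corrected ratio: num_requests already includes the failed requests.
fixed_fail_ratio = num_failures / float(num_requests)                  # 1.0
```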
…en-running-distributed Ensure that the last samples get sent by slave and received by master.
Using a Docker multi-stage build for a 50% smaller image
FastHttpLocust
@mbeacom, if you have time, would you mind reviewing this one as well?
@albertowar It appears you have a wonky rebase. Any chance you could squash the commits here or cherry-pick to another branch?
Will do that in a separate PR then!
Hi Locust folks!
This PR revives a previous PR, PR 740, from about a year ago.
For context, the cloud infrastructure my company uses has a firewall that closes idle connections after 5 minutes. For us, that means that if we leave a test running for a while and come back to it, we are unable to control the slaves through the master UI.
If you look through the comments on PR 740, you can see that this is easily fixed by adding keep alive at the ZMQ socket level.
We thought that PR 927 would make this change redundant, but after testing Locust 0.11.0 (which includes that change) we were able to reproduce the problem again pretty consistently.
The change was reviewed by @heyman and I believe his concerns have been addressed.
Thanks for your time and sorry for the confusion!