Web UI does not stop slave servers #911
Comments
The PUSH and PULL sockets being used caused hatch messages to get routed to slaves that may have become unresponsive or crashed. This change includes the client id in the messages sent out from the master, which ensures that hatch messages are going to slaves that are READY or RUNNING. This should also fix issue locustio#911, where slaves are not receiving the stop message. I think these issues are a result of the PUSH-PULL sockets using a round-robin approach.
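For context, a minimal pyzmq sketch of the DEALER-ROUTER idea (not the actual locust code; the port and identity below are made up):

```python
# Minimal sketch of identity-based routing with ROUTER/DEALER (pyzmq).
# Not locust's real implementation; address and identity are illustrative.
import zmq

ctx = zmq.Context.instance()

# Master side: a ROUTER socket bound to a known port.
master = ctx.socket(zmq.ROUTER)
master.bind("tcp://*:5557")

# Slave side: a DEALER socket with an explicit identity.
slave = ctx.socket(zmq.DEALER)
slave.setsockopt(zmq.IDENTITY, b"slave-1")
slave.connect("tcp://localhost:5557")

# The slave announces itself; the ROUTER receives [identity, payload].
slave.send(b"client_ready")
identity, payload = master.recv_multipart()

# The master can now target this specific slave instead of relying on the
# round-robin delivery of PUSH/PULL sockets.
master.send_multipart([identity, b"hatch"])
print(slave.recv())  # b"hatch"
```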
Hi @giantryansaul, I have created PR #927, which I believe will solve the issue you are experiencing. If you have time, can you test it out? Thanks!
Thanks @Jonnymcc, I don't have any time currently, but I'll see if I can set up a test next week. Thanks!
Hey @Jonnymcc, I've tried to use the branch you created but it still does not kill slaves. Not permanently, anyway. I have seen that pressing Stop will briefly stop all requests, but after a few seconds, the slaves start sending requests again. Is this the correct way to install your branch for testing?
* Replace zmq sockets with one DEALER-ROUTER socket. The PUSH and PULL sockets being used caused hatch messages to get routed to slaves that may have become unresponsive or crashed. This change includes the client id in the messages sent out from the master, which ensures that hatch messages are going to slaves that are READY or RUNNING. This should also fix issue #911 where slaves are not receiving the stop message. I think these issues are a result of PUSH-PULL sockets using a round-robin approach.
* Remove client_id parameter from send_multipart method
* Add heartbeat worker to server and client. The server checks to see if clients have expired and, if they have, updates their status to "missing". The client has a worker that will send a heartbeat on a regular interval. The heartbeat also relays the slave state back to the master so that they stay in sync.
* Use new clients.all property in heartbeat worker
* Fix reporting of stopped state. Wait until all slaves are reporting in as ready before stating that the master is stopped.
* Fix tests after changing ZMQ sockets to DEALER-ROUTER
* Change heartbeat log msg to info so that it does not appear in tests
* Add tests for zmqrpc.py
* Remove commented imports, add note about sleep
* Support str/unicode diff in py2 vs py3
* Ensure failed zmqrpc tests clean up bound sockets
* Create throwaway variable for identity from ZMQ message. I think this looks better than using msg[1].
* Replace usage of parse_options in tests with mock options. Using parse_options during test setup can conflict with test runners like pytest. Essentially it will swallow up the options that are meant to be passed to the test runner and instead treat them as options being passed to the test.
* Set coverage concurrency to gevent. Coverage breaks with gevent and does not fully report green threads as having been tested. Setting concurrency in .coveragerc will fix the issue. https://bitbucket.org/ned/coveragepy/issues/149/coverage-gevent-looks-broken
* Add test that shows master heartbeat worker marks slaves missing
* Add assertions to test_zmqrpc.py
* Use unittest assertions
* Change assertion value to bytes object
* Add cmdline options for heartbeat liveness and interval
* Add new option heartbeat_liveness to test_runners mock options
* Ensure SlaveNode class uses heartbeat_liveness default or passed
* Ensure hatch data can be updated for slaves currently hatching
* Add test for start hatching accepted slave states. Checks that start_hatching sends messages to ready, running, and hatching slaves.
* Remove unneeded imports of mock
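A rough sketch of the heartbeat scheme described in the commits above, assuming gevent; the names and intervals here are illustrative, not locust's actual internals:

```python
# Illustrative sketch only; these names are not locust's real internals.
import time
import gevent

HEARTBEAT_INTERVAL = 3   # seconds between slave heartbeats (made-up default)
HEARTBEAT_LIVENESS = 3   # missed intervals before a slave is marked "missing"

def slave_heartbeat_worker(send_to_master, client_id, get_state):
    """Runs on each slave: periodically report liveness and current state."""
    while True:
        send_to_master({"type": "heartbeat", "id": client_id, "state": get_state()})
        gevent.sleep(HEARTBEAT_INTERVAL)

def master_heartbeat_worker(clients):
    """Runs on the master: mark slaves whose heartbeats stopped arriving as missing."""
    timeout = HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS
    while True:
        now = time.time()
        for client in clients.values():
            if now - client.last_heartbeat > timeout:
                client.state = "missing"
        gevent.sleep(HEARTBEAT_INTERVAL)
```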
That is odd. Looks like you are installing it the right way. In my testing, even today, I cannot replicate the problem you are seeing. Provided the stop signal is sent and the slave receives it and stops, I do not know how the tasks would start up again, unless the master continued to send hatch jobs. What do you see in the master logs? Here is a test to see if you are using the right install: I created a master in one shell and a slave in another, then closed the second shell (without first stopping the slave). Eventually the master misses the heartbeats and logs the slave as disconnected.
I'm having the same problem: RPS drops to zero but then starts to climb again, sometimes creating even more users than originally designated. The only "solution" I've found so far is deleting the kube deployment.
In kubernetes it could be that new pods are coming up and registering as newly attached slaves. Not sure why the slave pods would exit though. I thought even when they are not running they do not exit. I may be wrong.
I think you are right; the problem might be that I'm running from the Python image, and the container probably exits once the application exits, making kubernetes start another instance. I will look into this later today, thanks for the help!
I'm still seeing this behavior when running locust 0.11.0 in docker swarm. UPDATE: I started to look further into the issue this morning. When I installed the current version of master, it seems to work as expected.
I am also seeing this issue with Locust 0.11.0. Same as above, I am deploying the Master and Slaves on different pods on a Kubernetes cluster.
Any update?
I'm also facing this issue, any update on this?
Also seeing this issue. Quite annoying. Only way to solve it is to delete all locust pods.
(btw) this is not limited to kubernetes deployments and the issue still persists in locust 0.11.0
Any update on this issue?
Just tested this out with v0.8.1 on my Mac with two slaves. I let it run for two hours and then stopped it and the slaves stayed stopped without sending more requests. 🤷♂
@Jonnymcc, the problem here is: if I delete/power off the slave machines, the locust master still shows the 2 slave machines in the ready state.
For me it seems to work fine with a very low number of slaves, i.e. 2-6. But usually we are running 20-80 and it doesn't work at all.
I'm seeing the same issue running locust 0.11.0 as docker containers on AWS ECS.
Did you find a solution for this problem so far? Is there a way to stop kubernetes from starting another instance?
Updates would be posted here and the issue would be closed.
Installing the latest master version in the dockerfile worked for me:
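A hypothetical Dockerfile along these lines (the base image, exposed ports, and entrypoint are assumptions, not the commenter's actual file) installs locust straight from the master branch:

```dockerfile
# Hypothetical sketch of installing locust from master instead of PyPI.
FROM python:3.6

# Pull the code directly from the GitHub master branch, which at the time
# contained the fix that had not yet been released on PyPI.
RUN pip install git+https://github.com/locustio/locust.git@master

# 8089 = web UI, 5557/5558 = master/slave communication (assumed defaults).
EXPOSE 8089 5557 5558
ENTRYPOINT ["locust"]
```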
Background: the fix was made in #982 and the version was incremented to 0.11.1; however, right now pip will by default give you an older version without the fix.
This hack worked for me. It can also be done in the web UI itself: hit 'Edit' or 'New test' below the status on the top right, then set the number of users to 0. All the users should stop and thus the test should also stop running.
Just confirmed it works for me as well.
Various fixes and improvements, including a fix for the slaves which keep sending requests when stopping the load test: locustio/locust#911
Description of issue / feature request
When clicking "Stop" on the web UI, the users on Slave servers will not stop sending requests. The master instance still says the test is in the "running" state, but the stop button has disappeared.
Expected behavior
All slave servers are stopped and the user count goes back to 0.
Actual behavior
All slave servers remain active and the count of users is over 0.
Environment settings (for bug reports)
Steps to reproduce (for bug reports)
(can't share my current code, but it is a very simple mix of get and post requests and no processing or validation of data)
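For anyone wanting to reproduce with something similar, a minimal locustfile in the spirit of that description (a hypothetical stand-in with made-up endpoints and host, not the reporter's actual code), using the 0.x-era HttpLocust API:

```python
# Hypothetical locustfile for the locust 0.x API; endpoints and host are made up.
from locust import HttpLocust, TaskSet, task

class UserTasks(TaskSet):
    @task
    def index(self):
        # Simple GET request, no validation of the response.
        self.client.get("/")

    @task
    def submit(self):
        # Simple POST request with a dummy form payload.
        self.client.post("/submit", {"key": "value"})

class WebsiteUser(HttpLocust):
    host = "http://example.com"  # placeholder target
    task_set = UserTasks
    min_wait = 1000  # milliseconds
    max_wait = 3000
```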