Workers go down with message: failed to send heartbeat, setting state to missing. #1843

Closed
roquemoyano-tc opened this issue Aug 6, 2021 · 12 comments

Comments

@roquemoyano-tc

Describe the bug

Every time I run Locust, I get: locust-worker-xxxx failed to send heartbeat, setting state to missing.

Expected behavior

worker in running status

Actual behavior

I'm using the official Helm chart to run Locust. When I run the Python code, after some time the workers change their status to missing and I'm not able to finish running the test.

Environment

  • OS: AKS
  • Python version: python version included in helm chart
  • Locust version: 1.5.0* to 2.0.0*
  • Locust command line that you ran: I open the GUI to run it
@mboutet
Contributor

mboutet commented Aug 6, 2021

@roquemoyano-tc You need to provide the complete logs from both the master and at least one of the workers. Share them as gists so as not to paste a wall of logs in this issue.

@roquemoyano-tc
Author

Sorry, the master and worker logs are at this link:

https://gist.github.com/roquemoyano-tc/f23dc4a4c8c17da30fa6c101c55a0ad9

@cyberw
Collaborator

cyberw commented Aug 11, 2021

Can you explain the "custom" entries in the slave log?
What is this, for example: "Adding new paired device"?

Also attach your locustfile.

@amaanupstox

@cyberw
import logging
import sys

from locust import FastHttpUser, constant, task

# stopwatch, utils and backoffice_customer_profile_service are project-specific
# and were not included in the original post.


class GRPCBackOfficeClient:
    @stopwatch
    def email_mobile_list(self):
        try:
            response = backoffice_customer_profile_service.get_email_mobile_list()
            assert response.metadata.success, "response should be true"
        except (KeyboardInterrupt, SystemExit):
            logging.error("Interrupted by keyboard............")
            sys.exit(0)


class GRPCBackOfficeLocust(FastHttpUser):
    host = "https://{}".format(utils.get_properties("backoffice-service", "url"))
    grpc_backoffice_client = GRPCBackOfficeClient()
    wait_time = constant(0)

    def on_start(self):
        """Called when a user starts, before any task is scheduled."""
        pass

    def on_stop(self):
        """Called when the user is stopping."""
        pass

    @task
    def family_client_group(self):
        """Load test the family client group gRPC call."""
        # family_client_group() is not defined on the client in the snippet as posted.
        self.grpc_backoffice_client.family_client_group()

@amaanupstox

amaanupstox commented Aug 11, 2021

@cyberw @mboutet Please help here. It works fine with one slave, but when I attach a second one I get the error "failed to send heartbeat, setting state to missing."
Locust: 1.4.3
Python: 3.7.8

@elizabeth-tran

I will attach the locustfile soon.
The custom entries in the logs are just print statements from the different tasks that are being executed on the slave.
The logic used in the scripts is based on https://medium.com/locust-io-experiments/locust-experiments-feeding-the-locusts-cf09e0f65897 because I'm feeding the slaves with information about existing users from a CSV file read by the master.

@cyberw
Collaborator

cyberw commented Aug 11, 2021

> @cyberw @mboutet Please help here. It works fine with one slave, but when I attach a second one I get the error "failed to send heartbeat, setting state to missing."
> Locust: 1.4.3
> Python: 3.7.8

Start by updating Locust. I don't know of a specific bug in this area, but I don't want to spend time solving what might already have been solved :)

@cyberw
Collaborator

cyberw commented Aug 11, 2021

> I will attach the locustfile soon.
> The custom entries in the logs are just print statements from the different tasks that are being executed on the slave.
> The logic used in the scripts is based on https://medium.com/locust-io-experiments/locust-experiments-feeding-the-locusts-cf09e0f65897 because I'm feeding the slaves with information about existing users from a CSV file read by the master.

Ah, that might be key. That blog post speaks specifically about passing extra info from master to slave, and it could (theoretically at least) interrupt the normal locust communication. There is a new way to do that, using custom messages https://docs.locust.io/en/stable/running-locust-distributed.html?highlight=custom#communicating-across-nodes

Try that instead.

@cyberw
Collaborator

cyberw commented Aug 11, 2021

@amaanupstox You're using a gRPC client? Make sure you have patched it to be gevent-friendly, like in the example in the docs: https://docs.locust.io/en/latest/testing-other-systems.html#example-writing-a-grpc-user-client
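
A rough illustration of why this can matter for heartbeats, assuming (as far as I understand Locust's worker) that heartbeats are sent from a gevent greenlet in the same process as the users. The sketch uses only gevent, not Locust or gRPC: the unpatched `time.sleep` stands in for any call that blocks without yielding, and `count_heartbeats` is a hypothetical helper written for this example.

```python
import gevent
from gevent import monkey

# The original, unpatched time.sleep stands in for a call that blocks the whole
# thread without yielding to gevent (for example an unpatched client library).
blocking_sleep = monkey.get_original("time", "sleep")


def count_heartbeats(task, interval=0.01):
    """Run `task` next to a heartbeat greenlet and return how many heartbeats fired."""
    beats = 0

    def heartbeat():
        nonlocal beats
        while True:
            beats += 1
            gevent.sleep(interval)  # cooperative sleep, yields to other greenlets

    hb = gevent.spawn(heartbeat)
    worker = gevent.spawn(task)
    worker.join()  # wait until the task has finished
    hb.kill()      # stop the heartbeat loop
    return beats


blocked = count_heartbeats(lambda: blocking_sleep(0.3))
cooperative = count_heartbeats(lambda: gevent.sleep(0.3))
print(f"heartbeats during blocking call: {blocked}, during cooperative call: {cooperative}")
```

The blocking variant should report far fewer heartbeats than the cooperative one, because nothing else can run while the thread is blocked. A worker whose heartbeats are starved in this way would look missing to the master.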

@amaanupstox
Copy link

amaanupstox commented Aug 12, 2021

@cyberw Can you please tell me which server this is:

def start_server():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    hello_pb2_grpc.add_HelloServiceServicer_to_server(HelloServiceServicer(), server)
    server.add_insecure_port("localhost:50051")
    server.start()
    logger.info("gRPC server started")
    server.wait_for_termination()

Is it the Locust server or the server hosting the gRPC service?

@cyberw
Collaborator

cyberw commented Aug 12, 2021

That is the grpc service used as a dummy target for the test. It is not something you would launch in a real test.

@elizabeth-tran

The original issue was resolved once I switched to using custom messages. Thanks!
