
Response time graph does not work for distributed configuration #1984

Closed
hmliu6 opened this issue Jan 27, 2022 · 4 comments

hmliu6 commented Jan 27, 2022

Describe the bug

The response time graph only shows data for the first few seconds and then shows no data afterward.
The statistics page shows that all response time data are normal.
The JSON returned from the master shows that the current_response_time_xxxx fields are null.
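
For context, a minimal sketch of how those fields can be inspected (assuming the master's web UI is on the default port 8089; /stats/requests is the JSON endpoint the UI chart polls):

```python
# Sketch: poll the master's stats JSON and print the current_response_time_*
# fields, which stay null while the chart shows no data.
# Assumes the web UI is reachable on localhost:8089.
import requests

data = requests.get("http://localhost:8089/stats/requests").json()
for key, value in sorted(data.items()):
    if key.startswith("current_response_time"):
        print(key, "=", value)
```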

The symptom looks exactly the same as old issue #1182.

Expected behavior

The response time graph should keep updating instead of showing no data.

Actual behavior

[Screenshots: locust-chart, locust-stat, locust-stat-requests]

No explicit error or exception was found in the Locust master log or in the browser console.
[Screenshots: locust-master-log, browser-console]

Steps to reproduce

Update Locust to the latest version
Deploy in a master/worker configuration on Kubernetes (roughly the commands sketched below)
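
A rough sketch of the startup commands behind that deployment (the locustfile name, the worker count, and the locust-master host are placeholders; on Kubernetes the master address is normally a Service name):

```bash
# Master pod: serves the web UI on port 8089 by default
locust -f locustfile.py --master --expect-workers 4

# Worker pods: connect back to the master ("locust-master" is a placeholder
# for the master Service name or address)
locust -f locustfile.py --worker --master-host locust-master
```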

Environment

  • OS: CentOS 7
  • Python version: Python 3.6.8
  • Locust version: 2.6.0
hmliu6 added the bug label Jan 27, 2022

cyberw (Collaborator) commented Jan 27, 2022

The last log line indicates that your workers are overloaded. Can you share your locustfile? Are you maybe doing some heavy calculations or using a bad (not gevent-friendly) client?

hmliu6 (Author) commented Jan 27, 2022

@cyberw, thanks for your reply, and sorry that I cannot simply share the locustfile.
The logic inside is that each user downloads an XML file, parses it, and then downloads another file referenced inside the XML (roughly like the sketch below).
I believe the workers are overloaded because a large number of users are spawned.
But this UI graph issue only appeared when we upgraded from 1.0.3 to 2.6.0 (latest); the rest of the locustfile did not change.
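
For illustration only, the locustfile has roughly this shape (the endpoint paths and XML element names here are made up, not the real ones):

```python
# Rough shape of the locustfile, for illustration only; paths and the XML
# structure are placeholders.
import xml.etree.ElementTree as ET

from locust import HttpUser, between, task


class DownloadUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def download_chain(self):
        # 1. Download an XML manifest
        resp = self.client.get("/files/manifest.xml")
        # 2. Parse it in-process (the CPU-heavy part)
        root = ET.fromstring(resp.content)
        # 3. Download the file referenced inside the XML
        file_path = root.findtext("file")
        if file_path:
            self.client.get(file_path, name="/files/[referenced]")
```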

cyberw (Collaborator) commented Jan 27, 2022

XML parsing may be a little too heavy to do in-process. See if you can load the XML without blocking the worker, or maybe switch to a faster XML parser. Maybe Locust has gotten more sensitive to worker overload, but without more info it is impossible to know. Sorry, but you are on your own here...
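
One possible way to act on that advice, as a sketch only (lxml is an extra dependency, and the element name is a placeholder): use lxml's C-based parser, keep the extraction narrow, and optionally push the parse onto gevent's thread pool so the greenlet yields while it runs.

```python
# Sketch: lighten the XML handling on the worker.
# "file" is a placeholder element name; pip install lxml is required.
import gevent
from lxml import etree


def extract_file_path(xml_bytes):
    # Narrow extraction with lxml's fast C parser instead of building and
    # searching a full stdlib ElementTree.
    return etree.fromstring(xml_bytes).findtext("file")


def extract_file_path_offloaded(xml_bytes):
    # Optional: run the parse on gevent's thread pool so the current greenlet
    # yields to other users while it runs (this only helps if the parser
    # releases the GIL, which lxml generally does while parsing).
    return gevent.get_hub().threadpool.apply(extract_file_path, (xml_bytes,))
```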

hmliu6 (Author) commented Mar 19, 2022

Just wanted to update this issue with the root cause we found earlier and then close it.
It turns out to be similar to this Stack Overflow question:
https://stackoverflow.com/questions/63287818/load-generated-by-distributed-worker-processes-is-not-equivalent-generated-by-si

I believe the problem was mainly with the master VM: for reasons that are still unclear, some of the communication from the workers was being dropped, so the statistics were not accurate.
We used the same set of scripts in many other setups (other on-prem clusters and cloud clusters) with no statistics issues; the problem only appeared on this one odd on-prem machine.

The fix, in the end, was that someone upgraded the BIOS on this on-prem cluster and restarted all machines in the cluster, after which everything became normal...

hmliu6 closed this as completed Mar 19, 2022
cyberw added the wontfix label Mar 28, 2022