
Supplying data centrally. Coordinating in cluster mode. #1506

Closed
mingp opened this issue Aug 7, 2020 · 11 comments
Labels: feature request, non-critical, stale (issue had no activity; might still be worth fixing, but don't expect someone else to fix it)

Comments

@mingp

mingp commented Aug 7, 2020

I'm unsure if this is a request for a new feature or an inquiry about how to use existing features. I would be happy with either outcome.

Thanks in advance.

Is your feature request related to a problem? Please describe.

I need to either supply data centrally to individual locusts or coordinate between locusts in cluster mode.

The canonical use-case is as follows. Suppose that I have a limited set of test user accounts that I need to distribute among my locusts in cluster mode. I have more test user accounts than locusts, but not necessarily by a large margin. I would ideally like to hand them out sequentially, round-robin.

Random assignment, where each locust randomly selects from the whole list, by and large works, but has significant drawbacks. In particular, if the set of available test accounts is not much larger than the set of locusts, then we start to get collisions, where multiple locusts pull the same test account, at a rate determined by the Birthday Paradox. This is bad if, for example, running multiple concurrent test scripts against the same test account would cause test script failure or potentially even leave the test account wedged in an invalid state.
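
To make the collision risk concrete, here is a rough sketch of the random-assignment approach (the account list, user class, and endpoint are made up for illustration):

```python
# Rough sketch of the random-assignment approach described above.
import random
from locust import HttpUser, task

TEST_ACCOUNTS = [f"user{i:03d}" for i in range(100)]  # shared, read-only list

class AccountUser(HttpUser):
    host = "https://example.invalid"  # placeholder target

    def on_start(self):
        # Each locust picks independently at random. With, say, 50 locusts
        # drawing from 100 accounts, the probability that no two pick the
        # same account is well below 1 in 100,000 (birthday paradox), so
        # collisions are effectively guaranteed.
        self.account = random.choice(TEST_ACCOUNTS)

    @task
    def use_account(self):
        self.client.get(f"/api/session?user={self.account}")
```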

Describe the solution you'd like

Ideally there would be a way for the central coordinator to hand out data to individual locusts in cluster mode, via some sort of hook that the test script can add. Given that the central coordinator already needs to send start and stop requests to locusts, it seems like this data could piggy-back on existing calls.
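
To make the idea concrete, here is a rough sketch of what such a hook could look like. The register_message/send_message calls are hypothetical names for the proposed hook, not an existing Locust API, and ACCOUNTS, setup_accounts, and the endpoint are made up for illustration:

```python
# Sketch of a hypothetical coordinator-to-worker data hook.
from locust import HttpUser, task, events
from locust.runners import MasterRunner, WorkerRunner

ACCOUNTS = []  # filled on each worker when the coordinator pushes data

def setup_accounts(environment, msg, **kwargs):
    # Would run on a worker when a "test_accounts" message arrives
    ACCOUNTS.extend(msg.data)

@events.init.add_listener
def on_locust_init(environment, **kwargs):
    if not isinstance(environment.runner, MasterRunner):
        environment.runner.register_message("test_accounts", setup_accounts)

@events.test_start.add_listener
def on_test_start(environment, **kwargs):
    # On the coordinator (or in local mode), push accounts to the workers;
    # a real implementation would partition the list per worker.
    if not isinstance(environment.runner, WorkerRunner):
        environment.runner.send_message("test_accounts", ["user001", "user002"])

class AccountUser(HttpUser):
    host = "https://example.invalid"  # placeholder target

    @task
    def use_account(self):
        if ACCOUNTS:
            account = ACCOUNTS.pop()  # consume this worker's share
            self.client.get(f"/api/session?user={account}")
```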

Notable prior art comes from Gatling, a competing load-testing framework, which has a feature that it calls feeders, by which data can be centrally injected into individual sessions. IMO, something similar could benefit Locust as well.

Describe alternatives you've considered

  1. Random assignment, as detailed above. It works, but has caveats, and the failure mode can be situationally catastrophic.
  2. An external API endpoint handles this logic. The load test script calls the API endpoint to fetch parameters for the current run. It works, but having to spin up additional, seemingly unrelated infrastructure is quite clunky.

Additional context

N/A.

@cyberw
Collaborator

cyberw commented Aug 7, 2020

Hi! I think this would be nice but needs a lot of work. Personally, I keep this sort of data in a MongoDB, which is good at keeping track of multiple readers/updaters. See locust-plugins, specifically MongoReader: https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/mongoreader_ex.py

@max-rocket-internet
Contributor

It wouldn't be super hard to implement this yourself. You could simply retrieve the "user accounts" in on_start from somewhere else. MongoDB could work, but another idea could be Redis, as it has distributed locks which could ensure the same user is not given out twice 🙂
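
Something along these lines, as a rough sketch (assumes redis-py and a Redis list pre-loaded with account names via LPUSH; key and class names are made up):

```python
# Sketch: claim test accounts from a shared Redis list.
# LPOP is atomic, so no two users (on any worker) get the same account.
import redis
from locust import HttpUser, task

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

class AccountUser(HttpUser):
    host = "https://example.invalid"  # placeholder target

    def on_start(self):
        self.account = r.lpop("test_accounts")
        if self.account is None:
            raise RuntimeError("ran out of test accounts")

    def on_stop(self):
        # Return the account to the pool when this user stops
        r.rpush("test_accounts", self.account)

    @task
    def use_account(self):
        self.client.get(f"/api/session?user={self.account}")
```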

@cyberw
Collaborator

cyberw commented Aug 10, 2020

> It wouldn't be super hard to implement this yourself. You could simply retrieve the "user accounts" in on_start from somewhere else. MongoDB could work, but another idea could be Redis, as it has distributed locks which could ensure the same user is not given out twice 🙂

MongoReader/MongoDB does atomic writes, so there is no risk of the same user being given out twice (maybe not as efficiently as Redis does it, but efficient enough for most purposes).
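
For illustration, here is the atomic-claim idea in plain pymongo (just a sketch; MongoReader in locust-plugins is more elaborate, and the collection and field names here are made up):

```python
# Sketch: atomically claim and release account documents with pymongo.
from pymongo import MongoClient, ReturnDocument

coll = MongoClient("mongodb://localhost:27017")["loadtest"]["accounts"]

def claim_account():
    # find_one_and_update is atomic on the server: the first caller flips
    # "claimed" to True, so a second caller can never match the same document.
    return coll.find_one_and_update(
        {"claimed": False},
        {"$set": {"claimed": True}},
        return_document=ReturnDocument.AFTER,
    )

def release_account(doc):
    coll.update_one({"_id": doc["_id"]}, {"$set": {"claimed": False}})
```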

@mingp
Author

mingp commented Aug 10, 2020

@cyberw @max-rocket-internet -

Thank you for the replies. I appreciate the suggestion, and indeed MongoDB or Redis seems quite interesting. The main drawback I see is the same as with an external API endpoint: it introduces an additional dependency that we would have to maintain. For context, load testing is something our team does as needed, but not too frequently, so we try to keep the set of dependencies small, especially external ones.

What I was really hoping for was that, given that Locust in cluster mode already needs to communicate between the coordinator and the locusts, more information could be piggybacked onto those messages, so that this could be done without additional dependencies. Is there any sort of integration point, even an unofficial one, by which that may happen? If so, please let me know.

Thank you again.

@cyberw
Collaborator

cyberw commented Aug 10, 2020

> Is there any sort of integration point, even an unofficial one, by which that may happen?

Unfortunately I don't think there is (at least not an obvious one). Maybe @heyman has some input on where something like this could be jacked in.

I think it could be very useful (but it also increases complexity a little)

@aborichev

The master node seems to get overloaded by test stats once 30k RPS is reached. Providing test data seems like a completely different role, so a test data provider is better off as a standalone tool. I use Redis for this purpose, but there are a lot of other databases and tools.

@mingp
Author

mingp commented Sep 9, 2020

Understandable that this probably should not exist by default, as not everyone needs it. That said, I'd still really appreciate it if a proper set of hooks existed for this, so that those who want it can write the appropriate hook code to make it happen.

@gclair

gclair commented Sep 21, 2020

Throwing in my thumbs up for this as well. Even something along the lines of an incrementing worker ID value would be helpful. At this point I'm planning on a simple REST server in Go/Python to just spit out a value that I require on a call to it. It would be simpler than spinning up Redis/Mongo for a simple queue solution.
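
Roughly something like this, as a sketch using only the Python standard library (port and response format are arbitrary):

```python
# Sketch: a tiny stand-alone counter service; every GET returns a unique integer.
import itertools
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

counter = itertools.count()
lock = threading.Lock()  # only strictly needed if swapped to a threading server

class NextValueHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with lock:
            value = next(counter)
        body = str(value).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Each worker/locust GETs http://<host>:8080/ and parses the returned integer
    HTTPServer(("", 8080), NextValueHandler).serve_forever()
```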

@cyberw added the non-critical label (and added and then removed the hacktoberfest label; see https://hacktoberfest.digitalocean.com for more info) on Sep 28, 2020
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions bot added the stale label (issue had no activity; might still be worth fixing, but don't expect someone else to fix it) on Apr 11, 2021
@github-actions

This issue was closed because it has been stalled for 10 days with no activity.

@pm-komal-jain

> Throwing in my thumbs up for this as well. Even something along the lines of an incrementing worker ID value would be helpful. At this point I'm planning on a simple REST server in Go/Python to just spit out a value that I require on a call to it. It would be simpler than spinning up Redis/Mongo for a simple queue solution.

How is this implemented? Curious to find out a little more. We are facing the same issue: ensuring unique data across slaves. We want to implement a quick and lightweight solution.
