
Supplying data centrally. Coordinating in cluster mode. #1506

Closed
mingp opened this issue Aug 7, 2020 · 11 comments
Labels: feature request, non-critical, stale (issue had no activity; might still be worth fixing, but don't expect someone else to fix it)

Comments

@mingp

mingp commented Aug 7, 2020

I'm unsure if this is a request for a new feature or an inquiry about how to use existing features. I would be happy with either outcome.

Thanks in advance.

Is your feature request related to a problem? Please describe.

I need to either supply data centrally to individual locusts or coordinate between locusts in cluster mode.

The canonical use-case is as follows. Suppose that I have a limited set of test user accounts that I need to distribute among my locusts in cluster mode. I have more test user accounts than locusts, but not necessarily by a large margin. I would ideally like to hand them out sequentially, round-robin.

Random assignment, where each locust randomly selects from the whole list, by and large works, but has significant drawbacks. In particular, if the set of available test accounts is not much larger than the set of locusts, then we start to get collisions, where multiple locusts pull the same test account, at a rate determined by the Birthday Paradox. This is bad if, for example, running multiple concurrent test scripts against the same test account would cause test script failure or potentially even leave the test account wedged in an invalid state.
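
To make the collision risk concrete, here is a rough sketch of the random-assignment approach (the account list, user class, and endpoint are made up for illustration):

```python
# Rough sketch of the random-assignment approach described above.
import random
from locust import HttpUser, task

TEST_ACCOUNTS = [f"user{i:03d}" for i in range(100)]  # shared, read-only list

class AccountUser(HttpUser):
    host = "https://example.invalid"  # placeholder target

    def on_start(self):
        # Each locust picks independently at random. With, say, 50 locusts
        # drawing from 100 accounts, the probability that no two pick the
        # same account is well below 1 in 100,000 (birthday paradox), so
        # collisions are effectively guaranteed.
        self.account = random.choice(TEST_ACCOUNTS)

    @task
    def use_account(self):
        self.client.get(f"/api/session?user={self.account}")
```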

Describe the solution you'd like

Ideally there would be a way for the central coordinator to hand out data to individual locusts in cluster mode, via some sort of hook that the test script can add. Given that the central coordinator already needs to send start and stop requests to locusts, it seems like this data could piggy-back on existing calls.
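
To make the idea concrete, here is a rough sketch of what such a hook could look like. The register_message/send_message calls are hypothetical names for the proposed hook, not an existing Locust API, and ACCOUNTS, setup_accounts, and the endpoint are made up for illustration:

```python
# Sketch of a hypothetical coordinator-to-worker data hook.
from locust import HttpUser, task, events
from locust.runners import MasterRunner, WorkerRunner

ACCOUNTS = []  # filled on each worker when the coordinator pushes data

def setup_accounts(environment, msg, **kwargs):
    # Would run on a worker when a "test_accounts" message arrives
    ACCOUNTS.extend(msg.data)

@events.init.add_listener
def on_locust_init(environment, **kwargs):
    if not isinstance(environment.runner, MasterRunner):
        environment.runner.register_message("test_accounts", setup_accounts)

@events.test_start.add_listener
def on_test_start(environment, **kwargs):
    # On the coordinator (or in local mode), push accounts to the workers;
    # a real implementation would partition the list per worker.
    if not isinstance(environment.runner, WorkerRunner):
        environment.runner.send_message("test_accounts", ["user001", "user002"])

class AccountUser(HttpUser):
    host = "https://example.invalid"  # placeholder target

    @task
    def use_account(self):
        if ACCOUNTS:
            account = ACCOUNTS.pop()  # consume this worker's share
            self.client.get(f"/api/session?user={account}")
```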

Notable prior art comes from Gatling, a competing load-testing framework, which has a feature that it calls feeders, by which data can be centrally injected into individual sessions. IMO, something similar could benefit Locust as well.

Describe alternatives you've considered

  1. Random assignment, as detailed above. It works, but has caveats, and the failure mode can be situationally catastrophic.
  2. An external API endpoint handles this logic. The load test script calls the API endpoint to fetch parameters for the current run. It works, but having to spin up additional, seemingly unrelated infrastructure is quite clunky.

Additional context

N/A.

@cyberw
Collaborator

cyberw commented Aug 7, 2020

Hi! I think this would be nice but needs a lot of work. Personally, I keep this sort of data in a MongoDB, which is good at keeping track of multiple readers/updaters. See locust-plugins, specifically MongoReader: https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/mongoreader_ex.py

@max-rocket-internet
Contributor

It wouldn't be super hard to implement this yourself. You could simply retrieve the "user accounts" in on_start from somewhere else. MongoDB could work, but another idea could be Redis, as it has distributed locks which could ensure the same user is not given out twice 🙂
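
Something along these lines, as a rough sketch (assumes redis-py and a Redis list pre-loaded with account names via LPUSH; key and class names are made up):

```python
# Sketch: claim test accounts from a shared Redis list.
# LPOP is atomic, so no two users (on any worker) get the same account.
import redis
from locust import HttpUser, task

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

class AccountUser(HttpUser):
    host = "https://example.invalid"  # placeholder target

    def on_start(self):
        self.account = r.lpop("test_accounts")
        if self.account is None:
            raise RuntimeError("ran out of test accounts")

    def on_stop(self):
        # Return the account to the pool when this user stops
        r.rpush("test_accounts", self.account)

    @task
    def use_account(self):
        self.client.get(f"/api/session?user={self.account}")
```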

@cyberw
Collaborator

cyberw commented Aug 10, 2020

> It wouldn't be super hard to implement this yourself. You could simply retrieve the "user accounts" in on_start from somewhere else. MongoDB could work, but another idea could be Redis, as it has distributed locks which could ensure the same user is not given out twice 🙂

MongoReader/MongoDB does atomic writes, so there is no risk of the same user being given out twice (maybe not as efficiently as Redis does it, but efficient enough for most purposes).
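
For illustration, here is the atomic-claim idea in plain pymongo (just a sketch; MongoReader in locust-plugins is more elaborate, and the collection and field names here are made up):

```python
# Sketch: atomically claim and release account documents with pymongo.
from pymongo import MongoClient, ReturnDocument

coll = MongoClient("mongodb://localhost:27017")["loadtest"]["accounts"]

def claim_account():
    # find_one_and_update is atomic on the server: the first caller flips
    # "claimed" to True, so a second caller can never match the same document.
    return coll.find_one_and_update(
        {"claimed": False},
        {"$set": {"claimed": True}},
        return_document=ReturnDocument.AFTER,
    )

def release_account(doc):
    coll.update_one({"_id": doc["_id"]}, {"$set": {"claimed": False}})
```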

@mingp
Author

mingp commented Aug 10, 2020

@cyberw @max-rocket-internet -

Thank you for the replies. I appreciate the suggestion, and indeed MongoDB or Redis seems quite interesting. The main drawback I see is the same as with an external API endpoint: it introduces an additional dependency that we would have to maintain. For context, load testing is something our team does as needed, but not too frequently, so we try to keep the set of dependencies small, especially external ones.

What I was really hoping for was that, given that Locust in cluster mode already needs to communicate between the coordinator and the locusts, more information could be piggybacked onto those messages, so that this could be done without additional dependencies. Is there any sort of integration point, even an unofficial one, by which that may happen? If so, please let me know.

Thank you again.

@cyberw
Collaborator

cyberw commented Aug 10, 2020

> Is there any sort of integration point, even an unofficial one, by which that may happen?

Unfortunately I don't think there is (at least not an obvious one). Maybe @heyman has some input on where something like this could be jacked in.

I think it could be very useful (but it also increases complexity a little)

@aborichev

The master node seems to get overloaded by test stats once 30k RPS is reached. Providing test data seems like a completely different role, so a test data provider is better off as a standalone tool. I use Redis for this purpose, but there are a lot of other databases and tools.

@mingp
Author

mingp commented Sep 9, 2020

Understandable that this probably should not exist by default, as not everyone needs it. That said, I'd still really appreciate it if a proper set of hooks existed for this, so that those who want it can write the appropriate hook code to make it happen.

@gclair

gclair commented Sep 21, 2020

Throwing in my thumbs up for this as well. Even something along the lines of an incrementing worker ID value would be helpful. At this point I'm planning on a simple REST server in Go/Python to just spit out a value that I require on a call to it. It would be simpler than spinning up Redis/Mongo for a simple queue solution.
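
Roughly something like this, as a sketch using only the Python standard library (port and response format are arbitrary):

```python
# Sketch: a tiny stand-alone counter service; every GET returns a unique integer.
import itertools
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

counter = itertools.count()
lock = threading.Lock()  # only strictly needed if swapped to a threading server

class NextValueHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with lock:
            value = next(counter)
        body = str(value).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Each worker/locust GETs http://<host>:8080/ and parses the returned integer
    HTTPServer(("", 8080), NextValueHandler).serve_forever()
```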

@cyberw added the non-critical label (and added and then removed the hacktoberfest label; see https://hacktoberfest.digitalocean.com for more info) on Sep 28, 2020
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions bot added the stale label (issue had no activity; might still be worth fixing, but don't expect someone else to fix it) on Apr 11, 2021
@github-actions

This issue was closed because it has been stalled for 10 days with no activity.

@pm-komal-jain

> Throwing in my thumbs up for this as well. Even something along the lines of an incrementing worker ID value would be helpful. At this point I'm planning on a simple REST server in Go/Python to just spit out a value that I require on a call to it. It would be simpler than spinning up Redis/Mongo for a simple queue solution.

How is this implemented? Curious to find out a little more. We are facing the same issue: ensuring unique data across slaves. We want to implement a quick and lightweight solution.
