Distribution of user classes is not respected and some user classes are just never spawned #1618
Comments
Thanks for this report! I actually think this is an issue with normal user distribution as well (though it is exacerbated by Shapes, because we are spawning few new users at a time). Even in "normal" ramp up, if you have 10 workers and say you want 10 users, you will only ever get one User type (the heaviest one, if I remember correctly). I don't think I'll have time to look at this myself any time soon, but my suggested solution is for every worker to "shift" which users we start with in The math in that function is (overly?) complicated, but I think it can be fixed. Perhaps it is easiest if we first fix #1601, so there is a consistent ID for each worker (and not just a GUID).
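To make the "10 workers, 10 users" case concrete, here is a toy model of the effect. It is only a sketch: `allocate` is a hypothetical largest-remainder helper written for this illustration, not locust's actual selection code, and the class names are made up around the weights [35, 55, 10] from the issue description.

```python
from collections import Counter


def allocate(weights, n_users):
    """Hypothetical largest-remainder split of n_users across weighted classes."""
    total = sum(weights.values())
    exact = {name: n_users * w / total for name, w in weights.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = n_users - sum(counts.values())
    # Give the remaining users to the classes with the largest fractional part.
    by_remainder = sorted(
        weights, key=lambda name: exact[name] - counts[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Illustrative class names; the weights come from the issue description.
WEIGHTS = {"UserA": 35, "UserB": 55, "UserC": 10}

# Ten workers, each deciding on its own which single user to spawn.
per_worker = Counter()
for _ in range(10):
    per_worker.update(allocate(WEIGHTS, 1))

# One decision made for all ten users at once.
global_counts = allocate(WEIGHTS, 10)

print(dict(per_worker))
print(global_counts)
```

In this model every worker makes the same local choice, so all ten users end up in the 55-weight class, whereas a single decision over all ten users covers every class.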
I created a draft PR #1621 with some changes that I think would help alleviate the problem without having to change or re-factor too much stuff.
Unfortunately that approach is not good enough. When using a large number of users it might be, but imagine running two types of users and launching two users and sometimes getting 2xUser1, sometimes 1xUser1 + 1xUser2 and sometimes 2xUser2. Whatever approach we choose must be deterministic, at least in most cases.
For the determinism, I think that initializing the random seed to a constant is sufficient to give repeatable runs. I don't think that the users of locust expect absolute determinism regarding the spawned users. Even in the doc it says:
the keyword being "likely", implying a probability. However, we should guarantee that each user class will be spawned at least once if the number of users is greater than or equal to the number of user classes. I think the solution is for the
I wrote some code to test this idea on my side and it seems to give good results. Of course, it is not completely deterministic (unless the seed is fixed), but at least it does not have the problems of the current implementation. The most difficult part will be to refactor the master and worker runners to have the master decide which users to spawn on each worker.
A constant random seed could be OK, but you'd need to distribute it to all the worker nodes. Determinism is super important. I didn't have time to think about your math there, but I guess it might be good :) I'm not entirely sold on a brute-force algorithm either. Also, it feels a bit heavy-handed to just rewrite everything; I think a fair amount of thought has been put into the current implementation (I didn't write it, though).
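For reference, a minimal sketch of what "distributing a constant seed" could look like. `SHARED_SEED` and `spawn_sequence` are hypothetical names used only for this example, and the class names are illustrative; nothing here reflects how locust actually passes data from master to workers.

```python
import random

# Illustrative class names with the weights from the issue description.
WEIGHTS = {"UserA": 35, "UserB": 55, "UserC": 10}
SHARED_SEED = 1618  # hypothetical value the master would send to every worker


def spawn_sequence(seed, n_users):
    """Weighted random picks from a private generator seeded with `seed`."""
    rng = random.Random(seed)  # separate instance; global random state is untouched
    names = list(WEIGHTS)
    return rng.choices(names, weights=[WEIGHTS[n] for n in names], k=n_users)


worker_1 = spawn_sequence(SHARED_SEED, 5)
worker_2 = spawn_sequence(SHARED_SEED, 5)
print(worker_1 == worker_2)
```

A shared seed makes runs repeatable, but the picks are still drawn at random, so it does not by itself guarantee that a low-weight class shows up when only a few users are spawned.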
The brute-force algorithm might not be the smartest approach, although according to my preliminary tests, it converges to a solution quickly. The alternative is to add a dependency such as SciKit and use a real optimization algorithm, and I don't know if we want that. Right now, the idea would be to have the master compute the optimal user distribution at each iteration, then dispatch the users to each worker. The dispatch could take into account the users already running on each worker so that the disruption on each one is minimized (i.e. prevent excessive stopping of users). The dispatch will also try to balance the users across the workers as well as ensure the spawn rate (this will fix #896). I think it's clear the current implementation is somewhat broken and needs to change, especially regarding handling the new load test shape feature. I tested some stuff, if you want to run it on your machine: https://gist.github.com/mboutet/5047465a315868ac7e7290a354c24bb4. I'm willing to contribute and make all these changes.
Ok, I'm slightly warming up to your idea :) But what if we could do something much simpler, like always choosing to start/stop the user type with the biggest diff between current and desired distribution (and picking the first one if there are multiple ones with the same diff), and then looping? Or is that not possible for some reason?
Another desirable trait of the algorithm should be to spawn a similar distribution of users on all workers (so one worker doesn't get a radically different user distribution than another, if it can be avoided).
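One way to get a similar mix on every worker, sketched under the assumption that the master already knows how many users of each class it wants overall: deal the users out round-robin, class by class. `dispatch` and the worker IDs are hypothetical names for this example, not part of locust.

```python
from itertools import cycle


def dispatch(user_counts, worker_ids):
    """Deal users to workers one at a time, class by class (round-robin)."""
    assignment = {w: {name: 0 for name in user_counts} for w in worker_ids}
    next_worker = cycle(worker_ids)
    for name, count in user_counts.items():
        for _ in range(count):
            assignment[next(next_worker)][name] += 1
    return assignment


plan = dispatch(
    {"UserA": 4, "UserB": 5, "UserC": 1},   # illustrative global target
    ["worker-1", "worker-2", "worker-3"],    # illustrative worker IDs
)
for worker, counts in plan.items():
    print(worker, counts)
```

Because the cycle keeps going from one class to the next, both the per-class counts and the total number of users differ by at most one between any two workers.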
Yes, I agree. And this is something I actually need in my own use cases, where some user classes perform quite intensive computation to generate random texts whereas some others are low-CPU users. This will be implemented where the current dispatch logic is (lines 546 to 559 at commit 30756ce).
This is also where the logic for handling the distributed spawn rate will take place.
I'm not sure I understand your idea, but I think this is similar to what I'm proposing. In my small PoC, I always consider the current set of users in the calculation of the new set. So, I'm not computing a completely new set of users each time the Having the master runner control the distribution will also open the door to presenting stats on the user distribution. This could also be shown in a new tab of the web UI similar to the one that shows the workers. I'm aware that all of this is quite a big and complicated refactor, but I'll do my best to keep things simple and not to disrupt the existing codebase too much. I believe that if this can be successfully implemented, it will bring locust much more value.
What I mean is, instead of choosing a random user and then checking what size of diff we get, we try adding User1, calculate the diff, try adding User2, calculate the diff, etc., and then just pick the one with the minimal diff. So no random selection of Users the way you are currently doing in I'm not exactly sure how to scale this for spawning lots of users (we probably don't want to do the above procedure once for each user to spawn), but there should be a way without resorting to random.
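A rough sketch of the "try each class, keep the minimal diff" idea described above. The function names are hypothetical, and the diff used here (sum of absolute gaps between actual and target shares) is an assumption, since the thread does not pin down a specific measure. Integer arithmetic is used so that ties are exact and the first class in dict order wins.

```python
def next_user_class(current, weights):
    """Return the class whose addition leaves the smallest gap to the target shares."""
    total_weight = sum(weights.values())
    new_total = sum(current.values()) + 1
    best_name, best_diff = None, None
    for name in weights:  # dict order is stable, so the first class wins a tie
        # Gap scaled by new_total * total_weight, which keeps everything an integer.
        diff = sum(
            abs(
                (current[n] + (1 if n == name else 0)) * total_weight
                - weights[n] * new_total
            )
            for n in weights
        )
        if best_diff is None or diff < best_diff:
            best_name, best_diff = name, diff
    return best_name


def spawn(weights, n_users):
    """Add users one at a time, always taking the minimal-diff candidate."""
    current = {name: 0 for name in weights}
    for _ in range(n_users):
        current[next_user_class(current, weights)] += 1
    return current


# Illustrative class names with the weights from the issue description.
WEIGHTS = {"UserA": 35, "UserB": 55, "UserC": 10}
print(spawn(WEIGHTS, 10))
```

No randomness is involved, so the same inputs always give the same result. As noted above, this runs the comparison once per spawned user, so a real implementation would probably need something cheaper for large user counts.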
@cyberw, I went with your approach, so everything is deterministic. I still have some work to do on my PR, but once it is ready for review, I will remove the "Draft" status.
Cool! I left a few comments on the current version and will be back for more later, once you are finished!
I don't know if it is important, but today I ran into the same issue, though only for FastHttpUser. HttpUser works correctly.
From my understanding of the codebase,
I was a bit surprised about this, but it is reproducible. If I use HttpUser then it works correctly, but with FastHttpUser only users of the first user class are spawned.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.
/remove-lifecycle stale
Describe the bug
When using LoadTestShape in distributed mode and with multiple user classes having different weights, the distribution of users does not respect the weights (even within some tolerance). Furthermore, users with the smaller weights are often never picked up. The problem appears especially when the LoadTestShape specifies stages with small increments (e.g. 5 users at a time).
Expected behavior
The distribution of users should take into account the overall distribution of users across all workers.
Actual behavior
Some user classes with lower weights are never picked up and the distribution of users does not respect the weights (even within a certain tolerance).
Steps to reproduce
I think the problem is that each worker is responsible for spawning its own users. Consider the following setup:
[35, 55, 10]
Once the test starts, the master will instruct each worker to spawn 1 user every minute. However, the weight_users function will always return the user with the weight of 55.
Possible solutions
I see two aspects that need to be implemented:
I'm not super familiar with the codebase, but would that make sense? Is there some technical limitation I'm not aware of?
Environment
Master:
Workers:
The content of each test has been omitted. Also, this file is rendered from a template, so that is why the classes and tasks have generic names.