Refactored stats code and display median as well as 95% percentile response times in web UI's charts #549
Conversation
…parate StatsEntry instead of dynamically calculating the total stats from all other stats entries
… it’s no longer needed when we have a separate StatsEntry instance to track the total stats
…cond, for the last 20 seconds. This allows us to calculate the current response times for certain percentiles. We turn on this cache for the total StatsEntry.
…ime, as well as the 95% percentile
…the response times for only the last 10 seconds, to calculate current response time percentiles (before, we used the *total* response times like it was 10 seconds ago, which obviously was wrong)
…of two response times dicts
…lizing/resetting a StatsEntry with use_response_times_cache set to True
…on the master node when running distributed. This makes the current response time percentiles work when running Locust distributed.
…rectly in MockedRpsServer.mocked_send().
I think this is pretty much ready to be merged. Would love if someone could take a look and review it. These are the main changes:
locust/stats.py
Outdated
processed_count = 0
for response_time in sorted(six.iterkeys(response_times), reverse=True):
    processed_count += response_times[response_time]
    if((num_requests - processed_count) <= num_of_request):
nit: don't need extra () around this
Fixed!
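As background for the snippet above: response_times is a dict mapping (rounded) response times in milliseconds to how many requests finished in that time, and a percentile is read off by walking that dict in sorted order while accumulating counts. A minimal illustrative version (walking upwards rather than downwards, and with hypothetical names rather than the exact code under review):

def percentile_from_response_times(response_times, num_requests, percent):
    """Response time below which `percent` of `num_requests` requests fall."""
    threshold = num_requests * percent
    processed_count = 0
    for response_time in sorted(response_times):
        processed_count += response_times[response_time]
        if processed_count >= threshold:
            return response_time
    return 0

# 10 requests in total: the 95th percentile lands on the slowest bucket.
times = {10: 3, 20: 5, 300: 2}
print(percentile_from_response_times(times, 10, 0.95))  # -> 300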
locust/stats.py
Outdated
def get_current_response_time_percentile(self, percent):
    """
    Calculate the *current* response time for a certain percentile. We use a sliding
    window of (approximately) the last 10 seconds when calculating this.
would it be useful to make this 10 seconds customizable?
could make it a module variable here, I guess, so people can adjust it directly. I'm not sure it'd be used enough to make it a parameter
Yeah, a module variable should be good. Right now we only expose this to the users as the "current" response time percentile, which lets us get away with it being approximately the last 10 seconds. We also only use it in the very ephemeral web UI charts. If we were to expose it as an option, maybe people would expect it to be more exact.
Will fix!
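A later commit in this thread names that module variable CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW; roughly (the default value is assumed here for illustration), the idea is:

# In locust/stats.py (sketch):
CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW = 10
"""Approximate sliding window, in seconds, used for the *current* response time percentiles."""

# A user who needs a wider window could override it from their own code:
#   import locust.stats
#   locust.stats.CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW = 30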
locust/stats.py
Outdated
    response_times=copy(self.response_times),
    num_requests=self.num_requests,
)
if len(self.response_times_cache) > 20:
this seems rather arbitrary, 200 seconds seems like a lot to cache (though probably still negligible)
The cache might not hold response times for every second, so we scan through the cache at time()-CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW +/- 10 seconds to find a cache entry, which we then use to calculate the delta that gives us the "current" response time percentile.
I've changed the cache size to be CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW + 10.
I guess we could make the number of seconds we scan configurable as well, but I really don't see a use-case where one would need to change this.
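To make the scan-and-diff idea above concrete, here is a minimal sketch of how a "current" percentile could be derived from such a cache. The attribute names (response_times_cache, and cache entries carrying response_times and num_requests) and the helper names are assumptions for illustration, not necessarily the PR's exact identifiers.

import time

CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW = 10  # seconds


def diff_response_time_dicts(latest, old):
    """Response time counts accumulated since `old` was snapshotted."""
    return {t: latest[t] - old.get(t, 0)
            for t in latest
            if latest[t] - old.get(t, 0) > 0}


def get_current_response_time_percentile(entry, percent):
    # The cache holds one snapshot per second, keyed on the timestamp it was
    # taken, but some seconds may be missing, so scan +/- 10 s around the
    # target timestamp for the closest available snapshot.
    target = int(time.time()) - CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW
    cached = None
    for offset in range(11):
        if (target - offset) in entry.response_times_cache:
            cached = entry.response_times_cache[target - offset]
            break
        if (target + offset) in entry.response_times_cache:
            cached = entry.response_times_cache[target + offset]
            break
    if cached is None:
        return None

    # The "current" stats are the delta between the live dict and the snapshot.
    recent = diff_response_time_dicts(entry.response_times, cached.response_times)
    num_recent = entry.num_requests - cached.num_requests

    # Percentile over just that recent window (same approach as the helper
    # sketched earlier in this thread).
    threshold = num_recent * percent
    processed = 0
    for response_time in sorted(recent):
        processed += recent[response_time]
        if processed >= threshold:
            return response_time
    return 0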
locust/stats.py
Outdated
use_response_times_cache = False
"""
If set to True, the copy of the response_time dict will be stored in response_times_cache
every second, and kept for 20 seconds. We can use this dict to calculate the *current*
similar comment to my question about the 10 seconds sliding window time, I can see use cases for adjusting this (for example if your operations take a long time you may end up with wonky results here if the cache is only kept/reported for 20 seconds)
I don't think it'll be more wonky than the other stats. Since the stats get reported once a request completes, if you have endpoints that take really long (e.g. 60 seconds), the stats will feel kind of delayed anyway.
locust/stats.py
Outdated
if global_stats.max_requests is not None and (global_stats.num_requests + global_stats.num_failures) >= global_stats.max_requests:
    raise StopLocust("Maximum number of requests reached")

def on_request_failure(request_type, name, response_time, exception):
-   global_stats.get(name, request_type).log_error(exception)
+   #global_stats.get(name, request_type).log_error(exception)
should this be deleted instead of commented?
locust/stats.py
Outdated
@@ -439,18 +539,21 @@ def median_from_dict(total, count):
    """

def on_request_success(request_type, name, response_time, response_length):
-   global_stats.get(name, request_type).log(response_time, response_length)
+   #global_stats.get(name, request_type).log(response_time, response_length)
should this be deleted instead of commented?
Ah, yes :)!
…ile configurable (through CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW). Changed so that the size of the response times cache is CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW + 10.
Looks good to me!
Okay, great, thanks!
I've refactored the stats code to track the total request stats in its own separate StatsEntry instance under RequestStats.total, instead of dynamically calculating the total stats from all other stats entries. I believe this makes the code slightly clearer. The total stats are now accessible through RequestStats.total, and there is no longer a need for RequestStats.aggregated_stats(). StatsEntry.extend() is now only used for merging stats from slave reports, and therefore the full_request_history argument is no longer needed.

The background for this is that I'm looking to display the 95% percentile response time in the response time graph (as mentioned in #539 (comment)). When I started to look into the best way of implementing this, I realized that the way we currently store stats data doesn't allow us to retrieve percentile response times for the last couple of seconds, only for the whole test run (since we use the StatsEntry.response_times dict to calculate it). To allow this, we need to store a snapshot of this dict every second, for a number of seconds back. We probably don't want to do this for every single stats entry, but it should be okay to do it for the total stats entry. Now that the total stats is tracked in its own StatsEntry, and isn't calculated dynamically on the fly, it'll be easier to do that.
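A rough sketch of the shape of that refactoring (heavily simplified and hypothetical; the real StatsEntry and RequestStats in locust/stats.py carry many more fields and methods, and response time rounding is omitted):

class StatsEntry(object):
    def __init__(self, name, method):
        self.name = name
        self.method = method
        self.num_requests = 0
        self.response_times = {}  # {response_time_ms: count}

    def log(self, response_time):
        self.num_requests += 1
        self.response_times[response_time] = self.response_times.get(response_time, 0) + 1


class RequestStats(object):
    def __init__(self):
        self.entries = {}
        # The total is a real StatsEntry that is updated on every request,
        # instead of being aggregated from all the other entries on demand.
        self.total = StatsEntry(name="Total", method=None)

    def log_request(self, method, name, response_time):
        key = (name, method)
        if key not in self.entries:
            self.entries[key] = StatsEntry(name, method)
        self.entries[key].log(response_time)
        self.total.log(response_time)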