
Refactored stats code and display median as well as 95% percentile response times in web UI's charts #549

Merged: heyman merged 18 commits into master from total-stats-entry on Sep 19, 2017

Conversation

@heyman (Member) commented on Feb 22, 2017

I've refactored the stats code to track the total request stats in its own separate StatsEntry instance under RequestStats.total, instead of dynamically calculating the total stats from all other stats entries.

I believe this makes the code slightly clearer. The total stats are now accessible through RequestStats.total, and there is no longer a need for RequestStats.aggregated_stats(). StatsEntry.extend() is now only used for merging stats from slave reports, so the full_request_history argument is no longer needed.
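
For illustration, here's a minimal sketch of the new shape (simplified, with hypothetical signatures; not the actual locust/stats.py code):

    class StatsEntry(object):
        def __init__(self, name, method=None):
            self.name = name
            self.method = method
            self.num_requests = 0
            self.response_times = {}  # response time (ms) -> occurrence count

        def log(self, response_time):
            self.num_requests += 1
            self.response_times[response_time] = self.response_times.get(response_time, 0) + 1

    class RequestStats(object):
        def __init__(self):
            self.entries = {}
            self.total = StatsEntry("Total")  # the "total" row is now tracked directly

        def log_request(self, method, name, response_time):
            # update both the per-endpoint entry and the total entry, so there's
            # no need to aggregate all entries on the fly later
            key = (name, method)
            if key not in self.entries:
                self.entries[key] = StatsEntry(name, method)
            self.entries[key].log(response_time)
            self.total.log(response_time)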

The background for this is that I'm looking to display the 95% percentile response time in the response time graph (as mentioned in #539 (comment)). When I started to look into the best way of implementing this, I realized that the way we currently store stats data doesn't allow us to retrieve percentile response times for the last couple of seconds, only for the whole test run (since we use the StatsEntry.response_times dict to calculate it). To allow this, we need to store a snapshot of this dict every second, for a number of seconds back. We probably don't want to do this for every single stats entry, but it should be okay to do it for the total stats entry. Now that the total stats are tracked in their own StatsEntry, rather than calculated dynamically on the fly, it'll be easier to do that.
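
Roughly, the snapshot mechanism could look like this (a sketch under the assumptions above; the dict-based cache and helper names are illustrative, not necessarily the PR's):

    import time
    from copy import copy

    class CachedResponseTimes(object):
        # a frozen copy of the response_times dict plus the request count at snapshot time
        def __init__(self, response_times, num_requests):
            self.response_times = response_times
            self.num_requests = num_requests

    def cache_response_times(entry, cache, window=20):
        # called once per second for entries that have the cache enabled;
        # stores a snapshot keyed by whole-second timestamp and trims old ones
        now = int(time.time())
        cache[now] = CachedResponseTimes(copy(entry.response_times), entry.num_requests)
        for ts in list(cache):
            if ts < now - window:
                del cache[ts]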

…parate StatsEntry instead of dynamically calculating the total stats from all other stats entries
… it’s no longer needed when we have a separate StatsEntry instance to track the total stats
…cond, for the last 20 seconds. This allows us to calculate the current response times for certain percentiles. We turn on this cache for the total StatsEntry.
@heyman changed the title from "Refactored stats code" to "Refactored stats code and display median as well as 95% percentile response times in web UI's charts" on Mar 13, 2017
…the response times for only the last 10 seconds, to calculate current response time percentiles (before, we used the *total* response times like it was 10 seconds ago, which obviously was wrong)
…lizing/resetting a StatsEntry with use_response_times_cache set to True
…on the master node when running distributed. This makes the current response time percentiles work when running Locust distributed.
@heyman (Member, Author) commented on Sep 16, 2017

I think this is pretty much ready to be merged. I'd love it if someone could take a look and review it.

These are the main changes:

RequestStats now holds a total attribute of type StatsEntry that represents the "total" row in the statistics. Previously, the total was calculated on the fly by aggregating the StatsEntry instances for each entry (row) in the statistics. This also allowed me to simplify StatsEntry.extend() by removing the full_request_history argument.

StatsEntry has gained a cache that holds copies of the response_times dict for the last 20 seconds. The cache is turned off by default, but it's turned on for RequestStats.total. This cache is used to calculate the current response time percentiles, and the 50th and 95th percentiles are now graphed in the web UI.
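
The "current" percentile then falls out as the percentile of the difference between the latest response_times dict and the cached snapshot from the start of the window. A sketch of that delta (helper name is mine, not necessarily the PR's):

    def diff_response_time_dicts(latest, old):
        # requests observed during the window = counts now minus counts at snapshot time;
        # the usual response-times percentile function is then applied to this diff
        diff = {}
        for t, count in latest.items():
            delta = count - old.get(t, 0)
            if delta > 0:
                diff[t] = delta
        return diff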

locust/stats.py Outdated
processed_count = 0
for response_time in sorted(six.iterkeys(response_times), reverse=True):
    processed_count += response_times[response_time]
    if((num_requests - processed_count) <= num_of_request):
@aldenpeterson-wf (Contributor): nit: don't need extra () around this

@heyman (Member, Author), Sep 19, 2017: Fixed!
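
For readers skimming the thread, here's the quoted logic as a self-contained function (my adaptation, with six dropped; the real implementation lives in locust/stats.py):

    def calculate_response_time_percentile(response_times, num_requests, percent):
        # the number of requests expected to have finished within the percentile value
        num_of_request = int(num_requests * percent)
        processed_count = 0
        # walk response times from slowest to fastest until the remaining (faster)
        # requests account for the requested percentile
        for response_time in sorted(response_times, reverse=True):
            processed_count += response_times[response_time]
            if num_requests - processed_count <= num_of_request:
                return response_time
        return 0

    # e.g. calculate_response_time_percentile({100: 8, 300: 1, 1200: 1}, 10, 0.95) == 1200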

locust/stats.py Outdated
def get_current_response_time_percentile(self, percent):
    """
    Calculate the *current* response time for a certain percentile. We use a sliding
    window of (approximately) the last 10 seconds when calculating this.
@aldenpeterson-wf (Contributor): would it be useful to make this 10 seconds customizable?

could make it a module variable here, I guess, so people can adjust it directly. I'm not sure it'd be used enough to make it a parameter

@heyman (Member, Author): Yeah, a module variable should be good. Right now we only expose this to users as the "current" response time percentile, which lets us get away with it being approximately the last 10 seconds. We also only use it in the very ephemeral web UI charts. If we were to expose it as an option, people might expect it to be more exact.

Will fix!
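
That would amount to something like this (CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW matches the name used in the later commits; the user-side tweak is illustrative):

    # locust/stats.py (sketch)
    CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW = 10  # seconds; "current" percentiles use roughly this window

    # a script that wants a longer window could adjust it directly:
    # import locust.stats
    # locust.stats.CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW = 30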

locust/stats.py Outdated
    response_times=copy(self.response_times),
    num_requests=self.num_requests,
)
if len(self.response_times_cache) > 20:
@aldenpeterson-wf (Contributor): this seems rather arbitrary, 200 seconds seems like a lot to cache (though probably still negligible)

@heyman (Member, Author): The cache might not hold response times for every second, so we scan through the cache at time() - CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW +/- 10 seconds to find a cache entry, which we'll use to calculate the delta that gives us the "current" response time percentile.

I've changed the cache size to be CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW + 10.

I guess we could make the number of seconds we scan configurable as well, but I really don't see a use case where one would need to change this.
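
In other words, something like this sketch (illustrative helper; the real code sits in StatsEntry.get_current_response_time_percentile):

    import time

    def find_snapshot(cache, window):
        # look for the snapshot taken closest to (now - window); the cache may
        # be missing some seconds, so scan outward up to +/- 10 seconds
        target = int(time.time()) - window
        for offset in range(11):
            for ts in (target - offset, target + offset):
                if ts in cache:
                    return cache[ts]
        return None  # no usable snapshot yet (e.g. the test just started)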

locust/stats.py Outdated
use_response_times_cache = False
"""
If set to True, a copy of the response_times dict will be stored in response_times_cache
every second, and kept for 20 seconds. We can use this dict to calculate the *current*
@aldenpeterson-wf (Contributor): similar comment to my question about the 10 seconds sliding window time, I can see use cases for adjusting this (for example if your operations take a long time you may end up with wonky results here if the cache is only kept/reported for 20 seconds)

@heyman (Member, Author): I don't think it'll be more wonky than the other stats. Since the stats get reported once a request is complete, if you have endpoints that take really long (e.g. 60 seconds), the stats will feel kind of delayed.

locust/stats.py Outdated
if global_stats.max_requests is not None and (global_stats.num_requests + global_stats.num_failures) >= global_stats.max_requests:
    raise StopLocust("Maximum number of requests reached")

def on_request_failure(request_type, name, response_time, exception):
    global_stats.get(name, request_type).log_error(exception)
    #global_stats.get(name, request_type).log_error(exception)
@aldenpeterson-wf (Contributor): should this be deleted instead of commented?

locust/stats.py Outdated
@@ -439,18 +539,21 @@ def median_from_dict(total, count):
"""

def on_request_success(request_type, name, response_time, response_length):
    global_stats.get(name, request_type).log(response_time, response_length)
    #global_stats.get(name, request_type).log(response_time, response_length)
@aldenpeterson-wf (Contributor): should this be deleted instead of commented?

@heyman (Member, Author): Ah, yes :)!

…ile configurable (through CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW).

Changed so that the size of the response times cache is CURRENT_RESPONSE_TIME_PERCENTILE_WINDOW + 10.
@aldenpeterson-wf (Contributor): Looks good to me!

@heyman merged commit eab8429 into master on Sep 19, 2017
@heyman (Member, Author) commented on Sep 19, 2017

Okay, great, thanks!

@mbeacom deleted the total-stats-entry branch on October 18, 2019