Report number of GCs #994

Merged · 1 commit · May 12, 2020
4 changes: 3 additions & 1 deletion docs/metrics.rst
@@ -124,8 +124,10 @@ Rally stores the following metrics:
* ``disk_io_write_bytes``: number of bytes that have been written to disk during the benchmark. On Linux this metric reports only the bytes that have been written by Elasticsearch, on Mac OS X it reports the number of bytes written by all processes.
* ``disk_io_read_bytes``: number of bytes that have been read from disk during the benchmark. The same caveats apply on Mac OS X as for ``disk_io_write_bytes``.
* ``node_startup_time``: The time in seconds it took from process start until the node is up.
- * ``node_total_old_gen_gc_time``: The total runtime of the old generation garbage collector across the whole cluster as reported by the node stats API.
  * ``node_total_young_gen_gc_time``: The total runtime of the young generation garbage collector across the whole cluster as reported by the node stats API.
+ * ``node_total_young_gen_gc_count``: The total number of young generation garbage collections across the whole cluster as reported by the node stats API.
+ * ``node_total_old_gen_gc_time``: The total runtime of the old generation garbage collector across the whole cluster as reported by the node stats API.
+ * ``node_total_old_gen_gc_count``: The total number of old generation garbage collections across the whole cluster as reported by the node stats API.
* ``segments_count``: Total number of segments as reported by the indices stats API.
* ``segments_memory_in_bytes``: Number of bytes used for segments as reported by the indices stats API.
* ``segments_doc_values_memory_in_bytes``: Number of bytes used for doc values as reported by the indices stats API.
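The four `node_total_*_gc_*` metrics above come from the Elasticsearch node stats API, which reports per-collector totals under `jvm.gc.collectors.{young,old}` (the field names match what `esrally/telemetry.py` reads later in this diff). A minimal sketch of summing them across nodes; the response below is a hand-made example, not real output:

```python
# Sketch: summing GC time and count across all nodes of a node stats response.
# The jvm.gc.collectors.{young,old} shape matches esrally/telemetry.py;
# the node name and sample numbers are made up for illustration.
node_stats = {
    "nodes": {
        "node-0": {
            "name": "rally0",
            "jvm": {
                "gc": {
                    "collectors": {
                        "young": {"collection_count": 20, "collection_time_in_millis": 500},
                        "old": {"collection_count": 1, "collection_time_in_millis": 1000},
                    }
                }
            },
        }
    }
}


def gc_totals(stats):
    """Sum GC time (ms) and collection count across all nodes."""
    totals = {"young_time": 0, "young_count": 0, "old_time": 0, "old_count": 0}
    for node in stats["nodes"].values():
        collectors = node["jvm"]["gc"]["collectors"]
        totals["young_time"] += collectors["young"]["collection_time_in_millis"]
        totals["young_count"] += collectors["young"]["collection_count"]
        totals["old_time"] += collectors["old"]["collection_time_in_millis"]
        totals["old_count"] += collectors["old"]["collection_count"]
    return totals


print(gc_totals(node_stats))
```

Note these are cumulative JVM counters since node start; the telemetry device below turns them into per-benchmark values by diffing two snapshots.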
6 changes: 4 additions & 2 deletions docs/quickstart.rst
@@ -68,8 +68,10 @@ This will download Elasticsearch 6.5.3 and run Rally's default :doc:`track </glo
| All | Median cumulative flush time across primary shards | | 0.118767 | min |
| All | Max cumulative flush time across primary shards | | 0.14145 | min |
| All | Median CPU usage | | 284.4 | % |
- | All | Total Young Gen GC | | 12.868 | s |
- | All | Total Old Gen GC | | 3.803 | s |
+ | All | Total Young Gen GC time | | 12.868 | s |
+ | All | Total Young Gen GC count | | 17 | |
+ | All | Total Old Gen GC time | | 3.803 | s |
+ | All | Total Old Gen GC count | | 2 | |
| All | Store size | | 3.17241 | GB |
| All | Translog size | | 2.62736 | GB |
| All | Index size | | 5.79977 | GB |
6 changes: 4 additions & 2 deletions docs/race.rst
@@ -69,8 +69,10 @@ When the race has finished, Rally will show a summary on the command line::
| Total refresh time | | 4.49527 | min |
| Total merge throttle time | | 0.120433 | min |
| Median CPU usage | | 546.5 | % |
- | Total Young Gen GC | | 72.078 | s |
- | Total Old Gen GC | | 3.426 | s |
+ | Total Young Gen GC time | | 72.078 | s |
+ | Total Young Gen GC count | | 43 | |
+ | Total Old Gen GC time | | 3.426 | s |
+ | Total Old Gen GC count | | 1 | |
| Index size | | 2.26661 | GB |
| Total written | | 30.083 | GB |
| Heap used for segments | | 10.7148 | MB |
21 changes: 17 additions & 4 deletions docs/summary_report.rst
@@ -101,19 +101,32 @@ ML processing time
* **Corresponding metrics key**: ``ml_processing_time``


- Total Young Gen GC
- ------------------
+ Total Young Gen GC time
+ -----------------------

* **Definition**: The total runtime of the young generation garbage collector across the whole cluster as reported by the node stats API.
* **Corresponding metrics key**: ``node_total_young_gen_gc_time``


- Total Old Gen GC
- ----------------
+ Total Young Gen GC count
+ ------------------------

+ * **Definition**: The total number of young generation garbage collections across the whole cluster as reported by the node stats API.
+ * **Corresponding metrics key**: ``node_total_young_gen_gc_count``


+ Total Old Gen GC time
+ ---------------------

* **Definition**: The total runtime of the old generation garbage collector across the whole cluster as reported by the node stats API.
* **Corresponding metrics key**: ``node_total_old_gen_gc_time``

+ Total Old Gen GC count
+ ----------------------
+
+ * **Definition**: The total number of old generation garbage collections across the whole cluster as reported by the node stats API.
+ * **Corresponding metrics key**: ``node_total_old_gen_gc_count``

Store size
----------

6 changes: 4 additions & 2 deletions docs/tournament.rst
@@ -93,8 +93,10 @@ We can see that the user tag helps us to recognize races. We want to compare the
Median CPU usage (index) [%] 668.025 679.15 +11.12499
Median CPU usage (stats) [%] 143.75 162.4 +18.64999
Median CPU usage (search) [%] 223.1 229.2 +6.10000
- Total Young Gen GC [s] 39.447 40.456 +1.00900
- Total Old Gen GC [s] 7.108 7.703 +0.59500
+ Total Young Gen GC time [s] 39.447 40.456 +1.00900
+ Total Young Gen GC count 10 11 +1.00000
+ Total Old Gen GC time [s] 7.108 7.703 +0.59500
+ Total Old Gen GC count 10 11 +1.00000
Index size [GB] 3.25475 3.25098 -0.00377
Total written [GB] 17.8434 18.3143 +0.47083
Heap used for segments [MB] 21.7504 21.5901 -0.16037
4 changes: 4 additions & 0 deletions esrally/metrics.py
@@ -1654,7 +1654,9 @@ def __call__(self):

self.logger.debug("Gathering garbage collection metrics.")
result.young_gc_time = self.sum("node_total_young_gen_gc_time")
+ result.young_gc_count = self.sum("node_total_young_gen_gc_count")
result.old_gc_time = self.sum("node_total_old_gen_gc_time")
+ result.old_gc_count = self.sum("node_total_old_gen_gc_count")

self.logger.debug("Gathering segment memory metrics.")
result.memory_segments = self.median("segments_memory_in_bytes")
@@ -1789,7 +1791,9 @@ def __init__(self, d=None):
self.ml_processing_time = self.v(d, "ml_processing_time", default=[])

self.young_gc_time = self.v(d, "young_gc_time")
+ self.young_gc_count = self.v(d, "young_gc_count")
self.old_gc_time = self.v(d, "old_gc_time")
+ self.old_gc_count = self.v(d, "old_gc_count")

self.memory_segments = self.v(d, "memory_segments")
self.memory_doc_values = self.v(d, "memory_doc_values")
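In `metrics.py`, the new cumulative counts reuse the same aggregation path as the times: `self.sum(...)` folds every stored sample for a metric name into one number. A rough stand-in for that helper (`metric_sum` and the sample layout here are illustrative, not Rally's actual metrics-store format):

```python
def metric_sum(samples, name):
    """Sum all recorded values for one metric name; None if no samples exist."""
    values = [sample["value"] for sample in samples if sample["name"] == name]
    return sum(values) if values else None


# Illustrative samples, one per recorded metric document.
samples = [
    {"name": "node_total_young_gen_gc_count", "value": 17},
    {"name": "node_total_old_gen_gc_count", "value": 2},
]

metric_sum(samples, "node_total_young_gen_gc_count")  # 17
metric_sum(samples, "node_total_old_gen_gc_count")    # 2
```

Returning `None` when no samples exist matters downstream: the reporter can then skip the row instead of printing a zero for a metric that was never collected.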
24 changes: 15 additions & 9 deletions esrally/reporter.py
@@ -113,7 +113,7 @@ def report(self):
metrics_table.extend(self.report_totals(stats))
metrics_table.extend(self.report_ml_processing_times(stats))

- metrics_table.extend(self.report_gc_times(stats))
+ metrics_table.extend(self.report_gc_metrics(stats))

metrics_table.extend(self.report_disk_usage(stats))
metrics_table.extend(self.report_segment_memory(stats))
@@ -237,10 +237,12 @@ def report_ml_processing_times(self, stats):
lines.append(self.line("Max ML processing time", job_name, processing_time["max"], unit))
return lines

- def report_gc_times(self, stats):
+ def report_gc_metrics(self, stats):
return self.join(
- self.line("Total Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds),
- self.line("Total Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds)
+ self.line("Total Young Gen GC time", "", stats.young_gc_time, "s", convert.ms_to_seconds),
+ self.line("Total Young Gen GC count", "", stats.young_gc_count, ""),
+ self.line("Total Old Gen GC time", "", stats.old_gc_time, "s", convert.ms_to_seconds),
+ self.line("Total Old Gen GC count", "", stats.old_gc_count, "")
)

def report_disk_usage(self, stats):
@@ -327,7 +329,7 @@ def metrics_table(self, baseline_stats, contender_stats, plain):
metrics_table = []
metrics_table.extend(self.report_total_times(baseline_stats, contender_stats))
metrics_table.extend(self.report_ml_processing_times(baseline_stats, contender_stats))
- metrics_table.extend(self.report_gc_times(baseline_stats, contender_stats))
+ metrics_table.extend(self.report_gc_metrics(baseline_stats, contender_stats))
metrics_table.extend(self.report_disk_usage(baseline_stats, contender_stats))
metrics_table.extend(self.report_segment_memory(baseline_stats, contender_stats))
metrics_table.extend(self.report_segment_counts(baseline_stats, contender_stats))
@@ -499,12 +501,16 @@ def report_total_count(self, name, baseline_total, contender_total):
treat_increase_as_improvement=False)
)

- def report_gc_times(self, baseline_stats, contender_stats):
+ def report_gc_metrics(self, baseline_stats, contender_stats):
return self.join(
- self.line("Total Young Gen GC", baseline_stats.young_gc_time, contender_stats.young_gc_time, "", "s",
+ self.line("Total Young Gen GC time", baseline_stats.young_gc_time, contender_stats.young_gc_time, "", "s",
treat_increase_as_improvement=False, formatter=convert.ms_to_seconds),
- self.line("Total Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
- treat_increase_as_improvement=False, formatter=convert.ms_to_seconds)
+ self.line("Total Young Gen GC count", baseline_stats.young_gc_count, contender_stats.young_gc_count, "", "",
+ treat_increase_as_improvement=False),
+ self.line("Total Old Gen GC time", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
+ treat_increase_as_improvement=False, formatter=convert.ms_to_seconds),
+ self.line("Total Old Gen GC count", baseline_stats.old_gc_count, contender_stats.old_gc_count, "", "",
+ treat_increase_as_improvement=False)
)

def report_disk_usage(self, baseline_stats, contender_stats):
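In the reporter, a time row and a count row differ only in their unit and formatter arguments: GC times are stored in milliseconds and converted via `convert.ms_to_seconds`, while counts are emitted as-is with an empty unit. A self-contained sketch of that distinction (`format_metric` is a hypothetical stand-in for Rally's `line` helper, and the `None` guard in `ms_to_seconds` is an assumption for metrics that were never collected):

```python
def ms_to_seconds(millis):
    # Mirrors the intent of convert.ms_to_seconds: Rally stores GC time
    # in milliseconds but reports it in seconds.
    return millis / 1000.0 if millis is not None else None


def format_metric(name, value, unit="", formatter=None):
    """Hypothetical stand-in for the reporter's line() helper."""
    rendered = formatter(value) if formatter else value
    return f"{name}: {rendered} {unit}".rstrip()


format_metric("Total Young Gen GC time", 12868, "s", ms_to_seconds)  # 'Total Young Gen GC time: 12.868 s'
format_metric("Total Young Gen GC count", 17)                        # 'Total Young Gen GC count: 17'
```

The 12.868 s value matches the quickstart sample output above, illustrating why the time rows carry a formatter and the new count rows deliberately do not.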
22 changes: 18 additions & 4 deletions esrally/telemetry.py
@@ -979,19 +979,27 @@ def on_benchmark_start(self):
def on_benchmark_stop(self):
jvm_stats_at_end = self.jvm_stats()
total_old_gen_collection_time = 0
+ total_old_gen_collection_count = 0
total_young_gen_collection_time = 0
+ total_young_gen_collection_count = 0

for node_name, jvm_stats_end in jvm_stats_at_end.items():
if node_name in self.jvm_stats_per_node:
jvm_stats_start = self.jvm_stats_per_node[node_name]
- young_gc_time = max(jvm_stats_end["young_gc"] - jvm_stats_start["young_gc"], 0)
- old_gc_time = max(jvm_stats_end["old_gc"] - jvm_stats_start["old_gc"], 0)
+ young_gc_time = max(jvm_stats_end["young_gc_time"] - jvm_stats_start["young_gc_time"], 0)
+ young_gc_count = max(jvm_stats_end["young_gc_count"] - jvm_stats_start["young_gc_count"], 0)
+ old_gc_time = max(jvm_stats_end["old_gc_time"] - jvm_stats_start["old_gc_time"], 0)
+ old_gc_count = max(jvm_stats_end["old_gc_count"] - jvm_stats_start["old_gc_count"], 0)

total_young_gen_collection_time += young_gc_time
+ total_young_gen_collection_count += young_gc_count
total_old_gen_collection_time += old_gc_time
+ total_old_gen_collection_count += old_gc_count

self.metrics_store.put_value_node_level(node_name, "node_young_gen_gc_time", young_gc_time, "ms")
+ self.metrics_store.put_count_node_level(node_name, "node_young_gen_gc_count", young_gc_count)
self.metrics_store.put_value_node_level(node_name, "node_old_gen_gc_time", old_gc_time, "ms")
+ self.metrics_store.put_count_node_level(node_name, "node_old_gen_gc_count", old_gc_count)

all_pool_stats = {
"name": "jvm_memory_pool_stats"
@@ -1007,7 +1015,9 @@ def on_benchmark_stop(self):
self.logger.warning("Cannot determine JVM stats for [%s] (not in the cluster at the start of the benchmark).", node_name)

self.metrics_store.put_value_cluster_level("node_total_young_gen_gc_time", total_young_gen_collection_time, "ms")
+ self.metrics_store.put_count_cluster_level("node_total_young_gen_gc_count", total_young_gen_collection_count)
self.metrics_store.put_value_cluster_level("node_total_old_gen_gc_time", total_old_gen_collection_time, "ms")
+ self.metrics_store.put_count_cluster_level("node_total_old_gen_gc_count", total_old_gen_collection_count)

self.jvm_stats_per_node = None

@@ -1025,10 +1035,14 @@ def jvm_stats(self):
node_name = node["name"]
gc = node["jvm"]["gc"]["collectors"]
old_gen_collection_time = gc["old"]["collection_time_in_millis"]
+ old_gen_collection_count = gc["old"]["collection_count"]
young_gen_collection_time = gc["young"]["collection_time_in_millis"]
+ young_gen_collection_count = gc["young"]["collection_count"]
jvm_stats[node_name] = {
- "young_gc": young_gen_collection_time,
- "old_gc": old_gen_collection_time,
+ "young_gc_time": young_gen_collection_time,
+ "young_gc_count": young_gen_collection_count,
+ "old_gc_time": old_gen_collection_time,
+ "old_gc_count": old_gen_collection_count,
"pools": {}
}
pool_usage = node["jvm"]["mem"]["pools"]
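The telemetry device records only the delta between the benchmark-start and benchmark-end snapshots, clamped at zero with `max(..., 0)` so a counter that went backwards (for instance because a node restarted mid-benchmark and its JVM counters reset) never yields a negative metric. The core of that bookkeeping, reduced to a sketch with the same snapshot values as the unit test below:

```python
def gc_delta(start, end):
    """Per-node GC delta between two jvm_stats snapshots, clamped at >= 0."""
    return {key: max(end[key] - start[key], 0)
            for key in ("young_gc_time", "young_gc_count", "old_gc_time", "old_gc_count")}


# Snapshot values taken from JvmStatsSummaryTests further down in this diff.
start = {"young_gc_time": 500, "young_gc_count": 20, "old_gc_time": 1000, "old_gc_count": 1}
end = {"young_gc_time": 1200, "young_gc_count": 4000, "old_gc_time": 2500, "old_gc_count": 2}

gc_delta(start, end)
# {'young_gc_time': 700, 'young_gc_count': 3980, 'old_gc_time': 1500, 'old_gc_count': 1}
```

These are exactly the numbers the test asserts: 1200 − 500 = 700 ms, 4000 − 20 = 3980 collections, 2500 − 1000 = 1500 ms, 2 − 1 = 1 collection.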
4 changes: 3 additions & 1 deletion tests/driver/async_driver_test.py
@@ -207,8 +207,10 @@ def test_run_benchmark(self):
self.assertIsNotNone(metric_store.get_one(name="throughput", task="bulk-index", sample_type=metrics.SampleType.Normal))
self.assertIsNotNone(metric_store.get_one(name="node_total_young_gen_gc_time", sample_type=metrics.SampleType.Normal))
self.assertIsNotNone(metric_store.get_one(name="node_total_old_gen_gc_time", sample_type=metrics.SampleType.Normal))
+ self.assertIsNotNone(metric_store.get_one(name="node_total_young_gen_gc_count", sample_type=metrics.SampleType.Normal))
+ self.assertIsNotNone(metric_store.get_one(name="node_total_old_gen_gc_count", sample_type=metrics.SampleType.Normal))
# ensure that there are not more documents than we expect
- self.assertEqual(6, len(metric_store.docs), msg=json.dumps(metric_store.docs, indent=2))
+ self.assertEqual(8, len(metric_store.docs), msg=json.dumps(metric_store.docs, indent=2))

def tearDown(self):
StaticClientFactory.close()
14 changes: 14 additions & 0 deletions tests/metrics_test.py
@@ -1678,7 +1678,9 @@ def test_as_flat_list(self):
},
],
"young_gc_time": 68,
+ "young_gc_count": 7,
"old_gc_time": 0,
+ "old_gc_count": 0,
"merge_time": 3702,
"merge_time_per_shard": {
"min": 40,
@@ -1828,13 +1830,25 @@ def test_as_flat_list(self):
"single": 68
}
}, select(metric_list, "young_gc_time"))
+ self.assertEqual({
+ "name": "young_gc_count",
+ "value": {
+ "single": 7
+ }
+ }, select(metric_list, "young_gc_count"))

self.assertEqual({
"name": "old_gc_time",
"value": {
"single": 0
}
}, select(metric_list, "old_gc_time"))
+ self.assertEqual({
+ "name": "old_gc_count",
+ "value": {
+ "single": 0
+ }
+ }, select(metric_list, "old_gc_count"))

self.assertEqual({
"name": "merge_time",
26 changes: 22 additions & 4 deletions tests/telemetry_test.py
@@ -2118,7 +2118,11 @@ class JvmStatsSummaryTests(TestCase):
@mock.patch("esrally.metrics.EsMetricsStore.put_doc")
@mock.patch("esrally.metrics.EsMetricsStore.put_value_cluster_level")
@mock.patch("esrally.metrics.EsMetricsStore.put_value_node_level")
+ @mock.patch("esrally.metrics.EsMetricsStore.put_count_cluster_level")
+ @mock.patch("esrally.metrics.EsMetricsStore.put_count_node_level")
def test_stores_only_diff_of_gc_times(self,
+ metrics_store_count_node_level,
+ metrics_store_count_cluster_level,
metrics_store_node_level,
metrics_store_cluster_level,
metrics_store_put_doc):
@@ -2144,10 +2148,12 @@ def test_stores_only_diff_of_gc_times(self,
"gc": {
"collectors": {
"old": {
- "collection_time_in_millis": 1000
+ "collection_time_in_millis": 1000,
+ "collection_count": 1
},
"young": {
- "collection_time_in_millis": 500
+ "collection_time_in_millis": 500,
+ "collection_count": 20
}
}
}
@@ -2186,10 +2192,12 @@ def test_stores_only_diff_of_gc_times(self,
"gc": {
"collectors": {
"old": {
- "collection_time_in_millis": 2500
+ "collection_time_in_millis": 2500,
+ "collection_count": 2
},
"young": {
- "collection_time_in_millis": 1200
+ "collection_time_in_millis": 1200,
+ "collection_count": 4000
}
}
}
@@ -2205,10 +2213,20 @@ def test_stores_only_diff_of_gc_times(self,
mock.call("rally0", "node_old_gen_gc_time", 1500, "ms"),
])

+ metrics_store_count_node_level.assert_has_calls([
+ mock.call("rally0", "node_young_gen_gc_count", 3980),
+ mock.call("rally0", "node_old_gen_gc_count", 1),
+ ])

metrics_store_cluster_level.assert_has_calls([
mock.call("node_total_young_gen_gc_time", 700, "ms"),
mock.call("node_total_old_gen_gc_time", 1500, "ms")
])
+ metrics_store_count_cluster_level.assert_has_calls([
+ mock.call("node_total_young_gen_gc_count", 3980),
+ mock.call("node_total_old_gen_gc_count", 1),
+ ])

metrics_store_put_doc.assert_has_calls([
mock.call({
"name": "jvm_memory_pool_stats",
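One subtlety in the test above: the new `@mock.patch` decorators are stacked, and stacked patches apply bottom-up, which is why the parameter list names `metrics_store_count_node_level` first even though `put_count_node_level` is the last decorator. A minimal demonstration of that ordering, using a hypothetical `Store` class rather than Rally's `EsMetricsStore`:

```python
# Stacked @mock.patch decorators are applied bottom-up: the bottom-most
# patch produces the FIRST mock argument of the decorated function.
from unittest import mock


class Store:
    def put_count_cluster_level(self, *args): ...
    def put_count_node_level(self, *args): ...


@mock.patch.object(Store, "put_count_cluster_level")
@mock.patch.object(Store, "put_count_node_level")
def check(count_node_level, count_cluster_level):
    # Only the node-level method is invoked, so only the first mock
    # (bound by the bottom decorator) should record a call.
    Store().put_count_node_level("rally0", "node_young_gen_gc_count", 3980)
    return count_node_level.call_count, count_cluster_level.call_count


check()  # (1, 0): the bottom patch's mock received the call
```

Getting this order wrong would silently swap the node-level and cluster-level assertions, so the test's parameter order mirrors the decorator stack in reverse.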