Report number of GCs #994

Merged · 1 commit · May 12, 2020
4 changes: 3 additions & 1 deletion docs/metrics.rst
@@ -124,8 +124,10 @@ Rally stores the following metrics:
* ``disk_io_write_bytes``: number of bytes that have been written to disk during the benchmark. On Linux this metric reports only the bytes that have been written by Elasticsearch, on Mac OS X it reports the number of bytes written by all processes.
* ``disk_io_read_bytes``: number of bytes that have been read from disk during the benchmark. The same caveats apply on Mac OS X as for ``disk_io_write_bytes``.
* ``node_startup_time``: The time in seconds it took from process start until the node is up.
- * ``node_total_old_gen_gc_time``: The total runtime of the old generation garbage collector across the whole cluster as reported by the node stats API.
  * ``node_total_young_gen_gc_time``: The total runtime of the young generation garbage collector across the whole cluster as reported by the node stats API.
+ * ``node_total_young_gen_gc_count``: The total number of young generation garbage collections across the whole cluster as reported by the node stats API.
+ * ``node_total_old_gen_gc_time``: The total runtime of the old generation garbage collector across the whole cluster as reported by the node stats API.
+ * ``node_total_old_gen_gc_count``: The total number of old generation garbage collections across the whole cluster as reported by the node stats API.
* ``segments_count``: Total number of segments as reported by the indices stats API.
* ``segments_memory_in_bytes``: Number of bytes used for segments as reported by the indices stats API.
* ``segments_doc_values_memory_in_bytes``: Number of bytes used for doc values as reported by the indices stats API.
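The four `node_total_*_gc_*` metrics above come from the Elasticsearch node stats API, which reports per-collector totals under `jvm.gc.collectors.{young,old}` (the field names match what `esrally/telemetry.py` reads later in this diff). A minimal sketch of summing them across nodes; the response below is a hand-made example, not real output:

```python
# Sketch: summing GC time and count across all nodes of a node stats response.
# The jvm.gc.collectors.{young,old} shape matches esrally/telemetry.py;
# the node name and sample numbers are made up for illustration.
node_stats = {
    "nodes": {
        "node-0": {
            "name": "rally0",
            "jvm": {
                "gc": {
                    "collectors": {
                        "young": {"collection_count": 20, "collection_time_in_millis": 500},
                        "old": {"collection_count": 1, "collection_time_in_millis": 1000},
                    }
                }
            },
        }
    }
}


def gc_totals(stats):
    """Sum GC time (ms) and collection count across all nodes."""
    totals = {"young_time": 0, "young_count": 0, "old_time": 0, "old_count": 0}
    for node in stats["nodes"].values():
        collectors = node["jvm"]["gc"]["collectors"]
        totals["young_time"] += collectors["young"]["collection_time_in_millis"]
        totals["young_count"] += collectors["young"]["collection_count"]
        totals["old_time"] += collectors["old"]["collection_time_in_millis"]
        totals["old_count"] += collectors["old"]["collection_count"]
    return totals


print(gc_totals(node_stats))
```

Note these are cumulative JVM counters since node start; the telemetry device below turns them into per-benchmark values by diffing two snapshots.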
6 changes: 4 additions & 2 deletions docs/quickstart.rst
@@ -68,8 +68,10 @@ This will download Elasticsearch 6.5.3 and run Rally's default :doc:`track </glo
| All | Median cumulative flush time across primary shards | | 0.118767 | min |
| All | Max cumulative flush time across primary shards | | 0.14145 | min |
| All | Median CPU usage | | 284.4 | % |
- | All | Total Young Gen GC | | 12.868 | s |
- | All | Total Old Gen GC | | 3.803 | s |
+ | All | Total Young Gen GC time | | 12.868 | s |
+ | All | Total Young Gen GC count | | 17 | |
+ | All | Total Old Gen GC time | | 3.803 | s |
+ | All | Total Old Gen GC count | | 2 | |
| All | Store size | | 3.17241 | GB |
| All | Translog size | | 2.62736 | GB |
| All | Index size | | 5.79977 | GB |
6 changes: 4 additions & 2 deletions docs/race.rst
@@ -69,8 +69,10 @@ When the race has finished, Rally will show a summary on the command line::
| Total refresh time | | 4.49527 | min |
| Total merge throttle time | | 0.120433 | min |
| Median CPU usage | | 546.5 | % |
- | Total Young Gen GC | | 72.078 | s |
- | Total Old Gen GC | | 3.426 | s |
+ | Total Young Gen GC time | | 72.078 | s |
+ | Total Young Gen GC count | | 43 | |
+ | Total Old Gen GC time | | 3.426 | s |
+ | Total Old Gen GC count | | 1 | |
| Index size | | 2.26661 | GB |
| Total written | | 30.083 | GB |
| Heap used for segments | | 10.7148 | MB |
21 changes: 17 additions & 4 deletions docs/summary_report.rst
@@ -101,19 +101,32 @@ ML processing time
* **Corresponding metrics key**: ``ml_processing_time``


- Total Young Gen GC
- ------------------
+ Total Young Gen GC time
+ -----------------------

* **Definition**: The total runtime of the young generation garbage collector across the whole cluster as reported by the node stats API.
* **Corresponding metrics key**: ``node_total_young_gen_gc_time``


- Total Old Gen GC
- ----------------
+ Total Young Gen GC count
+ ------------------------

+ * **Definition**: The total number of young generation garbage collections across the whole cluster as reported by the node stats API.
+ * **Corresponding metrics key**: ``node_total_young_gen_gc_count``


+ Total Old Gen GC time
+ ---------------------

* **Definition**: The total runtime of the old generation garbage collector across the whole cluster as reported by the node stats API.
* **Corresponding metrics key**: ``node_total_old_gen_gc_time``

+ Total Old Gen GC count
+ ----------------------
+
+ * **Definition**: The total number of old generation garbage collections across the whole cluster as reported by the node stats API.
+ * **Corresponding metrics key**: ``node_total_old_gen_gc_count``

Store size
----------

6 changes: 4 additions & 2 deletions docs/tournament.rst
@@ -93,8 +93,10 @@ We can see that the user tag helps us to recognize races. We want to compare the
Median CPU usage (index) [%] 668.025 679.15 +11.12499
Median CPU usage (stats) [%] 143.75 162.4 +18.64999
Median CPU usage (search) [%] 223.1 229.2 +6.10000
- Total Young Gen GC [s] 39.447 40.456 +1.00900
- Total Old Gen GC [s] 7.108 7.703 +0.59500
+ Total Young Gen GC time [s] 39.447 40.456 +1.00900
+ Total Young Gen GC count 10 11 +1.00000
+ Total Old Gen GC time [s] 7.108 7.703 +0.59500
+ Total Old Gen GC count 10 11 +1.00000
Index size [GB] 3.25475 3.25098 -0.00377
Total written [GB] 17.8434 18.3143 +0.47083
Heap used for segments [MB] 21.7504 21.5901 -0.16037
4 changes: 4 additions & 0 deletions esrally/metrics.py
@@ -1654,7 +1654,9 @@ def __call__(self):

self.logger.debug("Gathering garbage collection metrics.")
result.young_gc_time = self.sum("node_total_young_gen_gc_time")
+ result.young_gc_count = self.sum("node_total_young_gen_gc_count")
result.old_gc_time = self.sum("node_total_old_gen_gc_time")
+ result.old_gc_count = self.sum("node_total_old_gen_gc_count")

self.logger.debug("Gathering segment memory metrics.")
result.memory_segments = self.median("segments_memory_in_bytes")
@@ -1789,7 +1791,9 @@ def __init__(self, d=None):
self.ml_processing_time = self.v(d, "ml_processing_time", default=[])

self.young_gc_time = self.v(d, "young_gc_time")
+ self.young_gc_count = self.v(d, "young_gc_count")
self.old_gc_time = self.v(d, "old_gc_time")
+ self.old_gc_count = self.v(d, "old_gc_count")

self.memory_segments = self.v(d, "memory_segments")
self.memory_doc_values = self.v(d, "memory_doc_values")
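In `metrics.py`, the new cumulative counts reuse the same aggregation path as the times: `self.sum(...)` folds every stored sample for a metric name into one number. A rough stand-in for that helper (`metric_sum` and the sample layout here are illustrative, not Rally's actual metrics-store format):

```python
def metric_sum(samples, name):
    """Sum all recorded values for one metric name; None if no samples exist."""
    values = [sample["value"] for sample in samples if sample["name"] == name]
    return sum(values) if values else None


# Illustrative samples, one per recorded metric document.
samples = [
    {"name": "node_total_young_gen_gc_count", "value": 17},
    {"name": "node_total_old_gen_gc_count", "value": 2},
]

metric_sum(samples, "node_total_young_gen_gc_count")  # 17
metric_sum(samples, "node_total_old_gen_gc_count")    # 2
```

Returning `None` when no samples exist matters downstream: the reporter can then skip the row instead of printing a zero for a metric that was never collected.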
24 changes: 15 additions & 9 deletions esrally/reporter.py
@@ -113,7 +113,7 @@ def report(self):
metrics_table.extend(self.report_totals(stats))
metrics_table.extend(self.report_ml_processing_times(stats))

- metrics_table.extend(self.report_gc_times(stats))
+ metrics_table.extend(self.report_gc_metrics(stats))

metrics_table.extend(self.report_disk_usage(stats))
metrics_table.extend(self.report_segment_memory(stats))
@@ -237,10 +237,12 @@ def report_ml_processing_times(self, stats):
lines.append(self.line("Max ML processing time", job_name, processing_time["max"], unit))
return lines

- def report_gc_times(self, stats):
+ def report_gc_metrics(self, stats):
return self.join(
- self.line("Total Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds),
- self.line("Total Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds)
+ self.line("Total Young Gen GC time", "", stats.young_gc_time, "s", convert.ms_to_seconds),
+ self.line("Total Young Gen GC count", "", stats.young_gc_count, ""),
+ self.line("Total Old Gen GC time", "", stats.old_gc_time, "s", convert.ms_to_seconds),
+ self.line("Total Old Gen GC count", "", stats.old_gc_count, "")
)

def report_disk_usage(self, stats):
@@ -327,7 +329,7 @@ def metrics_table(self, baseline_stats, contender_stats, plain):
metrics_table = []
metrics_table.extend(self.report_total_times(baseline_stats, contender_stats))
metrics_table.extend(self.report_ml_processing_times(baseline_stats, contender_stats))
- metrics_table.extend(self.report_gc_times(baseline_stats, contender_stats))
+ metrics_table.extend(self.report_gc_metrics(baseline_stats, contender_stats))
metrics_table.extend(self.report_disk_usage(baseline_stats, contender_stats))
metrics_table.extend(self.report_segment_memory(baseline_stats, contender_stats))
metrics_table.extend(self.report_segment_counts(baseline_stats, contender_stats))
@@ -499,12 +501,16 @@ def report_total_count(self, name, baseline_total, contender_total):
treat_increase_as_improvement=False)
)

- def report_gc_times(self, baseline_stats, contender_stats):
+ def report_gc_metrics(self, baseline_stats, contender_stats):
return self.join(
- self.line("Total Young Gen GC", baseline_stats.young_gc_time, contender_stats.young_gc_time, "", "s",
+ self.line("Total Young Gen GC time", baseline_stats.young_gc_time, contender_stats.young_gc_time, "", "s",
treat_increase_as_improvement=False, formatter=convert.ms_to_seconds),
- self.line("Total Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
- treat_increase_as_improvement=False, formatter=convert.ms_to_seconds)
+ self.line("Total Young Gen GC count", baseline_stats.young_gc_count, contender_stats.young_gc_count, "", "",
+ treat_increase_as_improvement=False),
+ self.line("Total Old Gen GC time", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
+ treat_increase_as_improvement=False, formatter=convert.ms_to_seconds),
+ self.line("Total Old Gen GC count", baseline_stats.old_gc_count, contender_stats.old_gc_count, "", "",
+ treat_increase_as_improvement=False)
)

def report_disk_usage(self, baseline_stats, contender_stats):
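In the reporter, a time row and a count row differ only in their unit and formatter arguments: GC times are stored in milliseconds and converted via `convert.ms_to_seconds`, while counts are emitted as-is with an empty unit. A self-contained sketch of that distinction (`format_metric` is a hypothetical stand-in for Rally's `line` helper, and the `None` guard in `ms_to_seconds` is an assumption for metrics that were never collected):

```python
def ms_to_seconds(millis):
    # Mirrors the intent of convert.ms_to_seconds: Rally stores GC time
    # in milliseconds but reports it in seconds.
    return millis / 1000.0 if millis is not None else None


def format_metric(name, value, unit="", formatter=None):
    """Hypothetical stand-in for the reporter's line() helper."""
    rendered = formatter(value) if formatter else value
    return f"{name}: {rendered} {unit}".rstrip()


format_metric("Total Young Gen GC time", 12868, "s", ms_to_seconds)  # 'Total Young Gen GC time: 12.868 s'
format_metric("Total Young Gen GC count", 17)                        # 'Total Young Gen GC count: 17'
```

The 12.868 s value matches the quickstart sample output above, illustrating why the time rows carry a formatter and the new count rows deliberately do not.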
22 changes: 18 additions & 4 deletions esrally/telemetry.py
@@ -979,19 +979,27 @@ def on_benchmark_start(self):
def on_benchmark_stop(self):
jvm_stats_at_end = self.jvm_stats()
total_old_gen_collection_time = 0
+ total_old_gen_collection_count = 0
total_young_gen_collection_time = 0
+ total_young_gen_collection_count = 0

for node_name, jvm_stats_end in jvm_stats_at_end.items():
if node_name in self.jvm_stats_per_node:
jvm_stats_start = self.jvm_stats_per_node[node_name]
- young_gc_time = max(jvm_stats_end["young_gc"] - jvm_stats_start["young_gc"], 0)
- old_gc_time = max(jvm_stats_end["old_gc"] - jvm_stats_start["old_gc"], 0)
+ young_gc_time = max(jvm_stats_end["young_gc_time"] - jvm_stats_start["young_gc_time"], 0)
+ young_gc_count = max(jvm_stats_end["young_gc_count"] - jvm_stats_start["young_gc_count"], 0)
+ old_gc_time = max(jvm_stats_end["old_gc_time"] - jvm_stats_start["old_gc_time"], 0)
+ old_gc_count = max(jvm_stats_end["old_gc_count"] - jvm_stats_start["old_gc_count"], 0)

total_young_gen_collection_time += young_gc_time
+ total_young_gen_collection_count += young_gc_count
total_old_gen_collection_time += old_gc_time
+ total_old_gen_collection_count += old_gc_count

self.metrics_store.put_value_node_level(node_name, "node_young_gen_gc_time", young_gc_time, "ms")
+ self.metrics_store.put_count_node_level(node_name, "node_young_gen_gc_count", young_gc_count)
self.metrics_store.put_value_node_level(node_name, "node_old_gen_gc_time", old_gc_time, "ms")
+ self.metrics_store.put_count_node_level(node_name, "node_old_gen_gc_count", old_gc_count)

all_pool_stats = {
"name": "jvm_memory_pool_stats"
@@ -1007,7 +1015,9 @@ def on_benchmark_stop(self):
self.logger.warning("Cannot determine JVM stats for [%s] (not in the cluster at the start of the benchmark).", node_name)

self.metrics_store.put_value_cluster_level("node_total_young_gen_gc_time", total_young_gen_collection_time, "ms")
+ self.metrics_store.put_count_cluster_level("node_total_young_gen_gc_count", total_young_gen_collection_count)
self.metrics_store.put_value_cluster_level("node_total_old_gen_gc_time", total_old_gen_collection_time, "ms")
+ self.metrics_store.put_count_cluster_level("node_total_old_gen_gc_count", total_old_gen_collection_count)

self.jvm_stats_per_node = None

@@ -1025,10 +1035,14 @@ def jvm_stats(self):
node_name = node["name"]
gc = node["jvm"]["gc"]["collectors"]
old_gen_collection_time = gc["old"]["collection_time_in_millis"]
+ old_gen_collection_count = gc["old"]["collection_count"]
young_gen_collection_time = gc["young"]["collection_time_in_millis"]
+ young_gen_collection_count = gc["young"]["collection_count"]
jvm_stats[node_name] = {
- "young_gc": young_gen_collection_time,
- "old_gc": old_gen_collection_time,
+ "young_gc_time": young_gen_collection_time,
+ "young_gc_count": young_gen_collection_count,
+ "old_gc_time": old_gen_collection_time,
+ "old_gc_count": old_gen_collection_count,
"pools": {}
}
pool_usage = node["jvm"]["mem"]["pools"]
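The telemetry device records only the delta between the benchmark-start and benchmark-end snapshots, clamped at zero with `max(..., 0)` so a counter that went backwards (for instance because a node restarted mid-benchmark and its JVM counters reset) never yields a negative metric. The core of that bookkeeping, reduced to a sketch with the same snapshot values as the unit test below:

```python
def gc_delta(start, end):
    """Per-node GC delta between two jvm_stats snapshots, clamped at >= 0."""
    return {key: max(end[key] - start[key], 0)
            for key in ("young_gc_time", "young_gc_count", "old_gc_time", "old_gc_count")}


# Snapshot values taken from JvmStatsSummaryTests further down in this diff.
start = {"young_gc_time": 500, "young_gc_count": 20, "old_gc_time": 1000, "old_gc_count": 1}
end = {"young_gc_time": 1200, "young_gc_count": 4000, "old_gc_time": 2500, "old_gc_count": 2}

gc_delta(start, end)
# {'young_gc_time': 700, 'young_gc_count': 3980, 'old_gc_time': 1500, 'old_gc_count': 1}
```

These are exactly the numbers the test asserts: 1200 − 500 = 700 ms, 4000 − 20 = 3980 collections, 2500 − 1000 = 1500 ms, 2 − 1 = 1 collection.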
4 changes: 3 additions & 1 deletion tests/driver/async_driver_test.py
@@ -207,8 +207,10 @@ def test_run_benchmark(self):
self.assertIsNotNone(metric_store.get_one(name="throughput", task="bulk-index", sample_type=metrics.SampleType.Normal))
self.assertIsNotNone(metric_store.get_one(name="node_total_young_gen_gc_time", sample_type=metrics.SampleType.Normal))
self.assertIsNotNone(metric_store.get_one(name="node_total_old_gen_gc_time", sample_type=metrics.SampleType.Normal))
+ self.assertIsNotNone(metric_store.get_one(name="node_total_young_gen_gc_count", sample_type=metrics.SampleType.Normal))
+ self.assertIsNotNone(metric_store.get_one(name="node_total_old_gen_gc_count", sample_type=metrics.SampleType.Normal))
# ensure that there are not more documents than we expect
- self.assertEqual(6, len(metric_store.docs), msg=json.dumps(metric_store.docs, indent=2))
+ self.assertEqual(8, len(metric_store.docs), msg=json.dumps(metric_store.docs, indent=2))

def tearDown(self):
StaticClientFactory.close()
14 changes: 14 additions & 0 deletions tests/metrics_test.py
@@ -1678,7 +1678,9 @@ def test_as_flat_list(self):
},
],
"young_gc_time": 68,
+ "young_gc_count": 7,
"old_gc_time": 0,
+ "old_gc_count": 0,
"merge_time": 3702,
"merge_time_per_shard": {
"min": 40,
@@ -1828,13 +1830,25 @@ def test_as_flat_list(self):
"single": 68
}
}, select(metric_list, "young_gc_time"))
+ self.assertEqual({
+ "name": "young_gc_count",
+ "value": {
+ "single": 7
+ }
+ }, select(metric_list, "young_gc_count"))

self.assertEqual({
"name": "old_gc_time",
"value": {
"single": 0
}
}, select(metric_list, "old_gc_time"))
+ self.assertEqual({
+ "name": "old_gc_count",
+ "value": {
+ "single": 0
+ }
+ }, select(metric_list, "old_gc_count"))

self.assertEqual({
"name": "merge_time",
26 changes: 22 additions & 4 deletions tests/telemetry_test.py
@@ -2118,7 +2118,11 @@ class JvmStatsSummaryTests(TestCase):
@mock.patch("esrally.metrics.EsMetricsStore.put_doc")
@mock.patch("esrally.metrics.EsMetricsStore.put_value_cluster_level")
@mock.patch("esrally.metrics.EsMetricsStore.put_value_node_level")
+ @mock.patch("esrally.metrics.EsMetricsStore.put_count_cluster_level")
+ @mock.patch("esrally.metrics.EsMetricsStore.put_count_node_level")
def test_stores_only_diff_of_gc_times(self,
+ metrics_store_count_node_level,
+ metrics_store_count_cluster_level,
metrics_store_node_level,
metrics_store_cluster_level,
metrics_store_put_doc):
@@ -2144,10 +2148,12 @@ def test_stores_only_diff_of_gc_times(self,
"gc": {
"collectors": {
"old": {
- "collection_time_in_millis": 1000
+ "collection_time_in_millis": 1000,
+ "collection_count": 1
},
"young": {
- "collection_time_in_millis": 500
+ "collection_time_in_millis": 500,
+ "collection_count": 20
}
}
}
@@ -2186,10 +2192,12 @@ def test_stores_only_diff_of_gc_times(self,
"gc": {
"collectors": {
"old": {
- "collection_time_in_millis": 2500
+ "collection_time_in_millis": 2500,
+ "collection_count": 2
},
"young": {
- "collection_time_in_millis": 1200
+ "collection_time_in_millis": 1200,
+ "collection_count": 4000
}
}
}
@@ -2205,10 +2213,20 @@ def test_stores_only_diff_of_gc_times(self,
mock.call("rally0", "node_old_gen_gc_time", 1500, "ms"),
])

+ metrics_store_count_node_level.assert_has_calls([
+ mock.call("rally0", "node_young_gen_gc_count", 3980),
+ mock.call("rally0", "node_old_gen_gc_count", 1),
+ ])

metrics_store_cluster_level.assert_has_calls([
mock.call("node_total_young_gen_gc_time", 700, "ms"),
mock.call("node_total_old_gen_gc_time", 1500, "ms")
])
+ metrics_store_count_cluster_level.assert_has_calls([
+ mock.call("node_total_young_gen_gc_count", 3980),
+ mock.call("node_total_old_gen_gc_count", 1),
+ ])

metrics_store_put_doc.assert_has_calls([
mock.call({
"name": "jvm_memory_pool_stats",
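One subtlety in the test above: the new `@mock.patch` decorators are stacked, and stacked patches apply bottom-up, which is why the parameter list names `metrics_store_count_node_level` first even though `put_count_node_level` is the last decorator. A minimal demonstration of that ordering, using a hypothetical `Store` class rather than Rally's `EsMetricsStore`:

```python
# Stacked @mock.patch decorators are applied bottom-up: the bottom-most
# patch produces the FIRST mock argument of the decorated function.
from unittest import mock


class Store:
    def put_count_cluster_level(self, *args): ...
    def put_count_node_level(self, *args): ...


@mock.patch.object(Store, "put_count_cluster_level")
@mock.patch.object(Store, "put_count_node_level")
def check(count_node_level, count_cluster_level):
    # Only the node-level method is invoked, so only the first mock
    # (bound by the bottom decorator) should record a call.
    Store().put_count_node_level("rally0", "node_young_gen_gc_count", 3980)
    return count_node_level.call_count, count_cluster_level.call_count


check()  # (1, 0): the bottom patch's mock received the call
```

Getting this order wrong would silently swap the node-level and cluster-level assertions, so the test's parameter order mirrors the decorator stack in reverse.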