
Merge pull request #2785 from matrix-org/rav/reorganise_metrics_again
Reorganise request and block metrics
richvdh authored Jan 16, 2018
2 parents f5f89fd + ee7a1ca commit febdca4
Showing 5 changed files with 209 additions and 47 deletions.
5 changes: 4 additions & 1 deletion CHANGES.rst
@@ -2,7 +2,10 @@ Unreleased
==========

synctl no longer starts the main synapse when using ``-a`` option with workers.
A new worker file should be added with ``worker_app: synapse.app.homeserver``
A new worker file should be added with ``worker_app: synapse.app.homeserver``.

This release also begins the process of renaming a number of the metrics
reported to prometheus. See `docs/metrics-howto.rst <docs/metrics-howto.rst#block-and-response-metrics-renamed-for-0-27-0>`_.


Changes in synapse v0.26.0 (2018-01-05)
61 changes: 54 additions & 7 deletions docs/metrics-howto.rst
@@ -16,7 +16,7 @@ How to monitor Synapse metrics using Prometheus
metrics_port: 9092

Also ensure that ``enable_metrics`` is set to ``True``.

Restart synapse.

3. Add a prometheus target for synapse.
@@ -28,11 +28,58 @@ How to monitor Synapse metrics using Prometheus
static_configs:
- targets: ["my.server.here:9092"]

If your prometheus is older than 1.5.2, you will need to replace
If your prometheus is older than 1.5.2, you will need to replace
``static_configs`` in the above with ``target_groups``.

Restart prometheus.


Block and response metrics renamed for 0.27.0
---------------------------------------------

Synapse 0.27.0 begins the process of rationalising the duplicate ``*:count``
metrics reported for the resource tracking of code blocks and HTTP requests.

At the same time, the corresponding ``*:total`` metrics are being renamed, as
the ``:total`` suffix no longer makes sense in the absence of a corresponding
``:count`` metric.

To enable a graceful migration path, this release just adds new names for the
metrics being renamed. A future release will remove the old ones.

The following table shows the new metrics, and the old metrics which they are
replacing.

==================================================== ===================================================
New name Old name
==================================================== ===================================================
synapse_util_metrics_block_count synapse_util_metrics_block_timer:count
synapse_util_metrics_block_count synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_count synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_count synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_count synapse_util_metrics_block_db_txn_duration:count

synapse_util_metrics_block_time_seconds synapse_util_metrics_block_timer:total
synapse_util_metrics_block_ru_utime_seconds synapse_util_metrics_block_ru_utime:total
synapse_util_metrics_block_ru_stime_seconds synapse_util_metrics_block_ru_stime:total
synapse_util_metrics_block_db_txn_count synapse_util_metrics_block_db_txn_count:total
synapse_util_metrics_block_db_txn_duration_seconds synapse_util_metrics_block_db_txn_duration:total

synapse_http_server_response_count synapse_http_server_requests
synapse_http_server_response_count synapse_http_server_response_time:count
synapse_http_server_response_count synapse_http_server_response_ru_utime:count
synapse_http_server_response_count synapse_http_server_response_ru_stime:count
synapse_http_server_response_count synapse_http_server_response_db_txn_count:count
synapse_http_server_response_count synapse_http_server_response_db_txn_duration:count

synapse_http_server_response_time_seconds synapse_http_server_response_time:total
synapse_http_server_response_ru_utime_seconds synapse_http_server_response_ru_utime:total
synapse_http_server_response_ru_stime_seconds synapse_http_server_response_ru_stime:total
synapse_http_server_response_db_txn_count synapse_http_server_response_db_txn_count:total
synapse_http_server_response_db_txn_duration_seconds synapse_http_server_response_db_txn_duration:total
==================================================== ===================================================
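
Internally, the dual reporting is implemented via a new ``alternative_names``
option on metric registration, so that the same value is exported under both
the new name and its deprecated aliases. A minimal sketch of the pattern,
following the registrations added to ``synapse/http/server.py`` in this
commit::

    import synapse.metrics

    metrics = synapse.metrics.get_metrics_for(__name__)

    # exported as synapse_http_server_response_count, and also under the
    # deprecated synapse_http_server_requests alias
    response_count = metrics.register_counter(
        "response_count",
        labels=["method", "servlet", "tag"],
        alternative_names=(
            metrics.name_prefix + "_requests",
        ),
    )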


Standard Metric Names
---------------------

@@ -42,7 +89,7 @@ have been changed to seconds, from milliseconds.

================================== =============================
New name Old name
---------------------------------- -----------------------------
================================== =============================
process_cpu_user_seconds_total process_resource_utime / 1000
process_cpu_system_seconds_total process_resource_stime / 1000
process_open_fds (no 'type' label) process_fds
@@ -52,8 +99,8 @@ The python-specific counts of garbage collector performance have been renamed.

=========================== ======================
New name Old name
--------------------------- ----------------------
python_gc_time reactor_gc_time
=========================== ======================
python_gc_time reactor_gc_time
python_gc_unreachable_total reactor_gc_unreachable
python_gc_counts reactor_gc_counts
=========================== ======================
@@ -62,7 +109,7 @@ The twisted-specific reactor metrics have been renamed.

==================================== =====================
New name Old name
------------------------------------ ---------------------
==================================== =====================
python_twisted_reactor_pending_calls reactor_pending_calls
python_twisted_reactor_tick_time reactor_tick_time
==================================== =====================
56 changes: 42 additions & 14 deletions synapse/http/server.py
@@ -42,34 +42,62 @@

metrics = synapse.metrics.get_metrics_for(__name__)

incoming_requests_counter = metrics.register_counter(
"requests",
# total number of responses served, split by method/servlet/tag
response_count = metrics.register_counter(
"response_count",
labels=["method", "servlet", "tag"],
alternative_names=(
# the following are all deprecated aliases for the same metric
metrics.name_prefix + x for x in (
"_requests",
"_response_time:count",
"_response_ru_utime:count",
"_response_ru_stime:count",
"_response_db_txn_count:count",
"_response_db_txn_duration:count",
)
)
)

outgoing_responses_counter = metrics.register_counter(
"responses",
labels=["method", "code"],
)

response_timer = metrics.register_distribution(
"response_time",
labels=["method", "servlet", "tag"]
response_timer = metrics.register_counter(
"response_time_seconds",
labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_time:total",
),
)

response_ru_utime = metrics.register_distribution(
"response_ru_utime", labels=["method", "servlet", "tag"]
response_ru_utime = metrics.register_counter(
"response_ru_utime_seconds", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_ru_utime:total",
),
)

response_ru_stime = metrics.register_distribution(
"response_ru_stime", labels=["method", "servlet", "tag"]
response_ru_stime = metrics.register_counter(
"response_ru_stime_seconds", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_ru_stime:total",
),
)

response_db_txn_count = metrics.register_distribution(
"response_db_txn_count", labels=["method", "servlet", "tag"]
response_db_txn_count = metrics.register_counter(
"response_db_txn_count", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_db_txn_count:total",
),
)

response_db_txn_duration = metrics.register_distribution(
"response_db_txn_duration", labels=["method", "servlet", "tag"]
response_db_txn_duration = metrics.register_counter(
"response_db_txn_duration_seconds", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_db_txn_duration:total",
),
)


@@ -330,7 +358,7 @@ def stop(self, clock, request):
)
return

incoming_requests_counter.inc(request.method, self.name, tag)
response_count.inc(request.method, self.name, tag)

response_timer.inc_by(
clock.time_msec() - self.start, request.method,
81 changes: 67 additions & 14 deletions synapse/metrics/metric.py
@@ -17,16 +17,33 @@
from itertools import chain


# TODO(paul): I can't believe Python doesn't have one of these
def map_concat(func, items):
# flatten a list-of-lists
return list(chain.from_iterable(map(func, items)))
def flatten(items):
"""Flatten a list of lists
Args:
items: iterable[iterable[X]]
Returns:
list[X]: flattened list
"""
return list(chain.from_iterable(items))


class BaseMetric(object):
"""Base class for metrics which report a single value per label set
"""

def __init__(self, name, labels=[]):
self.name = name
def __init__(self, name, labels=[], alternative_names=[]):
"""
Args:
name (str): principal name for this metric
labels (list(str)): names of the labels which will be reported
for this metric
alternative_names (iterable(str)): list of alternative names for
this metric. This can be useful to provide a migration path
when renaming metrics.
"""
self._names = [name] + list(alternative_names)
self.labels = labels # OK not to clone as we never write it

def dimension(self):
@@ -36,7 +53,7 @@ def is_scalar(self):
return not len(self.labels)

def _render_labelvalue(self, value):
# TODO: some kind of value escape
# TODO: escape backslashes, quotes and newlines
return '"%s"' % (value)

def _render_key(self, values):
@@ -47,6 +64,36 @@ def _render_key(self, values):
for k, v in zip(self.labels, values)])
)

def _render_for_labels(self, label_values, value):
"""Render this metric for a single set of labels
Args:
label_values (list[str]): values for each of the labels
value: value of the metric with these labels
Returns:
iterable[str]: rendered metric
"""
rendered_labels = self._render_key(label_values)
return (
"%s%s %.12g" % (name, rendered_labels, value)
for name in self._names
)

def render(self):
"""Render this metric
Each metric is rendered as:
name{label1="val1",label2="val2"} value
https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details
Returns:
iterable[str]: rendered metrics
"""
raise NotImplementedError()


class CounterMetric(BaseMetric):
"""The simplest kind of metric; one that stores a monotonically-increasing
@@ -62,6 +109,10 @@ class CounterMetric(BaseMetric):
def __init__(self, *args, **kwargs):
super(CounterMetric, self).__init__(*args, **kwargs)

# dict[tuple[str], float]: value for each set of label values. the keys are
# tuples of the label values, in the same order as the labels in self.labels.
#
# (if the metric is a scalar, the (single) key is the empty tuple).
self.counts = {}

# Scalar metrics are never empty
@@ -84,11 +135,11 @@ def inc_by(self, incr, *values):
def inc(self, *values):
self.inc_by(1, *values)

def render_item(self, k):
return ["%s%s %.12g" % (self.name, self._render_key(k), self.counts[k])]

def render(self):
return map_concat(self.render_item, sorted(self.counts.keys()))
return flatten(
self._render_for_labels(k, self.counts[k])
for k in sorted(self.counts.keys())
)
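
# Illustrative example (not part of the original diff): with the rendering
# above, a counter registered with an alternative name emits one line per
# name for each label combination, e.g.
#
#   c = CounterMetric("response_count", labels=["method"],
#                     alternative_names=["requests"])
#   c.inc("GET")
#   c.render()
#   # -> ['response_count{method="GET"} 1',
#   #     'requests{method="GET"} 1']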


class CallbackMetric(BaseMetric):
@@ -105,10 +156,12 @@ def render(self):
value = self.callback()

if self.is_scalar():
return ["%s %.12g" % (self.name, value)]
return list(self._render_for_labels([], value))

return ["%s%s %.12g" % (self.name, self._render_key(k), value[k])
for k in sorted(value.keys())]
return flatten(
self._render_for_labels(k, value[k])
for k in sorted(value.keys())
)
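
# Illustrative example (not part of the original diff; assumes the usual
# CallbackMetric(name, callback, labels=[]) constructor): a scalar callback
# metric renders one line per registered name, e.g.
#
#   CallbackMetric("cache_size", lambda: 42).render()
#   # -> ['cache_size 42']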


class DistributionMetric(object):
53 changes: 42 additions & 11 deletions synapse/util/metrics.py
@@ -27,25 +27,56 @@

metrics = synapse.metrics.get_metrics_for(__name__)

block_timer = metrics.register_distribution(
"block_timer",
labels=["block_name"]
# total number of times we have hit this block
response_count = metrics.register_counter(
"block_count",
labels=["block_name"],
alternative_names=(
# the following are all deprecated aliases for the same metric
metrics.name_prefix + x for x in (
"_block_timer:count",
"_block_ru_utime:count",
"_block_ru_stime:count",
"_block_db_txn_count:count",
"_block_db_txn_duration:count",
)
)
)

block_timer = metrics.register_counter(
"block_time_seconds",
labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_timer:total",
),
)

block_ru_utime = metrics.register_distribution(
"block_ru_utime", labels=["block_name"]
block_ru_utime = metrics.register_counter(
"block_ru_utime_seconds", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_ru_utime:total",
),
)

block_ru_stime = metrics.register_distribution(
"block_ru_stime", labels=["block_name"]
block_ru_stime = metrics.register_counter(
"block_ru_stime_seconds", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_ru_stime:total",
),
)

block_db_txn_count = metrics.register_distribution(
"block_db_txn_count", labels=["block_name"]
block_db_txn_count = metrics.register_counter(
"block_db_txn_count", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_db_txn_count:total",
),
)

block_db_txn_duration = metrics.register_distribution(
"block_db_txn_duration", labels=["block_name"]
block_db_txn_duration = metrics.register_counter(
"block_db_txn_duration_seconds", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_db_txn_count:total",
),
)
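
# These counters back the Measure context manager defined further down this
# module (not shown in this hunk), which is used roughly as:
#
#   with Measure(self.clock, "some_block"):
#       do_work()
#
# and which, on exit, increments the block counter and adds the elapsed time
# and resource usage to the duration counters registered above.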


