Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Reorganise request and block metrics #2785

Merged
merged 4 commits into from
Jan 16, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,10 @@ Unreleased
==========

synctl no longer starts the main synapse when using ``-a`` option with workers.
A new worker file should be added with ``worker_app: synapse.app.homeserver``
A new worker file should be added with ``worker_app: synapse.app.homeserver``.

This release also begins the process of renaming a number of the metrics
reported to prometheus. See `docs/metrics-howto.rst <docs/metrics-howto.rst#block-and-response-metrics-renamed-for-0-27-0>`_.


Changes in synapse v0.26.0 (2018-01-05)
Expand Down
61 changes: 54 additions & 7 deletions docs/metrics-howto.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ How to monitor Synapse metrics using Prometheus
metrics_port: 9092

Also ensure that ``enable_metrics`` is set to ``True``.

Restart synapse.

3. Add a prometheus target for synapse.
Expand All @@ -28,11 +28,58 @@ How to monitor Synapse metrics using Prometheus
static_configs:
- targets: ["my.server.here:9092"]

If your prometheus is older than 1.5.2, you will need to replace
If your prometheus is older than 1.5.2, you will need to replace
``static_configs`` in the above with ``target_groups``.

Restart prometheus.


Block and response metrics renamed for 0.27.0
---------------------------------------------

Synapse 0.27.0 begins the process of rationalising the duplicate ``*:count``
metrics reported for the resource tracking for code blocks and HTTP requests.

At the same time, the corresponding ``*:total`` metrics are being renamed, as
the ``:total`` suffix no longer makes sense in the absence of a corresponding
``:count`` metric.

To enable a graceful migration path, this release just adds new names for the
metrics being renamed. A future release will remove the old ones.

The following table shows the new metrics, and the old metrics which they are
replacing.

==================================================== ===================================================
New name Old name
==================================================== ===================================================
synapse_util_metrics_block_count synapse_util_metrics_block_timer:count
synapse_util_metrics_block_count synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_count synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_count synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_count synapse_util_metrics_block_db_txn_duration:count

synapse_util_metrics_block_time_seconds synapse_util_metrics_block_timer:total
synapse_util_metrics_block_ru_utime_seconds synapse_util_metrics_block_ru_utime:total
synapse_util_metrics_block_ru_stime_seconds synapse_util_metrics_block_ru_stime:total
synapse_util_metrics_block_db_txn_count synapse_util_metrics_block_db_txn_count:total
synapse_util_metrics_block_db_txn_duration_seconds synapse_util_metrics_block_db_txn_duration:total

synapse_http_server_response_count synapse_http_server_requests
synapse_http_server_response_count synapse_http_server_response_time:count
synapse_http_server_response_count synapse_http_server_response_ru_utime:count
synapse_http_server_response_count synapse_http_server_response_ru_stime:count
synapse_http_server_response_count synapse_http_server_response_db_txn_count:count
synapse_http_server_response_count synapse_http_server_response_db_txn_duration:count

synapse_http_server_response_time_seconds synapse_http_server_response_time:total
synapse_http_server_response_ru_utime_seconds synapse_http_server_response_ru_utime:total
synapse_http_server_response_ru_stime_seconds synapse_http_server_response_ru_stime:total
synapse_http_server_response_db_txn_count synapse_http_server_response_db_txn_count:total
synapse_http_server_response_db_txn_duration_seconds synapse_http_server_response_db_txn_duration:total
==================================================== ===================================================


Standard Metric Names
---------------------

Expand All @@ -42,7 +89,7 @@ have been changed to seconds, from miliseconds.

================================== =============================
New name Old name
---------------------------------- -----------------------------
================================== =============================
process_cpu_user_seconds_total process_resource_utime / 1000
process_cpu_system_seconds_total process_resource_stime / 1000
process_open_fds (no 'type' label) process_fds
Expand All @@ -52,8 +99,8 @@ The python-specific counts of garbage collector performance have been renamed.

=========================== ======================
New name Old name
--------------------------- ----------------------
python_gc_time reactor_gc_time
=========================== ======================
python_gc_time reactor_gc_time
python_gc_unreachable_total reactor_gc_unreachable
python_gc_counts reactor_gc_counts
=========================== ======================
Expand All @@ -62,7 +109,7 @@ The twisted-specific reactor metrics have been renamed.

==================================== =====================
New name Old name
------------------------------------ ---------------------
==================================== =====================
python_twisted_reactor_pending_calls reactor_pending_calls
python_twisted_reactor_tick_time reactor_tick_time
==================================== =====================
56 changes: 42 additions & 14 deletions synapse/http/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,34 +42,62 @@

metrics = synapse.metrics.get_metrics_for(__name__)

incoming_requests_counter = metrics.register_counter(
"requests",
# total number of responses served, split by method/servlet/tag
response_count = metrics.register_counter(
"response_count",
labels=["method", "servlet", "tag"],
alternative_names=(
# the following are all deprecated aliases for the same metric
metrics.name_prefix + x for x in (
"_requests",
"_response_time:count",
"_response_ru_utime:count",
"_response_ru_stime:count",
"_response_db_txn_count:count",
"_response_db_txn_duration:count",
)
)
)

outgoing_responses_counter = metrics.register_counter(
"responses",
labels=["method", "code"],
)

response_timer = metrics.register_distribution(
"response_time",
labels=["method", "servlet", "tag"]
response_timer = metrics.register_counter(
"response_time_seconds",
labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_time:total",
),
)

response_ru_utime = metrics.register_distribution(
"response_ru_utime", labels=["method", "servlet", "tag"]
response_ru_utime = metrics.register_counter(
"response_ru_utime_seconds", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_ru_utime:total",
),
)

response_ru_stime = metrics.register_distribution(
"response_ru_stime", labels=["method", "servlet", "tag"]
response_ru_stime = metrics.register_counter(
"response_ru_stime_seconds", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_ru_stime:total",
),
)

response_db_txn_count = metrics.register_distribution(
"response_db_txn_count", labels=["method", "servlet", "tag"]
response_db_txn_count = metrics.register_counter(
"response_db_txn_count", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_db_txn_count:total",
),
)

response_db_txn_duration = metrics.register_distribution(
"response_db_txn_duration", labels=["method", "servlet", "tag"]
response_db_txn_duration = metrics.register_counter(
"response_db_txn_duration_seconds", labels=["method", "servlet", "tag"],
alternative_names=(
metrics.name_prefix + "_response_db_txn_duration:total",
),
)


Expand Down Expand Up @@ -330,7 +358,7 @@ def stop(self, clock, request):
)
return

incoming_requests_counter.inc(request.method, self.name, tag)
response_count.inc(request.method, self.name, tag)

response_timer.inc_by(
clock.time_msec() - self.start, request.method,
Expand Down
81 changes: 67 additions & 14 deletions synapse/metrics/metric.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,33 @@
from itertools import chain


# TODO(paul): I can't believe Python doesn't have one of these
def map_concat(func, items):
# flatten a list-of-lists
return list(chain.from_iterable(map(func, items)))
def flatten(items):
"""Flatten a list of lists

Args:
items: iterable[iterable[X]]

Returns:
list[X]: flattened list
"""
return list(chain.from_iterable(items))


class BaseMetric(object):
"""Base class for metrics which report a single value per label set
"""

def __init__(self, name, labels=[]):
self.name = name
def __init__(self, name, labels=[], alternative_names=[]):
"""
Args:
name (str): principal name for this metric
labels (list(str)): names of the labels which will be reported
for this metric
alternative_names (iterable(str)): list of alternative names for
this metric. This can be useful to provide a migration path
when renaming metrics.
"""
self._names = [name] + list(alternative_names)
self.labels = labels # OK not to clone as we never write it

def dimension(self):
Expand All @@ -36,7 +53,7 @@ def is_scalar(self):
return not len(self.labels)

def _render_labelvalue(self, value):
# TODO: some kind of value escape
# TODO: escape backslashes, quotes and newlines
return '"%s"' % (value)

def _render_key(self, values):
Expand All @@ -47,6 +64,36 @@ def _render_key(self, values):
for k, v in zip(self.labels, values)])
)

def _render_for_labels(self, label_values, value):
"""Render this metric for a single set of labels

Args:
label_values (list[str]): values for each of the labels
value: value of the metric at with these labels

Returns:
iterable[str]: rendered metric
"""
rendered_labels = self._render_key(label_values)
return (
"%s%s %.12g" % (name, rendered_labels, value)
for name in self._names
)

def render(self):
"""Render this metric

Each metric is rendered as:

name{label1="val1",label2="val2"} value

https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details

Returns:
iterable[str]: rendered metrics
"""
raise NotImplementedError()


class CounterMetric(BaseMetric):
"""The simplest kind of metric; one that stores a monotonically-increasing
Expand All @@ -62,6 +109,10 @@ class CounterMetric(BaseMetric):
def __init__(self, *args, **kwargs):
super(CounterMetric, self).__init__(*args, **kwargs)

# dict[list[str]]: value for each set of label values. the keys are the
# label values, in the same order as the labels in self.labels.
#
# (if the metric is a scalar, the (single) key is the empty list).
self.counts = {}

# Scalar metrics are never empty
Expand All @@ -84,11 +135,11 @@ def inc_by(self, incr, *values):
def inc(self, *values):
self.inc_by(1, *values)

def render_item(self, k):
return ["%s%s %.12g" % (self.name, self._render_key(k), self.counts[k])]

def render(self):
return map_concat(self.render_item, sorted(self.counts.keys()))
return flatten(
self._render_for_labels(k, self.counts[k])
for k in sorted(self.counts.keys())
)


class CallbackMetric(BaseMetric):
Expand All @@ -105,10 +156,12 @@ def render(self):
value = self.callback()

if self.is_scalar():
return ["%s %.12g" % (self.name, value)]
return list(self._render_for_labels([], value))

return ["%s%s %.12g" % (self.name, self._render_key(k), value[k])
for k in sorted(value.keys())]
return flatten(
self._render_for_labels(k, value[k])
for k in sorted(value.keys())
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is k here really going to be a list? Why do we need to flatten this when we didn't previously?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm an idiot, nevermind

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)



class DistributionMetric(object):
Expand Down
53 changes: 42 additions & 11 deletions synapse/util/metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,25 +27,56 @@

metrics = synapse.metrics.get_metrics_for(__name__)

block_timer = metrics.register_distribution(
"block_timer",
labels=["block_name"]
# total number of times we have hit this block
response_count = metrics.register_counter(
"block_count",
labels=["block_name"],
alternative_names=(
# the following are all deprecated aliases for the same metric
metrics.name_prefix + x for x in (
"_block_timer:count",
"_block_ru_utime:count",
"_block_ru_stime:count",
"_block_db_txn_count:count",
"_block_db_txn_duration:count",
)
)
)

block_timer = metrics.register_counter(
"block_time_seconds",
labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_timer:total",
),
)

block_ru_utime = metrics.register_distribution(
"block_ru_utime", labels=["block_name"]
block_ru_utime = metrics.register_counter(
"block_ru_utime_seconds", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_ru_utime:total",
),
)

block_ru_stime = metrics.register_distribution(
"block_ru_stime", labels=["block_name"]
block_ru_stime = metrics.register_counter(
"block_ru_stime_seconds", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_ru_stime:total",
),
)

block_db_txn_count = metrics.register_distribution(
"block_db_txn_count", labels=["block_name"]
block_db_txn_count = metrics.register_counter(
"block_db_txn_count", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_db_txn_count:total",
),
)

block_db_txn_duration = metrics.register_distribution(
"block_db_txn_duration", labels=["block_name"]
block_db_txn_duration = metrics.register_counter(
"block_db_txn_duration_seconds", labels=["block_name"],
alternative_names=(
metrics.name_prefix + "_block_db_txn_count:total",
),
)


Expand Down