Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Run Prometheus on a different port, optionally. #3274

Merged
merged 15 commits into from
May 31, 2018
Merged
77 changes: 66 additions & 11 deletions docs/metrics-howto.rst
Original file line number Diff line number Diff line change
@@ -1,25 +1,47 @@
How to monitor Synapse metrics using Prometheus
===============================================

1. Install prometheus:
1. Install Prometheus:

Follow instructions at http://prometheus.io/docs/introduction/install/

2. Enable synapse metrics:
2. Enable Synapse metrics:

Simply setting a (local) port number will enable it. Pick a port.
prometheus itself defaults to 9090, so starting just above that for
locally monitored services seems reasonable. E.g. 9092:
There are two methods of enabling metrics in Synapse.

Add to homeserver.yaml::
The first serves the metrics as a part of the usual web server and can be
enabled by adding the "metrics" resource to the existing listener as such::

metrics_port: 9092
resources:
- names:
- client
- metrics

Also ensure that ``enable_metrics`` is set to ``True``.
This provides a simple way of adding metrics to your Synapse installation,
and serves under ``/_synapse/metrics``. If you do not wish your metrics be
publicly exposed, you will need to either filter it out at your load
balancer, or use the second method.

Restart synapse.
The second method runs the metrics server on a different port, in a
different thread to Synapse. This can make it more resilient to heavy load
meaning metrics cannot be retrieved, and can be exposed to just internal
networks easier. The served metrics are available over HTTP only, and will
be available at ``/``.

3. Add a prometheus target for synapse.
Add a new listener to homeserver.yaml::

listeners:
- type: metrics
port: 9000
bind_addresses:
- '0.0.0.0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this just IPv4 rather than an IPv6 address too? (I see f551ac3 changed this, but the commit comment is, ahem, unenlightening.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SocketServer (from the stdlib that Prometheus Client uses) only supports IPv4 out of the box, so it refuses IPv6 addresses. :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whaaa


For both options, you will need to ensure that ``enable_metrics`` is set to
``True``.

Restart Synapse.

3. Add a Prometheus target for Synapse.

It needs to set the ``metrics_path`` to a non-default value (under ``scrape_configs``)::

Expand All @@ -31,7 +53,40 @@ How to monitor Synapse metrics using Prometheus
If your prometheus is older than 1.5.2, you will need to replace
``static_configs`` in the above with ``target_groups``.

Restart prometheus.
Restart Prometheus.


Removal of deprecated metrics & time based counters becoming histograms in 0.31.0
---------------------------------------------------------------------------------

The duplicated metrics deprecated in Synapse 0.27.0 have been removed.

All time duration-based metrics have been changed to be seconds. This affects:

================================
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msec -> sec metrics
================================
python_gc_time
python_twisted_reactor_tick_time
synapse_storage_query_time
synapse_storage_schedule_time
synapse_storage_transaction_time
================================

Several metrics have been changed to be histograms, which sort entries into
buckets and allow better analysis. The following metrics are now histograms:

=========================================
Altered metrics
=========================================
python_gc_time
python_twisted_reactor_pending_calls
python_twisted_reactor_tick_time
synapse_http_server_response_time_seconds
synapse_storage_query_time
synapse_storage_schedule_time
synapse_storage_transaction_time
=========================================


Block and response metrics renamed for 0.27.0
Expand Down
13 changes: 13 additions & 0 deletions synapse/app/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,19 @@ def quit_with_error(error_string):
sys.exit(1)


def listen_metrics(bind_addresses, port):
"""
Start Prometheus metrics server.
"""
from synapse.metrics import RegistryProxy
from prometheus_client import start_http_server

for host in bind_addresses:
reactor.callInThread(start_http_server, int(port),
addr=host, registry=RegistryProxy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_http_server should spawn its own thread, so is it necessary to callInThread?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had issues with it jamming Twisted, for some reason, and this made it work fine. the mysteries of threading

logger.info("Metrics now reporting on %s:%d", host, port)


def listen_tcp(bind_addresses, port, factory, backlog=50):
"""
Create a TCP socket for a port and several addresses
Expand Down
7 changes: 7 additions & 0 deletions synapse/app/appservice.py
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
11 changes: 9 additions & 2 deletions synapse/app/client_reader.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
from synapse.crypto import context_factory
from synapse.http.server import JsonResource
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.replication.slave.storage._base import BaseSlavedStore
from synapse.replication.slave.storage.appservice import SlavedApplicationServiceStore
Expand Down Expand Up @@ -77,7 +78,7 @@ def _listen_http(self, listener_config):
for res in listener_config["resources"]:
for name in res["names"]:
if name == "metrics":
resources[METRICS_PREFIX] = MetricsResource(self)
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)
elif name == "client":
resource = JsonResource(self, canonical_json=False)
PublicRoomListRestServlet(self).register(resource)
Expand Down Expand Up @@ -118,7 +119,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)

elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
10 changes: 9 additions & 1 deletion synapse/app/event_creator.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
from synapse.crypto import context_factory
from synapse.http.server import JsonResource
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.replication.slave.storage._base import BaseSlavedStore
from synapse.replication.slave.storage.account_data import SlavedAccountDataStore
Expand Down Expand Up @@ -90,7 +91,7 @@ def _listen_http(self, listener_config):
for res in listener_config["resources"]:
for name in res["names"]:
if name == "metrics":
resources[METRICS_PREFIX] = MetricsResource(self)
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)
elif name == "client":
resource = JsonResource(self, canonical_json=False)
RoomSendEventRestServlet(self).register(resource)
Expand Down Expand Up @@ -134,6 +135,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
10 changes: 9 additions & 1 deletion synapse/app/federation_reader.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
from synapse.crypto import context_factory
from synapse.federation.transport.server import TransportLayerServer
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.replication.slave.storage._base import BaseSlavedStore
from synapse.replication.slave.storage.directory import DirectoryStore
Expand Down Expand Up @@ -71,7 +72,7 @@ def _listen_http(self, listener_config):
for res in listener_config["resources"]:
for name in res["names"]:
if name == "metrics":
resources[METRICS_PREFIX] = MetricsResource(self)
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)
elif name == "federation":
resources.update({
FEDERATION_PREFIX: TransportLayerServer(self),
Expand Down Expand Up @@ -107,6 +108,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
10 changes: 9 additions & 1 deletion synapse/app/federation_sender.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
from synapse.crypto import context_factory
from synapse.federation import send_queue
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.replication.slave.storage.deviceinbox import SlavedDeviceInboxStore
from synapse.replication.slave.storage.devices import SlavedDeviceStore
Expand Down Expand Up @@ -89,7 +90,7 @@ def _listen_http(self, listener_config):
for res in listener_config["resources"]:
for name in res["names"]:
if name == "metrics":
resources[METRICS_PREFIX] = MetricsResource(self)
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)

root_resource = create_resource_tree(resources, NoResource())

Expand Down Expand Up @@ -121,6 +122,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
10 changes: 9 additions & 1 deletion synapse/app/frontend_proxy.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
RestServlet, parse_json_object_from_request,
)
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.replication.slave.storage._base import BaseSlavedStore
from synapse.replication.slave.storage.appservice import SlavedApplicationServiceStore
Expand Down Expand Up @@ -131,7 +132,7 @@ def _listen_http(self, listener_config):
for res in listener_config["resources"]:
for name in res["names"]:
if name == "metrics":
resources[METRICS_PREFIX] = MetricsResource(self)
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)
elif name == "client":
resource = JsonResource(self, canonical_json=False)
KeyUploadServlet(self).register(resource)
Expand Down Expand Up @@ -172,6 +173,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
13 changes: 9 additions & 4 deletions synapse/app/homeserver.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
from synapse.http.server import RootRedirect
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.python_dependencies import CONDITIONAL_REQUIREMENTS, \
check_requirements
from synapse.replication.http import ReplicationRestResource, REPLICATION_PREFIX
Expand All @@ -61,8 +61,6 @@
from twisted.web.server import GzipEncoderFactory
from twisted.web.static import File

from prometheus_client.twisted import MetricsResource

logger = logging.getLogger("synapse.app.homeserver")


Expand Down Expand Up @@ -232,7 +230,7 @@ def _configure_named_resource(self, name, compress=False):
resources[WEB_CLIENT_PREFIX] = build_resource_for_web_client(self)

if name == "metrics" and self.get_config().enable_metrics:
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy())
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)

if name == "replication":
resources[REPLICATION_PREFIX] = ReplicationRestResource(self)
Expand Down Expand Up @@ -265,6 +263,13 @@ def start_listening(self):
reactor.addSystemEventTrigger(
"before", "shutdown", server_listener.stopListening,
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
10 changes: 9 additions & 1 deletion synapse/app/media_repository.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
from synapse.config.logger import setup_logging
from synapse.crypto import context_factory
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.replication.slave.storage._base import BaseSlavedStore
from synapse.replication.slave.storage.appservice import SlavedApplicationServiceStore
Expand Down Expand Up @@ -73,7 +74,7 @@ def _listen_http(self, listener_config):
for res in listener_config["resources"]:
for name in res["names"]:
if name == "metrics":
resources[METRICS_PREFIX] = MetricsResource(self)
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)
elif name == "media":
media_repo = self.get_media_repository_resource()
resources.update({
Expand Down Expand Up @@ -114,6 +115,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
10 changes: 9 additions & 1 deletion synapse/app/pusher.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
from synapse.config.homeserver import HomeServerConfig
from synapse.config.logger import setup_logging
from synapse.http.site import SynapseSite
from synapse.metrics import RegistryProxy
from synapse.metrics.resource import METRICS_PREFIX, MetricsResource
from synapse.replication.slave.storage.account_data import SlavedAccountDataStore
from synapse.replication.slave.storage.events import SlavedEventStore
Expand Down Expand Up @@ -92,7 +93,7 @@ def _listen_http(self, listener_config):
for res in listener_config["resources"]:
for name in res["names"]:
if name == "metrics":
resources[METRICS_PREFIX] = MetricsResource(self)
resources[METRICS_PREFIX] = MetricsResource(RegistryProxy)

root_resource = create_resource_tree(resources, NoResource())

Expand Down Expand Up @@ -124,6 +125,13 @@ def start_listening(self, listeners):
globals={"hs": self},
)
)
elif listener["type"] == "metrics":
if not self.get_config().enable_metrics:
logger.warn(("Metrics listener configured, but "
"collect_metrics is not enabled!"))
else:
_base.listen_metrics(listener["bind_addresses"],
listener["port"])
else:
logger.warn("Unrecognized listener type: %s", listener["type"])

Expand Down
Loading