Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a /metrics endpoint for Prometheus Metrics #3490

Merged
merged 5 commits into from
Jun 13, 2018
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions notebook/base/handlers.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
from jinja2 import TemplateNotFound
from tornado import web, gen, escape, httputil
from tornado.log import app_log
import prometheus_client

from notebook._sysinfo import get_sys_info

Expand Down Expand Up @@ -809,6 +810,16 @@ def get(self):
url = sep.join([self._url, self.request.query])
self.redirect(url, permanent=self._permanent)

class PrometheusMetricsHandler(IPythonHandler):
"""
Return prometheus metrics for this notebook server
"""
@web.authenticated
def get(self):
self.set_header('Content-Type', prometheus_client.CONTENT_TYPE_LATEST)
self.write(prometheus_client.generate_latest(prometheus_client.REGISTRY))


#-----------------------------------------------------------------------------
# URL pattern fragments for re-use
#-----------------------------------------------------------------------------
Expand All @@ -825,4 +836,5 @@ def get(self):
(r".*/", TrailingSlashHandler),
(r"api", APIVersionHandler),
(r'/(robots\.txt|favicon\.ico)', web.StaticFileHandler),
(r'/metrics', PrometheusMetricsHandler)
]
3 changes: 2 additions & 1 deletion notebook/log.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@

import json
from tornado.log import access_log
from .metrics import prometheus_log_method

def log_request(handler):
"""log a bit more information about each request than tornado's default
Expand Down Expand Up @@ -45,4 +46,4 @@ def log_request(handler):
# log all headers if it caused an error
log_method(json.dumps(dict(request.headers), indent=2))
log_method(msg.format(**ns))

prometheus_log_method(handler)
39 changes: 39 additions & 0 deletions notebook/metrics.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
"""
Prometheus metrics exported by Jupyter Notebook Server

Read https://prometheus.io/docs/practices/naming/ for naming
conventions for metrics & labels. We generally prefer naming them
`<noun>_<verb>_<type_suffix>`. So a histogram that's tracking
the duration (in seconds) of servers spawning would be called
SERVER_SPAWN_DURATION_SECONDS.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy/paste. REQUEST_DURATION_SECONDS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an FYI, this particular example breaks the naming rule in the docstring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps better to remove the preference sentence with noun/verb/type.

Consider renaming to NOTEBOOK_REQUEST_DURATION_SECONDS based on Prometheus docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's not clear to me what a 'request duration' is - is that the time from the request being sent to it being received? The time from receiving the first byte to receiving the last? The time from receiving the request to sending the response?

If this is a standard term in web metrics, it doesn't matter that it's not familiar to me. But if it's a term we're creating, maybe we can create something less ambiguous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping @yuvipanda

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heya!

I removed the naming convention recommendation, and just directly linked only to the page instead. This should hopefully reduce confusion.

I've also renamed this metric to http_request_duration_seconds. I think that is pretty standard for what we are doing here, which is indiscriminately recording metric info for all http requests. Operators usually use job and instance labels automatically added by prometheus to differentiate various applications & instances of applications. So I think in this case, it's ok to not use a prefix.

"""

from prometheus_client import Histogram

REQUEST_DURATION_SECONDS = Histogram(
'request_duration_seconds',
'request duration for all HTTP requests',
['method', 'handler', 'code'],
)

def prometheus_log_method(handler):
"""
Tornado log handler for recording RED metrics.

We record the following metrics:
Rate - the number of requests, per second, your services are serving.
Errors - the number of failed requests per second.
Duration - The amount of time each request takes expressed as a time interval.

We use a fully qualified name of the handler as a label,
rather than every url path to reduce cardinality.

This function should be either the value of or called from a function
that is the 'log_function' tornado setting. This makes it get called
at the end of every request, allowing us to record the metrics we need.
"""
REQUEST_DURATION_SECONDS.labels(
method=handler.request.method,
handler='{}.{}'.format(handler.__class__.__module__, type(handler).__name__),
code=handler.get_status()
).observe(handler.request.request_time())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is low overhead, since it's being called on every request?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, quite. It's just incrementing a local counter based on a few strings and a number:

In [12]: %timeit prometheus_log_method(handler)
5.88 µs ± 87.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The only network activity occurs when a prometheus server retrieves the metrics via the /metrics handler.

3 changes: 2 additions & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,8 @@
'nbconvert',
'ipykernel', # bless IPython kernel for now
'Send2Trash',
'terminado>=0.8.1'
'terminado>=0.8.1',
'prometheus_client'
],
extras_require = {
'test:python_version == "2.7"': ['mock'],
Expand Down