Add a /metrics endpoint for Prometheus Metrics #3490
@@ -0,0 +1,39 @@
"""
Prometheus metrics exported by Jupyter Notebook Server

Read https://prometheus.io/docs/practices/naming/ for naming
conventions for metrics & labels. We generally prefer naming them
`<noun>_<verb>_<type_suffix>`. So a histogram that's tracking
the duration (in seconds) of servers spawning would be called
SERVER_SPAWN_DURATION_SECONDS.
"""

from prometheus_client import Histogram

REQUEST_DURATION_SECONDS = Histogram(
    'request_duration_seconds',
    'request duration for all HTTP requests',
    ['method', 'handler', 'code'],
)

def prometheus_log_method(handler):
    """
    Tornado log handler for recording RED metrics.

    We record the following metrics:
    Rate - the number of requests, per second, your services are serving.
    Errors - the number of failed requests per second.
    Duration - The amount of time each request takes expressed as a time interval.

    We use a fully qualified name of the handler as a label,
    rather than every url path to reduce cardinality.

    This function should be either the value of or called from a function
    that is the 'log_function' tornado setting. This makes it get called
    at the end of every request, allowing us to record the metrics we need.
    """
    REQUEST_DURATION_SECONDS.labels(
        method=handler.request.method,
        handler='{}.{}'.format(handler.__class__.__module__, type(handler).__name__),
        code=handler.get_status()
    ).observe(handler.request.request_time())
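A minimal sketch of the cardinality point made in the docstring: labelling by handler class rather than by URL path means many distinct paths collapse into a single label set. `ContentsHandler` and `metric_labels` are hypothetical stand-ins (not part of this PR, Tornado, or prometheus_client); they expose only the attributes the diff reads from a handler.

```python
from types import SimpleNamespace


class ContentsHandler:
    """Hypothetical stand-in for a Tornado RequestHandler subclass."""

    def __init__(self, method, path, status, elapsed):
        # Mirror only the request attributes the diff touches.
        self.request = SimpleNamespace(
            method=method, path=path, request_time=lambda: elapsed
        )
        self._status = status

    def get_status(self):
        return self._status


def metric_labels(handler):
    """Build the same three label values the diff passes to .labels()."""
    return {
        "method": handler.request.method,
        "handler": "{}.{}".format(
            handler.__class__.__module__, type(handler).__name__
        ),
        "code": handler.get_status(),
    }


# Two different URL paths served by the same handler class.
a = metric_labels(ContentsHandler("GET", "/api/contents/a.ipynb", 200, 0.012))
b = metric_labels(ContentsHandler("GET", "/api/contents/b.ipynb", 200, 0.034))
same_series = a == b  # the path never enters the label set
```

Under these assumptions both requests map to one time series, so the number of series grows with the number of handler classes, methods, and status codes rather than with the number of URLs.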
I assume this is low overhead, since it's being called on every request?
Yeah, quite. It's just incrementing a local counter based on a few strings and a number:
In [12]: %timeit prometheus_log_method(handler)
5.88 µs ± 87.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The only network activity occurs when a prometheus server retrieves the metrics via the
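The `%timeit` measurement above can be approximated outside IPython with the standard-library `timeit` module. This is only a sketch: `seconds_per_call` and `fake_log_method` are hypothetical names, and the stand-in function (a dict increment keyed by three label values) is an assumption used so the snippet runs without prometheus_client. In a real check, `prometheus_log_method` and an actual handler would be passed in instead. It also reports the best run rather than the mean ± std. dev. that `%timeit` prints.

```python
import timeit

COUNTS = {}


def fake_log_method(handler):
    """Hypothetical stand-in: one counter increment keyed by label values."""
    key = (handler["method"], handler["handler"], handler["code"])
    COUNTS[key] = COUNTS.get(key, 0) + 1


def seconds_per_call(func, *args, number=100_000, repeat=7):
    """Return the best observed average time per call, in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    # Each entry of repeat() is the total time for `number` calls.
    return min(timer.repeat(repeat=repeat, number=number)) / number


sample = {"method": "GET", "handler": "pkg.Handler", "code": 200}
per_call = seconds_per_call(fake_log_method, sample, number=10_000, repeat=3)
print("{:.2f} µs per call".format(per_call * 1e6))
```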
copy/paste. REQUEST_DURATION_SECONDS
As an FYI, this particular example breaks the naming rule in the docstring.
Perhaps better to remove the preference sentence with noun/verb/type.
Consider renaming to NOTEBOOK_REQUEST_DURATION_SECONDS based on Prometheus docs.
Actually, it's not clear to me what a 'request duration' is - is that the time from the request being sent to it being received? The time from receiving the first byte to receiving the last? The time from receiving the request to sending the response?
If this is a standard term in web metrics, it doesn't matter that it's not familiar to me. But if it's a term we're creating, maybe we can create something less ambiguous.
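On what the recorded value measures: the diff observes `handler.request.request_time()`. As far as I understand Tornado's behaviour (an assumption worth checking against its docs), that is the wall-clock time from when the server created the request object to when the request finished, or to "now" if it has not finished yet. The sketch below mimics that behaviour with a hypothetical `TimedRequest` class; it is not Tornado's code.

```python
import time


class TimedRequest:
    """Hypothetical stand-in mimicking a start-to-finish request timer."""

    def __init__(self):
        # Clock starts when the server begins handling the request.
        self._start_time = time.monotonic()
        self._finish_time = None

    def finish(self):
        # Clock stops once the response has been completed.
        self._finish_time = time.monotonic()

    def request_time(self):
        # Before finish(): elapsed so far. After finish(): a fixed duration.
        if self._finish_time is None:
            end = time.monotonic()
        else:
            end = self._finish_time
        return end - self._start_time


req = TimedRequest()
time.sleep(0.01)          # pretend the handler does some work
req.finish()
duration = req.request_time()
time.sleep(0.01)          # time passing after finish() is not counted
duration_later = req.request_time()
```

Under this reading, the metric covers server-side handling only; client-side and network time before the request reaches the server are not included.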
Ping @yuvipanda
Heya!
I removed the naming convention recommendation and now link directly to the naming page instead. This should hopefully reduce confusion.
I've also renamed this metric to http_request_duration_seconds. I think that is pretty standard for what we are doing here, which is indiscriminately recording metric info for all HTTP requests. Operators usually use the job and instance labels that Prometheus adds automatically to differentiate various applications & instances of applications. So I think in this case, it's OK to not use a prefix.