-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CON-181 - joaquin - Add PrometheusClient to ContentNode #2888
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think i should be solo/primary reviewer here as i have zero context into prometheus. should be @dmanjunath
also would be good to clarify the purpose of these metrics vs eg our local redis metrics. are we planning on moving all of those to prometheus?
given this is a new pattern which no one else has context on, might be good to add a README in CN for our prometheus usage pattern
also the Discprov code changes should be separated into a separate PR, however minor. and someone with Discprov codebase familiarity should review those
@@ -58,16 +58,14 @@ def save_time(self, labels=None, start_time=None): | |||
self.save(self.elapsed(start_time), labels) | |||
|
|||
def save(self, value, labels=None): | |||
this_metric = self.metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this actually work? wouldn't the below assignments just modify the value of this_metric
as opposed to updating self.metric
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it does!
self.metric.labels()
returns a new Metric object which we will then use to call .set()
or value()
with.
We don't seek to change self.metric
since we might want to call self.metric.set()
later without including any labels, but sometimes we do want to call self.metric.labels(labels).set()
.
New PR here: #2901
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @SidSethi's comments and left a few comments of my own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside from several implementation related comments, i think it would be good to move the prometheus code into src/services/metrics/*
folder and add a README for common usage patterns bc i found it pretty difficult to follow what was happening
|
||
module.exports = function (app) { | ||
app.get( | ||
'/prometheus_metrics', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we document what this route is doing
iiuc:
- have prometheus populate all metrics into a "collector"
- pull info from the collectors via ".metrics()"
would be nice to rewrite this a bit as:
try {
// TODO document - what do these do
PrometheusMetric.populateCollectors()
const data = await Prometheus.register.metrics()
// TODO - what/why
res.set('Content-Type', Prometheus.register.contentType)
return successResponse(data)
} catch (e) {
console.error('Prometheus Metrics Exporter error:', e)
return errorResponseServerError(e.message)
}```
take a look at how other routes do response handling e.g. https://github.com/AudiusProject/audius-protocol/blob/jc-asi-983/creator-node/src/routes/tracks.js#L595
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth rate limiting or caching response from this route?
not sure how expensive the processing inside this function is
cc @dmanjunath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah probably not worth doing either tbh. i think prometheus also clears a bunch of metrics automatically once it's collected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SidSethi could you look at the new layout and confirm that is clearer?
I couldn't use successResponse
since that only returns json objects, not raw text.
re: rate limiting, that's something that we discussed looking into if it became a problem by using a "secret api key" specifically for the /prometheus_metrics
endpoint.
// set metric prefix of audius_project_ | ||
name = `audius_cn_${name}` | ||
|
||
this.metricType = metric_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead just define the constructor param as
constructor(name, description, labelNames, metric_type = PrometheusType.HISTOGRAM)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually even better - lets not allow a default and force each caller to explicitly specify the metric type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the constructor to use named parameters, so I believe I've lost the ability to set default parameters, right?
For almost all cases, we'll want histograms since those "time things", but sometimes we'll want gauges to see what the current stat is.
this.metricType = PrometheusType.HISTOGRAM | ||
} | ||
|
||
// CollectorRegistries must be uniquely named |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats a CollectorRegistry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not allowed to call new PrometheusMetricClass({name})
twice with the same name.
Instead, we will use something like a Python defaultdict
to create and return new PrometheusMetricClass({name})
if we haven't seen it before, or the previously created instance if we have seen it before.
A good hierarchical overview is:
- Prometheus-Client Registries hold Collectors.
- Collectors hold registers of the metrics that they are holding.
- Exporters export all metrics held within registries for each collector held within the main registry.
- Prometheus-Client provides both Metric Registries as well as an Exporter for the Registries.
- Prometheus scrapes Exporters to import all metrics.
*/ | ||
saveTime(labels, startTime) { | ||
this.save(this.elapsed(startTime), labels) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this nested flow is confusing
saveTime calls never specify startTime, so we should define the constructor here as
saveTime(labels, startTime = Date.now())
instead of nesting that default in elapsed()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saveTime
is saving the time it took between when the constructor was called (first two lines of the constructor) and when saveTime()
is called.
However, you can choose to overload saveTime for different reasons while still relying on the same metric name.
What startTime can't be is Date.now()
since we'll want Date.now() - startTime
as seen in elapsed()
.
handleResponse(async (req, res) => { | ||
try { | ||
PrometheusMetric.populateCollectors() | ||
// res.set('Content-Type', Prometheus.register.contentType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this line?
const { getApp } = require('./lib/app') | ||
const { getLibsMock } = require('./lib/libsMock') | ||
|
||
describe('test Users', async function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test Prometheus?
|
||
module.exports = function (app) { | ||
app.get( | ||
'/prometheus_metrics', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah probably not worth doing either tbh. i think prometheus also clears a bunch of metrics automatically once it's collected
closing in favor of #3166 🙂 thx joaquin for all the help |
Description
Add PrometheusClient to ContentNode with a simple test metric to graph track.js:upload() times.
Tests
Tested by uploading a track to a local audius-client w/ a remote-dev backend. This populated some tracks sample metrics while also testing the importer.
How will this change be monitored? Are there sufficient logs?
Once deployed to staging, I'll upload a track and ensure metrics appear.
Once soaking in prod, I'll ensure metrics appear without issues.